Weather data scraping and extraction in R [closed]

I'm working on a research project and have been assigned to do a bit of data scraping and write R code that can extract the current temperature for a particular zip code from a site such as wunderground.com. This may be a bit of an abstract question, but here is where I am. I can extract the current temperature for a particular zip code by doing this:

    temps <- readLines("http://www.wunderground.com/q/zmw:20904.1.99999")
    edit(temps)
    temps //gives me the source code for the website where I can look at the line that contains the temperature
    ldata <- temps[lnumber]
    ldata
    #  then have a few gsub functions that basically extracts 
    # just the numerical data (57.8 for example) from that line of code

I have a csv file that contains the zip code of every city in the country, and I have it imported into R, arranged in a table by zip, city, and state. My challenge now is to write a method (using a Java analogy here because I'm new to R) that takes 6-7 consecutive zip codes (starting after a particular one I specify), substitutes each of them into the URL inside readLines() after the zmw: segment, and runs the code above for every resulting link.

I don't quite know how to pull the zip codes out of the table; maybe with a for loop? And I don't know how to use each extracted value to modify the link, which is where I'm really stuck. I have a bit of a Java background, so I understand how to approach the problem; I just don't know the R syntax. I realize this is a fairly abstract question since I didn't provide much code, but I'd like to know the functions/syntax that will let me extract the zip codes from the table and build the links with a function rather than editing them manually.
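
Roughly, what I am aiming for is something like the sketch below. This is not working code: ZipData is my imported zip-code table with a character column zip, and the grep/gsub lines just stand in for the extraction that already works above.

    # Sketch only -- ZipData and its zip column come from my imported csv,
    # and the "temperature" pattern / gsub call are placeholders for the
    # line-finding and number-extraction I already have working above.
    get_current_temp <- function(zip) {
      url  <- paste0("http://www.wunderground.com/q/zmw:", zip, ".1.99999")
      page <- readLines(url, warn = FALSE)
      line <- grep("temperature", page, value = TRUE)[1]   # placeholder pattern
      as.numeric(gsub("[^0-9.]", "", line))                # placeholder extraction
    }

    start <- match("20904", ZipData$zip)        # position of the starting zip
    zips  <- ZipData$zip[start:(start + 6)]     # that zip plus the next 6
    temps <- sapply(zips, get_current_temp)
    data.frame(zip = zips, temp = temps)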

Homotaxis answered 4/6, 2015 at 17:34 Comment(3)
Note that your comment characters (/**/, //) aren't valid in R, which uses # only.Pneumatology
@AlexA. Yeah. My bad. I was in Java mode!Homotaxis
The scope of this question can be narrowed down. The word "scraping" should disappear from the title, and it should just be "retrieve weather data from Weather Underground". In the body you can say you are willing to retrieve historical data or possibly scrape it. You can mention that you want to start with 10 zip code locations. The code can stay, but it needs to be all valid R. You can get rid of the Java background statements, as they are not really pertinent.Tree

So this is about the Weather Underground data.

You can download CSV files from individual weather stations on Weather Underground; however, you need to know the weather station identifier. Here is an example URL for a weather station in Kirkland, WA (KWAKIRKL8):

http://www.wunderground.com/weatherstation/WXDailyHistory.asp?ID=KWAKIRKL8&day=31&month=1&year=2014&graphspan=day&format=1

Here is some R code:

    library(RCurl)  # provides getURL()

    # Daily observations for station KWAKIRKL8 on 31 Jan 2014; format=1 asks for CSV
    url <- 'http://www.wunderground.com/weatherstation/WXDailyHistory.asp?ID=KWAKIRKL8&day=31&month=1&year=2014&graphspan=day&format=1'
    s <- getURL(url)
    s <- gsub("<br>\n", "", s)          # drop the <br> tags embedded in the response
    wdf <- read.csv(textConnection(s))  # parse the cleaned CSV into a data frame
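
If you end up needing more than one station or day, one option is to wrap the download in a small helper function. This is just a sketch that assumes the URL keeps the exact layout shown above:

    library(RCurl)

    # Sketch only: fetch one day of data for one station, assuming the URL format
    # above stays the same; the <br> stripping mirrors the code it is based on.
    read_station_day <- function(station, day, month, year) {
      url <- sprintf("http://www.wunderground.com/weatherstation/WXDailyHistory.asp?ID=%s&day=%s&month=%s&year=%s&graphspan=day&format=1",
                     station, day, month, year)
      s <- getURL(url)
      s <- gsub("<br>\n", "", s)
      read.csv(textConnection(s))
    }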

And here is a page with which you can manually find stations and their codes.

http://www.wunderground.com/wundermap/

Since you only need a few, you can pick them out manually.
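
For example, with a handful of station IDs picked off that map by hand (KWAKIRKL8 is the real one from above; the other two are placeholders), you could pull the same day for each using the read_station_day() helper sketched above:

    # Placeholder IDs except the first; swap in station codes from the map.
    stations <- c("KWAKIRKL8", "KWASEATT1", "KWABELLE2")
    daily <- lapply(stations, read_station_day, day = 31, month = 1, year = 2014)
    names(daily) <- stations   # one data frame per station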

Tree answered 4/6, 2015 at 17:51 Comment(10)
I downloaded a CSV file from unitedstateszipcodes.org and put it in a folder in my working directory, then did the following to arrange it in a table with only zip codes listed: ZipData<-read.csv(file.path(wd,"DataImport","zip_code_data.csv"), colClasses=c("character","NULL","factor","NULL","NULL","factor", "NULL","NULL","NULL","NULL","NULL","NULL","NULL","NULL", "NULL","NULL"), col.names=c("zip","","city","","","state","","","","","","","","", "","")) edit(ZipData) as.numeric(ZipData$zip) I thought it would make it easier to run a loop and extract individual zip codes.Homotaxis
A list of zip codes is not much help. We need a list of weather stations. How many zip codes do you need data for?Tree
10 sequential ones. What I have so far (elementary though it might be) is this: Input <- "20904"; urlBefore <- "http://www.wunderground.com/q/zmw:"; urlAfter <- ".1.99999"; link <- paste(urlBefore, Input, urlAfter, sep=""); link; temps <- readLines(link); temps; edit(temps). This gives me the source code that I can run all my substring stuff on to retrieve the current temperature. It works well, I think. (Apologies for the messy code; I don't know how to space code out inline.)Homotaxis
So you can select 10 weather stations by hand using the link I provided. Then my solution is complete. Would be cool if you accepted it (after trying it out of course :)Tree
I am trying to avoid as much manual input as possible. I'm simply testing it on 10. I would like to be able to do this for as many sequential zipcodes as I possibly can given an initial user input of sorts. Any further help is appreciated.Homotaxis
I ran it and it works well. I will accept it as it has helped me progress but I'm wondering if there is a more automated way to approach this. ThanksHomotaxis
There might be a way to automatically find a station (or a list of stations) by zipcode, but I haven't seen such a list on the site anywhere. But it is a big site... might be there somewhere.Tree
@Sammy: You can get weather stations near zipcodes from here: wunderground.com/weather/api/d/docs?d=data/geolookup. With these, you can iterate over each station, e.g., using lapply; see the sketch after these comments.Unbeatable
@User227710 That is good and useful and should be documented here. You should write that up as a (part of the) solution. I will give you an upvote.Tree
@Mike: You can update your answer with my comment.Unbeatable
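
A rough sketch of the geolookup idea from the comments above. The endpoint, the need for an API key, and the field path into the returned JSON are assumptions to check against the linked docs:

    library(RCurl)
    library(jsonlite)

    # Sketch only -- the endpoint and field names below are assumptions based on
    # the geolookup docs linked in the comments; inspect str(geo) to confirm them.
    api_key <- "YOUR_API_KEY"   # placeholder
    zip     <- "20904"
    url     <- sprintf("http://api.wunderground.com/api/%s/geolookup/q/%s.json",
                       api_key, zip)
    geo <- fromJSON(getURL(url))

    # Nearby personal weather stations (assumed location of the station list):
    stations <- geo$location$nearby_weather_stations$pws$station
    head(stations$id)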
