Historical weather data from NOAA
I am working on a data mining project and would like to gather historical weather data. I can get historical data through the web interface NOAA provides at http://www.ncdc.noaa.gov/cdo-web/search, but I would like to access this data programmatically through an API. From what I have read on StackOverflow, this data is supposed to be public domain, yet the only places I have found it are non-free services like Wunderground. How can I access this data for free?

Mendacious answered 14/11, 2013 at 13:19 Comment(3)
possible duplicate of How to use the NOAA API to query past weather data for a given set of coordinatesHumour
Great question. Without an API, I've simply fallen back on (respectful) scraping strategies. The NOAA data is a great resource, but requires some QA/QC. Check out this resource related to this articleVivisection
Another alternative is to use the ftp page for the GHCN-DVivisection
For a list of all service APIs provided by the National Climatic Data Center: http://www.ncdc.noaa.gov/cdo-web/webservices

Full documentation to the API which backs the search page you listed: http://www.ncdc.noaa.gov/cdo-web/webservices/v2

The API requires a token and is limited to 1,000 requests per day. If you need the limit increased for legitimate reasons, contact http://www.ncdc.noaa.gov/customer-support.

Also, for bulk downloading use ftp: ftp://ftp.ncdc.noaa.gov/pub/data/
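Since the token header trips people up (see the comments), here is a minimal sketch of a v2 API call in Python using only the standard library. The token goes in a plain `token` header, not an `Authorization` header; `YOUR_TOKEN_HERE` is a placeholder for the token NCDC e-mails you:

```python
import urllib.request

# Placeholder: substitute the token e-mailed to you by NCDC.
TOKEN = "YOUR_TOKEN_HERE"

# The v2 API expects the token in a plain "token" header; sending it
# in an "Authorization" header produces the 400 "Token parameter is
# required" error.
url = "https://www.ncdc.noaa.gov/cdo-web/api/v2/datasets"
req = urllib.request.Request(url, headers={"token": TOKEN})

# Uncomment to send the request (needs network access and a valid token):
# with urllib.request.urlopen(req) as resp:
#     print(resp.read().decode())
```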

Incredible answered 18/11, 2013 at 12:19 Comment(4)
I'm having trouble with the token, here is my curl request: curl -H "Authorization: <token>" http://www.ncdc.noaa.gov/cdo-web/api/v2/datasets where <token> is the token that was emailed to me, but it is returning the error {"status" : "400", "message" : "Token parameter is required."}Mendacious
i only found a way with curl like this:
curl_setopt($init, CURLOPT_URL, 'http://www.ncdc.noaa.gov/cdo-web/api/v2/data?datasetid=GHCND&startdate='.$startDate.'&enddate='.$endDate.'&datatypeid=TMAX&datatypeid=TMIN&stationid=GHCND:'.$city_id.'&limit='.$limit);
curl_setopt($init, CURLOPT_HEADER, false);
curl_setopt($init, CURLOPT_HTTPHEADER, array('token:<token here>'));
curl_setopt($init, CURLOPT_RETURNTRANSFER, 1);Santossantosdumont
azrosen92: curl -H "token: <token>" http://www.ncdc.noaa.gov/cdo-web/api/v2/datasetsIncredible
The api has updated, documentation is available at: ncei.noaa.gov/support/… (and yes, it is an update despite having a lower version number)Undying

As far as I know, all NOAA historical weather data is available for free through the upgini python library: https://upgini.com

However, you will not be able to download this data unless your task is training an ML algorithm. A distinguishing feature of upgini is that it enriches dataframes only with relevant data columns, where relevance means the significance of a column (for example, temperature) for predicting some target event.

If you have such a task, try running data enrichment with upgini to get NOAA historical weather data for free:

%pip install upgini

from upgini import FeaturesEnricher, SearchKey

enricher = FeaturesEnricher(search_keys={'rep_date': SearchKey.DATE, 'country': SearchKey.COUNTRY, 'postal_code': SearchKey.POSTAL_CODE})
enricher.fit(X_train, Y_train)
Croteau answered 17/6, 2022 at 16:37 Comment(0)

Dependencies

  1. pip install selenium
  2. Download the Chrome driver ('chromedriver.exe'). For Windows: https://chromedriver.storage.googleapis.com/114.0.5735.90/chromedriver_win32.zip

Once the driver and libraries are downloaded, we need to find the codes for the required locations by clicking on the map. (Source website: https://www.weather.gov/wrh/climate)

#Keys for required states

# RECAP NAME                   CLICK ON MAP                SELECT UNDER 1. LOCATION
# Dallas                       Fort Worth (fwd)               Dallas Area
# Florida                      Miami  (mfl)                   Miami Area
# New York                     New York  (okx)                NY-Central Park Area
# Minneapolis                  Minneapolis (mpx)              Minneapolis Area
# California                   Los Angeles(lox)               LA Downtown Area

state_code_dict = {'Dallas':['fwd',3],'Florida':['mfl',1],
                   'New York':['okx',24],'Minneapolis':['mpx',1],
                   'California':['lox',2]}

The numbers in state_code_dict give the position of the required area in the dropdown. For example, Florida's code is 'mfl', and the Miami area is the 1st entry in its dropdown list.

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
import pandas as pd
import warnings
warnings.filterwarnings('ignore')

options = Options()
options.add_argument("start-maximized")

webdriver_service = Service('chromedriver.exe')

df_ = pd.DataFrame() #(columns = ['Date','Average','Recap_name'])
for i in state_code_dict.keys():
    
    #Load the driver with webpage
    driver = webdriver.Chrome(options=options, service=webdriver_service)
    wait = WebDriverWait(driver, 30)
    print("Running for: ",i)
    ## Below url redirects to the data page
    ## source site is (https://www.weather.gov/wrh/climate)
    url = "https://nowdata.rcc-acis.org/" + state_code_dict[i][0] + "/"
    select_location = "/html/body/div[1]/div[3]/select/option[" + str(state_code_dict[i][1]) + "]"
    select_date = "tDatepicker"
    
    ## Give desired date/month in 'yyyy-mm' format, as it pulls the complete month data at once.
    set_date = "'2023-07'"
    date_freeze = "arguments[0].value = "+ set_date
    
    #X_PATH of go button to click for next window to open. X_PATH can be found from inspect element in chrome.
    click_go = "//*[@id='go']"
    wait_table_span = "//*[@id='results_area']/table[1]/caption/span"
    enlarge_click = "/html/body/div[5]/div[1]/button[1]"
    
    #Get the temperature table from the resulting html using the below X_PATH
    get_table = '//*[@id="results_area"]'
    try:
        driver.get(url)
        # wait up to 20 seconds before looking for the element
        element = WebDriverWait(driver, 20).until(EC.presence_of_element_located((By.XPATH,select_location)))
        element.click()
        element = WebDriverWait(driver, 20).until(EC.presence_of_element_located((By.ID,select_date)))
        driver.execute_script(date_freeze, element)
        element = WebDriverWait(driver, 20).until(EC.presence_of_element_located((By.XPATH,click_go)))
        element.click()
        element = WebDriverWait(driver, 20).until(EC.presence_of_element_located((By.XPATH,wait_table_span)))
        element = WebDriverWait(driver, 20).until(EC.presence_of_element_located((By.XPATH,enlarge_click)))
        element.click()
        data = driver.find_element(By.XPATH,get_table).get_attribute("innerHTML")
        df = pd.read_html(data)
        df[0].columns = df[0].columns.droplevel(0)
        df_all = df[0][['Date','Average']] 
        df_all['Recap_name'] = i
    finally:
        driver.quit()
    # DataFrame.append was removed in pandas 2.0; use pd.concat instead
    df_ = pd.concat([df_, df_all], ignore_index=True)
    
## Write different states data to different sheets in excel    
with pd.ExcelWriter("avg_temp.xlsx") as writer:
    for i in state_code_dict.keys():
        df_write = df_[df_.Recap_name == i]
        df_write.to_excel(writer, sheet_name=i, index=False)
    print("--------Finished----------")
Steelman answered 31/7, 2023 at 7:29 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.