I am working on a data mining project and would like to gather historical weather data. I can get historical data through the web interface at http://www.ncdc.noaa.gov/cdo-web/search, but I would like to access this data programmatically through an API. From what I have read on Stack Overflow, this data is supposed to be public domain, yet the only places I have been able to find it are non-free services like Wunderground. How can I access this data for free?
For a list of all service APIs provided by the National Climatic Data Center, see http://www.ncdc.noaa.gov/cdo-web/webservices
Full documentation for the API that backs the search page you linked is at http://www.ncdc.noaa.gov/cdo-web/webservices/v2
It requires a token and is limited to 1000 requests per day. If you need the limit increased for legitimate reasons, contact http://www.ncdc.noaa.gov/customer-support
Also, for bulk downloading use FTP: ftp://ftp.ncdc.noaa.gov/pub/data/
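A minimal sketch of calling the v2 API from Python with only the standard library, assuming the token is sent in a plain token header (the header name the API accepts, as the comments below work out) and using endpoint/parameter names that appear elsewhere in this thread:

```python
import urllib.parse
import urllib.request

BASE = "https://www.ncdc.noaa.gov/cdo-web/api/v2"

def cdo_request(endpoint, token, **params):
    """Build a CDO v2 API request; the token goes in a 'token' header,
    not an 'Authorization' header."""
    query = urllib.parse.urlencode(params)
    url = f"{BASE}/{endpoint}" + (f"?{query}" if query else "")
    return urllib.request.Request(url, headers={"token": token})

# Example: daily GHCND data for one station (station id taken from the thread)
req = cdo_request("data", "MY_TOKEN", datasetid="GHCND",
                  stationid="GHCND:ZI000067964", limit=31)
# To actually fetch (needs network and a real token):
# import json
# with urllib.request.urlopen(req) as resp:
#     payload = json.load(resp)
```

`cdo_request` is an illustrative helper name, not part of any NOAA client library; only the base URL, header name, and query parameters are taken from this thread.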
I tried

curl -H "Authorization: <token>" http://www.ncdc.noaa.gov/cdo-web/api/v2/datasets

where <token> is the token that was emailed to me, but it returns the error {"status" : "400", "message" : "Token parameter is required."}
– Mendacious

curl(), like this:

curl_setopt($init, CURLOPT_URL, 'http://www.ncdc.noaa.gov/cdo-web/api/v2/data?datasetid=GHCND&startdate='.$startDate.'&enddate='.$endDate.'&datatypeid=TMAX&datatypeid=TMIN&stationid=GHCND:'.$city_id.'&limit='.$limit);
curl_setopt($init, CURLOPT_HEADER, false);
curl_setopt($init, CURLOPT_HTTPHEADER, array('token: <token here>'));
curl_setopt($init, CURLOPT_RETURNTRANSFER, 1);
– Santossantosdumont

Pass the token in a plain token header:

curl -H "token: <token>" http://www.ncdc.noaa.gov/cdo-web/api/v2/datasets
– Incredible

As far as I know, all NOAA historical weather data is available for free through the upgini Python library: https://upgini.com
However, you will not be able to download this data unless your task is training an ML algorithm. A feature of upgini is that it enriches dataframes only with relevant columns, where relevance means the significance of a data column (for example, temperature) for predicting some target event.
If you have such a task, try running data enrichment with upgini to get NOAA historical weather data for free:
%pip install upgini

from upgini import FeaturesEnricher, SearchKey

enricher = FeaturesEnricher(
    search_keys={'rep_date': SearchKey.DATE,
                 'country': SearchKey.COUNTRY,
                 'postal_code': SearchKey.POSTAL_CODE})
enricher.fit(X_train, Y_train)
Dependencies
- pip install selenium
- Download the Chrome driver ('chromedriver.exe'). For Windows: https://chromedriver.storage.googleapis.com/114.0.5735.90/chromedriver_win32.zip
Once the driver and libraries are downloaded, we need to find the codes for the required locations by clicking on the map (source website: https://www.weather.gov/wrh/climate).
# Keys for required states
# RECAP NAME     CLICK ON MAP         SELECT UNDER "1. LOCATION"
# Dallas         Fort Worth (fwd)     Dallas Area
# Florida        Miami (mfl)          Miami Area
# New York       New York (okx)       NY-Central Park Area
# Minneapolis    Minneapolis (mpx)    Minneapolis Area
# California     Los Angeles (lox)    LA Downtown Area
state_code_dict = {'Dallas': ['fwd', 3],
                   'Florida': ['mfl', 1],
                   'New York': ['okx', 24],
                   'Minneapolis': ['mpx', 1],
                   'California': ['lox', 2]}
The numbers in state_code_dict give the position of the required area in the location dropdown. For example, for Florida the code is 'mfl', and the Miami area is 1st in the dropdown list.
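The code-and-position lookup can be sketched as a small helper; the URL pattern and option XPath are taken verbatim from the scraper code below, while the helper name location_target is purely illustrative:

```python
# Map of state -> [NOWData site code, position of the area in the dropdown]
state_code_dict = {'Dallas': ['fwd', 3],
                   'Florida': ['mfl', 1],
                   'New York': ['okx', 24],
                   'Minneapolis': ['mpx', 1],
                   'California': ['lox', 2]}

def location_target(state):
    """Return the NOWData URL and the dropdown-option XPath for a state."""
    code, position = state_code_dict[state]
    url = f"https://nowdata.rcc-acis.org/{code}/"
    xpath = f"/html/body/div[1]/div[3]/select/option[{position}]"
    return url, xpath

url, xpath = location_target('Florida')
```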
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
import pandas as pd
import warnings

warnings.filterwarnings('ignore')

options = Options()
options.add_argument("start-maximized")
webdriver_service = Service('chromedriver.exe')

df_ = pd.DataFrame()  # columns: Date, Average, Recap_name

for state in state_code_dict:
    # Load the driver with the web page
    driver = webdriver.Chrome(options=options, service=webdriver_service)
    print("Running for:", state)
    # The URL below redirects to the data page
    # (source site: https://www.weather.gov/wrh/climate)
    url = "https://nowdata.rcc-acis.org/" + state_code_dict[state][0] + "/"
    select_location = "/html/body/div[1]/div[3]/select/option[" + str(state_code_dict[state][1]) + "]"
    select_date = "tDatepicker"
    # Give the desired month in 'yyyy-mm' format; the site pulls a complete month at once
    set_date = "'2023-07'"
    date_freeze = "arguments[0].value = " + set_date
    # XPath of the Go button that opens the results window
    # (XPaths can be found via Inspect Element in Chrome)
    click_go = "//*[@id='go']"
    wait_table_span = "//*[@id='results_area']/table[1]/caption/span"
    enlarge_click = "/html/body/div[5]/div[1]/button[1]"
    # XPath of the temperature table in the resulting HTML
    get_table = '//*[@id="results_area"]'
    try:
        driver.get(url)
        # Wait up to 20 seconds for each element to appear
        element = WebDriverWait(driver, 20).until(EC.presence_of_element_located((By.XPATH, select_location)))
        element.click()
        element = WebDriverWait(driver, 20).until(EC.presence_of_element_located((By.ID, select_date)))
        driver.execute_script(date_freeze, element)
        element = WebDriverWait(driver, 20).until(EC.presence_of_element_located((By.XPATH, click_go)))
        element.click()
        element = WebDriverWait(driver, 20).until(EC.presence_of_element_located((By.XPATH, wait_table_span)))
        element = WebDriverWait(driver, 20).until(EC.presence_of_element_located((By.XPATH, enlarge_click)))
        element.click()
        data = driver.find_element(By.XPATH, get_table).get_attribute("innerHTML")
        df = pd.read_html(data)
        df[0].columns = df[0].columns.droplevel(0)
        df_all = df[0][['Date', 'Average']]
        df_all['Recap_name'] = state
        # DataFrame.append was removed in pandas 2.0; use pd.concat instead
        df_ = pd.concat([df_, df_all], ignore_index=True)
    finally:
        driver.quit()
## Write each state's data to a separate sheet in Excel
with pd.ExcelWriter("avg_temp.xlsx") as writer:
    for state in state_code_dict:
        df_write = df_[df_.Recap_name == state]
        df_write.to_excel(writer, sheet_name=state, index=False)
print("--------Finished----------")