Download history stock prices automatically from yahoo finance in python
R

6

50

Is there a way to automatically download historical prices of stocks from yahoo finance or google finance (csv format)? Preferably in Python.

Readymix answered 14/9, 2012 at 23:6 Comment(1)
Check out scrape-google-finance.compunect.com; it's a fairly new open-source PHP scraper for Google Finance. It's free to use and modify, and you can download all stock prices and all companies from Google. It should not be too difficult to learn from it and write the same in Python.Limnology
M
42

Short answer: Yes. Use Python's urllib to pull the historical data pages for the stocks you want. Go with Yahoo! Finance; Google is less reliable, has less data coverage, and is more restrictive in how you can use the data once you have it. Also, I believe Google's ToS specifically prohibits scraping the data.

Longer answer: This is the script I use to pull all the historical data on a particular company. It pulls the historical data page for a particular ticker symbol, then saves it to a csv file named by that symbol. You'll have to provide your own list of ticker symbols that you want to pull.

import urllib

base_url = "http://ichart.finance.yahoo.com/table.csv?s="
def make_url(ticker_symbol):
    return base_url + ticker_symbol

output_path = "C:/path/to/output/directory"
def make_filename(ticker_symbol, directory="S&P"):
    return output_path + "/" + directory + "/" + ticker_symbol + ".csv"

def pull_historical_data(ticker_symbol, directory="S&P"):
    try:
        urllib.urlretrieve(make_url(ticker_symbol), make_filename(ticker_symbol, directory))
    except urllib.ContentTooShortError as e:
        outfile = open(make_filename(ticker_symbol, directory), "w")
        outfile.write(e.content)
        outfile.close()
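
For Python 3, where urllib is split into submodules, the same script might look roughly like the sketch below. It keeps the URL from the script above, which Yahoo may no longer serve. The "output" default directory, the extra URLError handling, and the None return value on failure are assumptions added for illustration, not part of the original script.

```python
import os
import urllib.error
import urllib.request

# Endpoint copied from the script above; Yahoo may no longer serve it.
BASE_URL = "http://ichart.finance.yahoo.com/table.csv?s="


def make_url(ticker_symbol):
    return BASE_URL + ticker_symbol


def make_filename(ticker_symbol, output_path="output", directory="S&P"):
    # "output" is a placeholder default; point it at your own directory.
    return output_path + "/" + directory + "/" + ticker_symbol + ".csv"


def pull_historical_data(ticker_symbol, output_path="output", directory="S&P"):
    """Download one ticker's CSV. Return the saved path, or None on failure."""
    filename = make_filename(ticker_symbol, output_path, directory)
    os.makedirs(os.path.dirname(filename), exist_ok=True)
    try:
        urllib.request.urlretrieve(make_url(ticker_symbol), filename)
    except urllib.error.ContentTooShortError as e:
        # urlretrieve has already written the bytes it received to `filename`,
        # so the partial file is kept and the short read is only reported.
        print("Incomplete download for %s: %s" % (ticker_symbol, e))
    except urllib.error.URLError as e:
        # Covers HTTP error responses and unreachable hosts.
        print("Download failed for %s: %s" % (ticker_symbol, e))
        return None
    return filename
```

Calling pull_historical_data(symbol) in a loop over your own list of ticker symbols then writes one CSV file per symbol. ContentTooShortError is caught first because it is a subclass of URLError.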
Muskeg answered 14/9, 2012 at 23:29 Comment(2)
I get the following error: AttributeError: module 'urllib' has no attribute 'ContentTooShortError'Helban
This answer was written 7 years ago, for Python 2.7. In the intervening timespan, Python 3 has modified the urllib module to be broken into submodules. The exception class you're looking for can be found here: docs.python.org/3/library/…Muskeg
T
108

When you're going to work with such time series in Python, pandas is indispensable. And here's the good news: it comes with a historical data downloader for Yahoo: pandas.io.data.DataReader.

from pandas.io.data import DataReader
from datetime import datetime

ibm = DataReader('IBM',  'yahoo', datetime(2000, 1, 1), datetime(2012, 1, 1))
print(ibm['Adj Close'])

Here's an example from the pandas documentation.

Update for pandas >= 0.19:

The pandas.io.data module was removed in pandas 0.19. Use the separate pandas-datareader package instead. Install it with:

pip install pandas-datareader

And then you can do this in Python:

import pandas_datareader as pdr
from datetime import datetime

ibm = pdr.get_data_yahoo(symbols='IBM', start=datetime(2000, 1, 1), end=datetime(2012, 1, 1))
print(ibm['Adj Close'])

Downloading from Google Finance is also supported.

There's more in the documentation of pandas-datareader.

Tart answered 20/9, 2012 at 10:5 Comment(2)
When I try it, the imports work fine but when I call the 'goog' line I receive an error: "IOError: after 3 tries, Yahoo! did not return a 200 for url 'ichart.finance.yahoo.com/…'" How could this be fixed?Repartition
@Repartition Seems to be because GOOG is not accepted by the API (don't understand why, after move to Alphabet GOOG ticker was kept). Works fine for GOOGL and various other symbols. Example adjusted just in case.Tart
M
14

Extending @Def_Os's answer with an actual demo...

As @Def_Os has already said, using pandas-datareader makes this task real fun:

In [12]: from pandas_datareader import data

pulling all available historical data for AAPL starting from 1980-01-01

#In [13]: aapl = data.DataReader('AAPL', 'yahoo', '1980-01-01')

# yahoo api is inconsistent for getting historical data, please use google instead.
In [13]: aapl = data.DataReader('AAPL', 'google', '1980-01-01')

first 5 rows

In [14]: aapl.head()
Out[14]:
                 Open       High     Low   Close     Volume  Adj Close
Date
1980-12-12  28.750000  28.875000  28.750  28.750  117258400   0.431358
1980-12-15  27.375001  27.375001  27.250  27.250   43971200   0.408852
1980-12-16  25.375000  25.375000  25.250  25.250   26432000   0.378845
1980-12-17  25.875000  25.999999  25.875  25.875   21610400   0.388222
1980-12-18  26.625000  26.750000  26.625  26.625   18362400   0.399475

last 5 rows

In [15]: aapl.tail()
Out[15]:
                 Open       High        Low      Close    Volume  Adj Close
Date
2016-06-07  99.250000  99.870003  98.959999  99.029999  22366400  99.029999
2016-06-08  99.019997  99.559998  98.680000  98.940002  20812700  98.940002
2016-06-09  98.500000  99.989998  98.459999  99.650002  26419600  99.650002
2016-06-10  98.529999  99.349998  98.480003  98.830002  31462100  98.830002
2016-06-13  98.690002  99.120003  97.099998  97.339996  37612900  97.339996

save all data as CSV file

In [16]: aapl.to_csv('d:/temp/aapl_data.csv')

d:/temp/aapl_data.csv - 5 first rows

Date,Open,High,Low,Close,Volume,Adj Close
1980-12-12,28.75,28.875,28.75,28.75,117258400,0.431358
1980-12-15,27.375001,27.375001,27.25,27.25,43971200,0.408852
1980-12-16,25.375,25.375,25.25,25.25,26432000,0.378845
1980-12-17,25.875,25.999999,25.875,25.875,21610400,0.38822199999999996
1980-12-18,26.625,26.75,26.625,26.625,18362400,0.399475
...
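
To sanity-check a saved file without pandas, something like the following sketch with the standard csv module should work. The rows are copied from the excerpt above and embedded as a string so the example runs without the file. The read_prices helper and the choice of columns to convert are hypothetical, for illustration only.

```python
import csv
import io

# First rows of the CSV excerpt above, embedded so the sketch runs on its own.
SAMPLE_CSV = """Date,Open,High,Low,Close,Volume,Adj Close
1980-12-12,28.75,28.875,28.75,28.75,117258400,0.431358
1980-12-15,27.375001,27.375001,27.25,27.25,43971200,0.408852
1980-12-16,25.375,25.375,25.25,25.25,26432000,0.378845
"""


def read_prices(csv_file):
    """Return one dict per row, with prices as floats and volume as int."""
    rows = []
    for record in csv.DictReader(csv_file):
        rows.append({
            "Date": record["Date"],
            "Close": float(record["Close"]),
            "Adj Close": float(record["Adj Close"]),
            "Volume": int(record["Volume"]),
        })
    return rows


prices = read_prices(io.StringIO(SAMPLE_CSV))
print(prices[0])
```

For a real file, pass an open file object instead of the StringIO, for example one opened with open(path, newline="").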
Montelongo answered 14/6, 2016 at 8:30 Comment(0)
S
8

There is already a Python library called yahoo_finance. Install it first with the following command:

sudo pip install yahoo_finance

Once you've installed the yahoo_finance library, here's sample code that downloads the data you need from Yahoo Finance:

#!/usr/bin/python
import yahoo_finance
import pandas as pd

symbol = yahoo_finance.Share("GOOG")
google_data = symbol.get_historical("1999-01-01", "2016-06-30")
google_df = pd.DataFrame(google_data)

# Output data into CSV
google_df.to_csv("/home/username/google_stock_data.csv")

This should do it. Let me know if it works.

UPDATE: The yahoo_finance library is no longer supported.

Sequela answered 26/7, 2016 at 15:53 Comment(5)
Just curious -- what benefit does yahoo-finance provide over pandas_datareader (or vice versa)?Kiehl
At the time that was the only library I could find in relation to pulling Yahoo Finance Stock Prices. However, it doesn't seem to be working anymore.Sequela
Affirmed - 21/02/2018 - yahoo_finance no longer works.Duluth
Thanks for this 3kstcSequela
Yahoo finance API is no longer supported.Odyssey
A
4

You can check out the yahoo_fin package. It was initially created after Yahoo Finance changed their API (documentation is here: http://theautomatic.net/yahoo_fin-documentation).

from yahoo_fin import stock_info as si

aapl_data = si.get_data("aapl")

nflx_data = si.get_data("nflx")

aapl_data.head()

nflx_data.head()

aapl_data.to_csv("aapl_data.csv")

nflx_data.to_csv("nflx_data.csv")
Acerbic answered 13/4, 2019 at 2:26 Comment(2)
Thank you for this reference; this was the only ready-made solution that worked for me. All other packages mentioned here seem to be outdated and don't work, probably because of the API changes in Yahoo Finance.Pharmacology
typo: aapl.to_csv("aapl_data.csv") should be aapl_data.to_csv("aapl_data.csv")Cavefish
I
2

It's trivial when you know how:

import yfinance as yf
df = yf.download('CVS', '2015-01-01')
df.to_csv('cvs-health-corp.csv')

If you wish to plot it:

import finplot as fplt
fplt.candlestick_ochl(df[['Open','Close','High','Low']])
fplt.show()

Italianism answered 15/10, 2020 at 14:22 Comment(0)