Using BeautifulSoup to Search Through Yahoo Finance
Asked Answered
G

1

5

I'm trying to pull information from the 'Key Statistics' page for a ticker in Yahoo (since this isn't supported in the Pandas library).

Example for AAPL:

from bs4 import BeautifulSoup
import requests

url = 'http://finance.yahoo.com/quote/AAPL/key-statistics?p=AAPL'
page = requests.get(url)
soup = BeautifulSoup(page.text, 'lxml')

enterpriseValue = soup.findAll('$ENTERPRISE_VALUE', attrs={'class': 'yfnc_tablehead1'}) #HTML tag for where enterprise value is located

print(enterpriseValue)

Edit: thanks Andy!

Question: This is printing an empty array. How do I change my findAll to return 598.56B?

Glynda answered 30/8, 2016 at 4:23 Comment(0)
C
12

Well, the reason the list that find_all returns is empty is because that data is generated with a separate call that isn't completed by just sending a GET request to that URL. If you look through the Network tab on Chrome/Firefox and filter by XHR, by examining the requests and responses of each network action, you can find what you URL you ought to be sending the GET request too.

In this case, it's https://query2.finance.yahoo.com/v10/finance/quoteSummary/AAPL?formatted=true&crumb=8ldhetOu7RJ&lang=en-US&region=US&modules=defaultKeyStatistics%2CfinancialData%2CcalendarEvents&corsDomain=finance.yahoo.com, as we can see here: enter image description here

So, how do we recreate this? Simple! :

from bs4 import BeautifulSoup
import requests

r = requests.get('https://query2.finance.yahoo.com/v10/finance/quoteSummary/AAPL?formatted=true&crumb=8ldhetOu7RJ&lang=en-US&region=US&modules=defaultKeyStatistics%2CfinancialData%2CcalendarEvents&corsDomain=finance.yahoo.com')
data = r.json()

This will return the JSON response as a dict. From there, navigate through the dict until you find the data you're after:

financial_data = data['quoteSummary']['result'][0]['defaultKeyStatistics']
enterprise_value_dict = financial_data['enterpriseValue']
print(enterprise_value_dict)
>>> {'fmt': '598.56B', 'raw': 598563094528, 'longFmt': '598,563,094,528'}
print(enterprise_value_dict['fmt'])
>>> '598.56B'
Colcothar answered 30/8, 2016 at 4:46 Comment(2)
This is gold! I'm new to webpage scrapping in general. Are there any resources you could point me towards to avoid similar questions in the near future?Glynda
Check out automatetheboringstuff.com/chapter11 and if you really want to do a deep dive, consider shop.oreilly.com/product/0636920034391.do . It's a great skill to have.Colcothar

© 2022 - 2024 — McMap. All rights reserved.