How to scrape Yahoo Finance stock data with Python

In this post, we will walk through a hands-on example of scraping financial data from Yahoo Finance.

  1. Set up the Python environment.
  2. Open the Yahoo Finance page. For example, Alphabet Inc. (GOOG).
  3. Scrape and parse the page.

The page we will scrape.

In the browser's "Network" tab, we can't find a request that returns the JSON data.

Instead, all the data is embedded in the page source, inside a script tag.

View page source – script – root.App.main
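from bs4 import BeautifulSoup
import re
import json
import requests

# Assumption: a browser-like User-Agent; Yahoo may reject the default one sent by requests.
headers = {"User-Agent": "Mozilla/5.0"}
response = requests.get("https://finance.yahoo.com/quote/GOOG?p=GOOG&.tsrc=fin-srch", headers=headers)
soup = BeautifulSoup(response.text, "html.parser")
# print(soup.prettify())

# Find the <script> tag whose contents assign root.App.main.
script = soup.find("script", string=re.compile(r"root\.App\.main")).string
# Capture the JSON object on the right-hand side of the assignment and decode it.
data = json.loads(re.search(r"root\.App\.main\s+=\s+(\{.*\})", script).group(1))
stores = data["context"]["dispatcher"]["stores"]
print(stores)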
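
The regex and json.loads step can be tried on its own, without a network request. The sketch below is only an illustration: SAMPLE_SCRIPT is a hypothetical, heavily shortened stand-in for the real script text, and parse_app_main is a helper name made up for this example.

```python
import json
import re

# Hypothetical stand-in for the text of the matching <script> tag.
SAMPLE_SCRIPT = """
(function (root) {
root.App.main = {"context": {"dispatcher": {"stores": {"PageStore": {"currentPageName": "quote"}}}}};
}(this));
"""

# "." does not match newlines, so the greedy group stops at the last "}" on the assignment line.
APP_MAIN = re.compile(r"root\.App\.main\s*=\s*(\{.*\})")


def parse_app_main(script_text):
    """Return the decoded root.App.main object, or None if it is not found."""
    match = APP_MAIN.search(script_text)
    if match is None:
        return None
    return json.loads(match.group(1))


data = parse_app_main(SAMPLE_SCRIPT)
sample_stores = data["context"]["dispatcher"]["stores"]
print(sample_stores)
```

Returning None when the pattern is missing makes it easier to notice when the page layout has changed, instead of failing with an AttributeError on .group(1).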

Response data

{'PageStore': {'currentPageName': 'quote', 'currentEvent': {'eventName': 'NEW_PAGE_SUCCESS'}
....
....
'currency': 'USD', 'trailingPE': {'raw': 31.640472, 'fmt': '31.64'}, 
'regularMarketVolume': {'raw': 791234, 'fmt': '791.23k', 'longFmt': '791,234'}
....
...
'MobileHeaderStore': {'navTitle': 'finance', 'useNavTitle': False}}

The stores dictionary now holds all the data embedded in the page.

Next, pick out the fields you need one by one.

import pprint

financial_data = stores["QuoteSummaryStore"]["financialData"]
pprint.pprint(financial_data)

Output data

{'currentPrice': {'fmt': '2,913.75', 'raw': 2913.75},
 'currentRatio': {'fmt': '3.15', 'raw': 3.152},
 'debtToEquity': {'fmt': '11.83', 'raw': 11.829},
 'earningsGrowth': {'fmt': '169.10%', 'raw': 1.691},
 'ebitda': {'fmt': '75.55B', 'longFmt': '75,552,997,376', 'raw': 75552997376},
 'ebitdaMargins': {'fmt': '34.30%', 'raw': 0.34300998},
 'financialCurrency': 'USD',
 'freeCashflow': {'fmt': '44.61B',
                  'longFmt': '44,609,626,112',
                  'raw': 44609626112},
 'grossMargins': {'fmt': '55.72%', 'raw': 0.55723},
 'grossProfits': {'fmt': '97.8B',
                  'longFmt': '97,795,000,000',
                  'raw': 97795000000},
 'maxAge': 86400,
 'numberOfAnalystOpinions': {'fmt': '9', 'longFmt': '9', 'raw': 9},
 'operatingCashflow': {'fmt': '80.86B',
                       'longFmt': '80,858,996,736',
                       'raw': 80858996736},
 'operatingMargins': {'fmt': '28.45%', 'raw': 0.28448},
 'profitMargins': {'fmt': '28.57%', 'raw': 0.2857},
 'quickRatio': {'fmt': '3.03', 'raw': 3.027},
 'recommendationKey': 'buy',
 'recommendationMean': {'fmt': '1.60', 'raw': 1.6},
 'returnOnAssets': {'fmt': '12.76%', 'raw': 0.12759},
 'returnOnEquity': {'fmt': '28.29%', 'raw': 0.2829},
 'revenueGrowth': {'fmt': '61.60%', 'raw': 0.616},
 'revenuePerShare': {'fmt': '326.66', 'raw': 326.656},
 'targetHighPrice': {'fmt': '3,400.00', 'raw': 3400},
 'targetLowPrice': {'fmt': '2,700.00', 'raw': 2700},
 'targetMeanPrice': {'fmt': '3,103.33', 'raw': 3103.33},
 'targetMedianPrice': {'fmt': '3,100.00', 'raw': 3100},
 'totalCash': {'fmt': '135.86B',
               'longFmt': '135,863,001,088',
               'raw': 135863001088},
 'totalCashPerShare': {'fmt': '203.77', 'raw': 203.768},
 'totalDebt': {'fmt': '28.1B', 'longFmt': '28,100,999,168', 'raw': 28100999168},
 'totalRevenue': {'fmt': '220.27B',
                  'longFmt': '220,265,005,056',
                  'raw': 220265005056}}

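The result includes the current price, the current ratio, and many other financial metrics. Most numeric fields carry a numeric 'raw' value and a display string 'fmt'.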
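
Since numeric fields are small dictionaries, it is often handy to keep only the 'raw' numbers. Here is a minimal sketch of one way to do that. The sample dictionary copies a few entries from the output above, and raw_values is a hypothetical helper name, not part of any library.

```python
# A few entries copied from the output above; the full dict works the same way.
sample_financial_data = {
    "currentPrice": {"fmt": "2,913.75", "raw": 2913.75},
    "currentRatio": {"fmt": "3.15", "raw": 3.152},
    "financialCurrency": "USD",
    "recommendationKey": "buy",
}


def raw_values(section):
    """Map each field to its 'raw' value; keep plain values unchanged."""
    flat = {}
    for key, value in section.items():
        if isinstance(value, dict):
            # Nested entries look like {'fmt': ..., 'raw': ...}; a missing 'raw' becomes None.
            flat[key] = value.get("raw")
        else:
            flat[key] = value
    return flat


flat = raw_values(sample_financial_data)
print(flat["currentPrice"], flat["currentRatio"], flat["financialCurrency"])
```

With the real data, you would call raw_values(financial_data) instead of using the sample.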

If you want more financial information, you can find it under other JSON keys.

For example: "StreamDataStore"

You can adapt the code above to extract other content.
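
To explore other keys, it helps to list what is available and to look up nested fields without raising KeyError. The following is a rough sketch under assumptions: sample_stores is a hypothetical miniature of the real stores dictionary, built from keys seen in the response above, and dig is a helper name invented for this example.

```python
# Hypothetical miniature of the stores dict, using keys seen in the response above.
sample_stores = {
    "PageStore": {"currentPageName": "quote"},
    "QuoteSummaryStore": {
        "financialData": {"currentPrice": {"raw": 2913.75, "fmt": "2,913.75"}}
    },
    "MobileHeaderStore": {"navTitle": "finance", "useNavTitle": False},
}


def dig(mapping, *keys, default=None):
    """Follow keys through nested dicts, returning default when a key is missing."""
    current = mapping
    for key in keys:
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


# List the top-level stores, then read one nested field.
print(sorted(sample_stores))
print(dig(sample_stores, "QuoteSummaryStore", "financialData", "currentPrice", "raw"))
# A store that is absent from this sample falls back to the default.
print(dig(sample_stores, "StreamDataStore", default={}))
```

On the real data, sorted(stores) shows every store name, which is a quick way to check whether a key such as "StreamDataStore" is present before digging into it.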

What’s next?

  • Complete data
  • Export data
  • Prevent getting blacklisted while scraping
  • Host the scraper
