How to download a nested JSON into a pandas dataframe
Asked Answered
L

4

10

Looking to sharpen my data science skills. I am practicing url data pulls from a sports site and the json file has multiple nested dictionaries. I would like to be able to pull this data to map my own custom form of the leaderboard in matplotlib, etc., but am having a hard time getting the json to a workable df.

The main website is: https://www.usopen.com/scoring.html

Looking at the background I believe the live info is being pulled from the link listed in the short code below. I'm working in Jupyter notebooks. I can get the data successfully pulled.

But as you can see, it is pulling multiple nested dictionaries which is making it very difficult in getting a simple dataframe pulled.

Was just looking to get player, score to par, total, and round pulled. Any help would be greatly appreciated, thank you!

import pandas as pd
import urllib as ul
import json
url = "https://gripapi-static-pd.usopen.com/gripapi/leaderboard.json"
response = ul.request.urlopen(url)
data = json.loads(response.read())
print(data)
Lampkin answered 19/9, 2020 at 19:19 Comment(0)
F
2

Simple and Quick Solution. A better solution might exist with JSON normalize from pandas but this is fairly good for your use case.

def func(x):
    if not any(x.isnull()):
        return (x['round'], x['player']['firstName'], x['player']['identifier'], x['toParToday']['value'], x['totalScore']['value'])

df = pd.DataFrame(data['standings'])
df['round'] = data['currentRound']['name']
df = df[['player', 'toPar', 'toParToday', 'totalScore', 'round']]
info = df.apply(func, axis=1)
info_df = pd.DataFrame(list(info.values), columns=['Round', 'player_name', 'pid', 'to_par_today', 'totalScore'])
info_df.head()
Foreordination answered 19/9, 2020 at 20:39 Comment(0)
R
11
  • Use requests.get(url).json() to get the data
  • Use pandas.json_normalize to unpack the standings key into a dataframe
  • roundScores is a list of dicts
    • The list must be expanded with .explode
    • The the column of dicts must be normalized again
  • join the normalized column back to dataframe df
import requests
import pandas as pd

# load the data
df = pd.json_normalize(requests.get(url).json(), 'standings')

# explode the roundScores column
df = df.explode('roundScores', ignore_index=True)

# normalize the dicts in roundScores and join back to df
df = df.join(pd.json_normalize(df.roundScores), rsuffix='_rs').drop(columns=['roundScores']).reset_index(drop=True)

# display(df.head())
   isRecapAvailable player.identifier player.firstName player.lastName player.image.gravity player.image.type                     player.image.identifier player.image.cropMode player.country.name player.country.code player.country.flag.type player.country.flag.identifier  player.isAmateur  toPar.value toPar.format toPar.displayValue  toParToday.value toParToday.format toParToday.displayValue  totalScore.value totalScore.format totalScore.displayValue  position.value position.format position.displayValue  holesThrough.value holesThrough.format holesThrough.displayValue liveVideo.identifier liveVideo.isLive  score.value score.format score.displayValue  toPar.value_rs toPar.format_rs toPar.displayValue_rs
0              True             56278          Matthew           Wolff               center   imageCloudinary  us-open/players/2020-players/Matthew_Wolff                  fill       United States                 usa          imageCloudinary              us-open/flags/usa             False           -5     absolute                 -5                -5          absolute                      -5             140.0          absolute                     140               1        absolute                     1                  10            absolute                        10                  NaN              NaN           66     absolute                 66              -4        absolute                    -4
1              True             56278          Matthew           Wolff               center   imageCloudinary  us-open/players/2020-players/Matthew_Wolff                  fill       United States                 usa          imageCloudinary              us-open/flags/usa             False           -5     absolute                 -5                -5          absolute                      -5             140.0          absolute                     140               1        absolute                     1                  10            absolute                        10                  NaN              NaN           74     absolute                 74               4        absolute                    +4
2              True             56278          Matthew           Wolff               center   imageCloudinary  us-open/players/2020-players/Matthew_Wolff                  fill       United States                 usa          imageCloudinary              us-open/flags/usa             False           -5     absolute                 -5                -5          absolute                      -5             140.0          absolute                     140               1        absolute                     1                  10            absolute                        10                  NaN              NaN            0     absolute                                 -5        absolute                    -5
3              True             34360          Patrick            Reed               center   imageCloudinary   us-open/players/2019-players/Patrick-Reed                  fill       United States                 usa          imageCloudinary              us-open/flags/usa             False           -4     absolute                 -4                 0          absolute                       E             136.0          absolute                     136               2        absolute                     2                   7            absolute                         7                  NaN              NaN           66     absolute                 66              -4        absolute                    -4
4              True             34360          Patrick            Reed               center   imageCloudinary   us-open/players/2019-players/Patrick-Reed                  fill       United States                 usa          imageCloudinary              us-open/flags/usa             False           -4     absolute                 -4                 0          absolute                       E             136.0          absolute                     136               2        absolute                     2                   7            absolute                         7                  NaN              NaN           70     absolute                 70               0        absolute                     E

Additional Keys

  • standings is just one of the keys from the downloaded JSON
r = requests.get(url).json()

print(r)
[out]:
dict_keys(['currentRound', 'standings', 'fullLegend', 'shortLegend', 'inlineLegend', 'cutLine', 'meta'])

Resources

Rave answered 19/9, 2020 at 20:10 Comment(0)
D
2

You might want to try this:

import requests
import pandas as pd


url = "https://gripapi-static-pd.usopen.com/gripapi/leaderboard.json"
data = pd.DataFrame.from_dict(requests.get(url).json()['standings'])

print(data['totalScore'])

Output:

0      {'value': 140, 'format': 'absolute', 'displayV...
1      {'value': 136, 'format': 'absolute', 'displayV...
2      {'value': 140, 'format': 'absolute', 'displayV...
3      {'value': 138, 'format': 'absolute', 'displayV...
4      {'value': 138, 'format': 'absolute', 'displayV...
                             ...                        
Derick answered 19/9, 2020 at 19:44 Comment(1)
Thank you for your time and help!Lampkin
O
2

You'll really need to write some custom code to get what you want out of the json. Here's some inspiration if you wanted to get some of the player details into a df however.

df = pd.DataFrame([x['player'] for x in data['standings']])
df['image'] = df['image'].apply(lambda x: x['identifier'])
df['country'] = df['country'].apply(lambda x: x['name'])
Obfuscate answered 19/9, 2020 at 19:46 Comment(1)
Thanks for your time and help, I appreciate it!Lampkin
F
2

Simple and Quick Solution. A better solution might exist with JSON normalize from pandas but this is fairly good for your use case.

def func(x):
    if not any(x.isnull()):
        return (x['round'], x['player']['firstName'], x['player']['identifier'], x['toParToday']['value'], x['totalScore']['value'])

df = pd.DataFrame(data['standings'])
df['round'] = data['currentRound']['name']
df = df[['player', 'toPar', 'toParToday', 'totalScore', 'round']]
info = df.apply(func, axis=1)
info_df = pd.DataFrame(list(info.values), columns=['Round', 'player_name', 'pid', 'to_par_today', 'totalScore'])
info_df.head()
Foreordination answered 19/9, 2020 at 20:39 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.