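Background
I am trying to access data from the following API. The response is a huge JSON document with nested dictionaries, and I am struggling to make it readable. The format of the JSON file matches this link (on the right side, click "Expand all").
What I have tried
I searched SO and other sites, and pd.json_normalize seems to be the answer, but I have tried it several ways and it only unpacks one level.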
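import pandas as pd
import requests

# Attempt 1: max_level=0 leaves every nested value unexpanded
url = 'https://api.opennem.org.au/station/'
response = requests.get(url).json()
df2 = pd.json_normalize(response, max_level=0)
print(df2)

# Attempt 2: record_path=['facilities'] looks for a top-level 'facilities' key,
# but the facilities sit inside each record under 'data'
url = 'https://api.opennem.org.au/station/'
response = requests.get(url).json()
df = pd.json_normalize(response, record_path=['facilities'])
print(df)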
Current (incorrect) output
version ... data
0 3.11.3 ... [{'id': 488, 'code': 'ADP', 'name': 'Adelaide ...
[1 rows x 5 columns]
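The single row above is probably explained by the shape of the response: it is one dictionary whose 'data' key holds the list of stations, so json_normalize treats the whole response as one record. Here is a minimal sketch on a hypothetical, heavily trimmed stand-in for the response (the sample dictionary and its values are assumptions for illustration, not real API output):

```python
import pandas as pd

# Hypothetical stand-in for the API response: one dict, with the
# station records stored as a list under the 'data' key.
sample = {
    "version": "3.11.3",
    "data": [
        {"id": 488, "code": "ADP",
         "facilities": [{"id": 689, "code": "ADPBA1L"},
                        {"id": 690, "code": "ADPBA1G"}]},
        {"id": 372, "code": "ALBANY_WF",
         "facilities": [{"id": 516, "code": "ALBANY_WF1"}]},
    ],
}

# Normalizing the top-level dict gives one row; 'data' stays a list in one cell.
top = pd.json_normalize(sample, max_level=0)
print(top.shape)

# Pointing record_path at 'data' gives one row per station instead;
# the 'facilities' column still holds lists at this stage.
stations = pd.json_normalize(sample, record_path="data")
print(stations)
```

Checking list(response.keys()) on the real response is a quick way to confirm which key holds the records before choosing record_path.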
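Help requested
Does anyone know how to unpack this large nested JSON into a DataFrame?
Answer from a uj5u.com user:
You can pass a nested list as the record path to json_normalize so that it normalizes both data and facilities: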
df2 = pd.json_normalize(response, ['data',['facilities']])
print(df2.head(3))
id station_id code dispatch_type active capacity_registered \
0 689 488 ADPBA1L LOAD True 6.27
1 690 488 ADPBA1G GENERATOR True 6.27
2 516 372 ALBANY_WF1 GENERATOR True 21.60
network_region unit_number unit_capacity approved network.code \
0 SA1 1.0 6.27 True NEM
1 SA1 1.0 6.27 True NEM
2 WEM NaN NaN True WEM
network.country network.label \
0 au NEM
1 au NEM
2 au WEM
network.regions network.timezone \
0 [{'code': 'NSW1'}, {'code': 'QLD1'}, {'code': ... Australia/Sydney
1 [{'code': 'NSW1'}, {'code': 'QLD1'}, {'code': ... Australia/Sydney
2 [{'code': 'WEM'}] Australia/Perth
network.timezone_database network.offset network.interval_size \
0 AEST 600 5
1 AEST 600 5
2 AWST 480 30
network.interval_shift network.has_interconnectors \
0 5 False
1 5 False
2 0 False
network.intervals_per_hour fueltech.code fueltech.label \
0 12.0 battery_charging Battery (Charging)
1 12.0 battery_discharging Battery (Discharging)
2 2.0 wind Wind
fueltech.renewable status.code status.label registered \
0 True committed Committed NaN
1 True committed Committed NaN
2 True operating Operating 2018-10-12T00:00:00
approved_at emissions_factor_co2 approved_by
0 NaN NaN NaN
1 NaN NaN NaN
2 2020-12-09T15:34:49.465445+00:00 NaN NaN
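The result above has one row per facility, and nested dictionaries such as fueltech become dotted columns. If station-level fields are also wanted on each facility row, the meta argument of json_normalize can carry them along. The sketch below uses a hypothetical trimmed payload (the sample values, the station names, and the "station." prefix are assumptions for illustration):

```python
import pandas as pd

# Hypothetical trimmed payload with the same nesting as the response:
# top-level dict -> 'data' (stations) -> 'facilities' (units).
sample = {
    "version": "3.11.3",
    "data": [
        {"id": 488, "code": "ADP", "name": "Station A",
         "facilities": [
             {"id": 689, "code": "ADPBA1L", "fueltech": {"code": "battery_charging"}},
             {"id": 690, "code": "ADPBA1G", "fueltech": {"code": "battery_discharging"}},
         ]},
        {"id": 372, "code": "ALBANY_WF", "name": "Station B",
         "facilities": [
             {"id": 516, "code": "ALBANY_WF1", "fueltech": {"code": "wind"}},
         ]},
    ],
}

# One row per facility; 'code' and 'name' of the parent station are
# repeated on each row, prefixed so they do not clash with facility columns.
facilities = pd.json_normalize(
    sample["data"],
    record_path="facilities",
    meta=["code", "name"],
    meta_prefix="station.",
)
print(facilities)
```

The prefix matters because json_normalize raises an error when a meta column has the same name as a record column, and both stations and facilities have a 'code' field here.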
Bonus:
If you also need the values in network.regions as scalars (one region code per row):
df2 = pd.json_normalize(response, ['data',['facilities']])
df2['network.regions'] = [[y['code'] for y in x] for x in df2['network.regions']]
df2 = df2.explode('network.regions').reset_index(drop=True)
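To see what the two steps do, here is a small self-contained sketch on a hypothetical two-row frame (the frame is made up for illustration, and the isinstance guard for missing values is an added assumption rather than part of the answer):

```python
import pandas as pd

# Hypothetical frame shaped like the 'network.regions' column above:
# each cell holds a list of {'code': ...} dicts.
df = pd.DataFrame({
    "code": ["ADPBA1L", "ALBANY_WF1"],
    "network.regions": [
        [{"code": "NSW1"}, {"code": "QLD1"}],
        [{"code": "WEM"}],
    ],
})

# Step 1: reduce each list of dicts to a list of plain region codes.
# Non-list cells (for example NaN) are passed through unchanged.
df["network.regions"] = [
    [r["code"] for r in regions] if isinstance(regions, list) else regions
    for regions in df["network.regions"]
]

# Step 2: explode so that each region code gets its own row.
long_df = df.explode("network.regions").reset_index(drop=True)
print(long_df)
```

Note that explode repeats all other columns for every region, so facility counts taken from the exploded frame need to deduplicate on the facility id first.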
