我正在嘗試將 json 資料庫扁平化為 Pandas 資料幀,因為這是我第一次處理 json 格式,所以我不能做我想做的事。資料庫位于https://mtgjson.com/downloads/all-files/#allprices,根據模型,結構如下:
{
"0120a941-9cfb-50b5-b5e4-4e0c7bd32410": {
"mtgo": {
"cardhoarder": {
"currency": "USD",
"retail": {
"foil": {
..., // more rows
"2020-04-21": 0.02
},
"normal": {
..., // more rows
"2020-04-21": 0.02
}
}
},
},
"paper": {
"cardkingdom" : {
"buylist": {
"foil": {
..., // more rows
"2020-04-21": 0.6
},
"normal": {
..., // more rows
"2020-04-21": 0.01
}
},
"currency": "USD",
"retail": {
"foil": {
..., // more rows
"2020-04-21": 0.12
},
"normal": {
..., // more rows
"2020-04-21": 0.02
}
}
},
"cardmarket": {
"currency": "EUR",
"retail": {
"foil": {
..., // more rows
"2020-04-21": 0.12
},
"normal": {
..., // more rows
"2020-04-21": 0.02
}
}
},
"tcgplayer": {
"currency": "USD",
"retail": {
"foil": {
..., // more rows
"2020-04-21": 0.12
},
"normal": {
..., // more rows
"2020-04-21": 0.02
}
}
}
}
}
}
當我查看 json 檔案時,我有這個:
{"meta": {"date": "2021-11-07", "version": "5.1.0 20211107"}, "data": {"00010d56-fe38-5e35-8aed-518019aa36a5": {"paper": {"cardkingdom": {"buylist": {"foil.....
當我做基本的時,pd.read_json('AllPrices.json')我得到了這個
| 元 | 資料 | |
|---|---|---|
| 日期 | 2021-11-07 | NaN |
| 版本 | 5.1.0 20211107 | NaN |
| 00010d56-fe38-5e35-8aed-518019aa36a5 | NaN | {'paper': {'cardkingdom': {'buylist': {'foil':... |
| 0001e0d0-2dcd-5640-aadc-a84765cf5fc9 | NaN | {'paper': {'cardkingdom': {'buylist': {'normal ... |
所以我做了一些研究,發現json_normalize并寫了這段代碼:
with open('AllPrices.json','r') as f:
data = json.loads(f.read())
pd.json_normalize(data, errors='ignore')
This did the job by flattening the json database but I ended with one row and 31 millions columns. What I want is only one information in this database that is the uuid and the cardmarket price of a normal paper card on the date I want like this :
| uuid | paper.cardmarket.retail.normal.2021-11-07 |
|---|---|
| 00010d56-fe38-5e35-8aed-518019aa36a5 | 0.5 |
| 0001e0d0-2dcd-5640-aadc-a84765cf5fc9 | 0.25 |
I played with the record_path = parameter and the meta = parameter but the best I did was not my expected table.
I tried record_path = ['data'] that give me only the uuid in one column.
Thanks for your help
uj5u.com熱心網友回復:
在您的情況下,json_normalize對每條uuid記錄使用然后提取所需的資訊“paper.cardmarket.retail.normal”:
with open('AllPrices.json') as fp:
prices = json.load(fp)
data = []
for uuid in prices['data']:
df = pd.json_normalize(prices['data'][uuid]) \
.filter(like='paper.cardmarket.retail.normal')
if df.empty:
continue
df.columns = df.columns.str.rsplit('.', 1).str[-1]
df.index = [uuid]
data.append(df)
df = pd.concat(data)
輸出:(在AllPrices.json檔案上測驗)
2021-08-09 2021-08-11 2021-08-12 2021-08-13 ... 2021-10-27 2021-10-28 2021-11-02 2021-11-09
00010d56-fe38-5e35-8aed-518019aa36a5 4.35 4.35 4.35 4.35 ... 4.35 4.35 4.35 4.35
0001e0d0-2dcd-5640-aadc-a84765cf5fc9 4.95 4.95 3.45 3.45 ... 7.99 6.81 5.42 6.77
0003caab-9ff5-5d1a-bc06-976dd0457f19 0.24 0.27 0.07 0.38 ... 0.25 0.04 0.13 0.36
0003d249-25d9-5223-af1e-1130f09622a7 0.30 0.30 0.75 0.75 ... 0.15 0.20 0.04 0.25
0004a4fb-92c6-59b2-bdbe-ceb584a9e401 0.27 0.14 0.19 0.10 ... 0.10 0.05 0.19 0.13
... ... ... ... ... ... ... ... ... ...
fffa4ccf-733e-513a-98f9-181b9549de62 0.23 0.15 0.21 0.21 ... 0.10 0.21 0.15 0.20
fffb659e-b3fa-5cd8-9423-fe5ac74248b5 0.49 0.49 0.49 0.49 ... 0.35 0.35 0.35 0.20
fffbc95a-c4d1-56aa-8653-8a7c71fe19ce 6.95 6.95 6.95 6.95 ... 10.26 10.26 10.26 6.43
fffc1305-a118-559b-9504-3d7b56ca0bde 0.18 0.18 0.18 0.18 ... 0.04 0.04 0.04 0.04
fffdd333-3789-5104-a8be-37be199a2cb1 0.87 0.73 0.99 0.99 ... 0.49 0.45 0.45 0.15
[50568 rows x 75 columns]
uj5u.com熱心網友回復:
您可以定義一個函式來獲取字典中的值并將此函式應用于df['data']:
df = pd.read_json('AllPrices.json')
def extract_from_json(cell):
if isinstance(cell, dict) and "cardmarket" in cell.keys():
return list(cell["cardmarket"]["retail"]["normal"].values())[0]
return cell
df["data"]=df["data"].apply(extract_from_json)
print(df)`
如果您只想要特定日期的價格,您的函式應回傳以下內容:
return cell["cardmarket"]["retail"]["normal"].get("2021-11-07", None)
uj5u.com熱心網友回復:
該json_normalize功能是相當強大的,但它不能是靈丹妙藥可以解決任何事情。在這里,json 模塊已經將您的檔案轉換為普通的 Python 字典。所以你只需要迭代那個字典來構建一個串列,并使用該串列來提供一個 DataFrame:
with open('AllPrices.json','r') as f:
data = json.load(f)
df = pd.DataFrame([[k, v['paper']['cardmarket']['retail']['normal']['2020-04-21']]
for k,v in data.items()],
columns = ['uuid', 'paper.cardmarket.retail.normal.2021-11-07'])
轉載請註明出處,本文鏈接:https://www.uj5u.com/caozuo/355108.html
