json_normalize with a nested JSON database

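I'm trying to flatten a JSON database into a Pandas DataFrame, but this is my first time working with the JSON format and I can't get it to do what I want. The database is available at https://mtgjson.com/downloads/all-files/#allprices, and according to the data model its structure is as follows: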

{
  "0120a941-9cfb-50b5-b5e4-4e0c7bd32410": {
    "mtgo": {
      "cardhoarder": {
        "currency": "USD",
        "retail": {
          "foil": {
            ..., // more rows
            "2020-04-21": 0.02
          },
          "normal": {
            ..., // more rows
            "2020-04-21": 0.02
          }
        }
      }
    },
    "paper": {
      "cardkingdom" : {
        "buylist": {
          "foil": {
            ..., // more rows
            "2020-04-21": 0.6
          },
          "normal": {
            ..., // more rows
            "2020-04-21": 0.01
          }
        },
        "currency": "USD",
        "retail": {
          "foil": {
            ..., // more rows
            "2020-04-21": 0.12
          },
          "normal": {
            ..., // more rows
            "2020-04-21": 0.02
          }
        }
      },
      "cardmarket": {
        "currency": "EUR",
        "retail": {
          "foil": {
            ..., // more rows
            "2020-04-21": 0.12
          },
          "normal": {
            ..., // more rows
            "2020-04-21": 0.02
          }
        }
      },
      "tcgplayer": {
        "currency": "USD",
        "retail": {
          "foil": {
            ..., // more rows
            "2020-04-21": 0.12
          },
          "normal": {
            ..., // more rows
            "2020-04-21": 0.02
          }
        }
      }
    }
  }
}

When I look at the JSON file itself, I see this:

{"meta": {"date": "2021-11-07", "version": "5.1.0 20211107"}, "data": {"00010d56-fe38-5e35-8aed-518019aa36a5": {"paper": {"cardkingdom": {"buylist": {"foil.....

When I do a basic pd.read_json('AllPrices.json'), I get this:

	meta	data
date	2021-11-07	NaN
version	5.1.0 20211107	NaN
00010d56-fe38-5e35-8aed-518019aa36a5	NaN	{'paper': {'cardkingdom': {'buylist': {'foil':...
0001e0d0-2dcd-5640-aadc-a84765cf5fc9	NaN	{'paper': {'cardkingdom': {'buylist': {'normal ...
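That frame is what read_json's default orient='columns' produces for this file: the top-level keys ("meta", "data") become columns, the second-level keys (date, version and every uuid) become the index, and each card's entire price record lands in a single cell. A minimal sketch on a tiny hand-made sample; the uuids "u1"/"u2" and the prices are made up for illustration:

```python
from io import StringIO

import pandas as pd

# Hypothetical miniature of AllPrices.json: same shape, made-up values
sample = """
{"meta": {"date": "2021-11-07", "version": "5.1.0 20211107"},
 "data": {"u1": {"paper": {"cardmarket": {"retail": {"normal": {"2021-11-07": 0.5}}}}},
          "u2": {"paper": {"cardmarket": {"retail": {"normal": {"2021-11-07": 0.25}}}}}}}
"""

df = pd.read_json(StringIO(sample))

# Top-level keys -> columns, second-level keys -> index
print(df.columns.tolist())
print(df.index.tolist())
# Each uuid's whole nested record is stored as one dict in the "data" column
print(df.loc["u1", "data"])
```

This is why the "meta" rows show NaN under "data" and the card rows show NaN under "meta".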

So I did some research, found json_normalize, and wrote this code:

import json
import pandas as pd

with open('AllPrices.json', 'r') as f:
    data = json.load(f)
pd.json_normalize(data, errors='ignore')

This did flatten the JSON database, but I ended up with one row and 31 million columns. The only information I want from this database is the uuid and the Cardmarket price of a normal paper card on a date of my choice, like this:

uuid	paper.cardmarket.retail.normal.2021-11-07
00010d56-fe38-5e35-8aed-518019aa36a5	0.5
0001e0d0-2dcd-5640-aadc-a84765cf5fc9	0.25

I experimented with the record_path and meta parameters, but the closest I got was still not the table I expected. For example, record_path=['data'] gave me only the uuids in a single column.
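record_path is meant to point at a list of records, whereas "data" here is a dict keyed by uuid, so json_normalize has no list to expand and flattens everything into one row. One workaround, sketched below on made-up sample data, is to build the list of records yourself, each carrying its uuid, so json_normalize produces one row per card. The sample dict and the chosen date are assumptions for illustration:

```python
import pandas as pd

# Hypothetical miniature of the parsed AllPrices.json (what json.load returns)
data = {
    "meta": {"date": "2021-11-07", "version": "5.1.0 20211107"},
    "data": {
        "u1": {"paper": {"cardmarket": {"retail": {"normal": {"2021-11-07": 0.5}}}}},
        "u2": {"paper": {"cardmarket": {"retail": {"normal": {"2021-11-07": 0.25}}}}},
        "u3": {"mtgo": {"cardhoarder": {"retail": {"normal": {"2021-11-07": 0.02}}}}},
    },
}

# Normalizing the whole dict: a single row, one dotted column per leaf value
wide = pd.json_normalize(data)

# A list of records (one per uuid) gives one row per card instead
records = [{"uuid": uuid, **record} for uuid, record in data["data"].items()]
flat = pd.json_normalize(records)

col = "paper.cardmarket.retail.normal.2021-11-07"
# reindex tolerates cards without a Cardmarket price (u3 here): they get NaN
result = flat.reindex(columns=["uuid", col])
print(result)
```

On the full file the combined column set is still large, so normalizing only the part you need (for example each record's "paper" → "cardmarket" sub-dict) should keep memory lower.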

Thanks for your help

Answer 1:

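In your case, apply json_normalize to each uuid record individually, then extract the information you need, "paper.cardmarket.retail.normal":

import json
import pandas as pd

with open('AllPrices.json') as fp:
    prices = json.load(fp)

data = []
for uuid, record in prices['data'].items():
    # Flatten one card's record and keep only the Cardmarket normal retail prices
    df = pd.json_normalize(record).filter(like='paper.cardmarket.retail.normal')
    if df.empty:
        continue
    # 'paper.cardmarket.retail.normal.2021-08-09' -> '2021-08-09'
    df.columns = df.columns.str.rsplit('.', n=1).str[-1]
    df.index = [uuid]
    data.append(df)
df = pd.concat(data)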

Output (tested on the AllPrices.json file):

                                      2021-08-09  2021-08-11  2021-08-12  2021-08-13  ...  2021-10-27  2021-10-28  2021-11-02  2021-11-09
00010d56-fe38-5e35-8aed-518019aa36a5        4.35        4.35        4.35        4.35  ...        4.35        4.35        4.35        4.35
0001e0d0-2dcd-5640-aadc-a84765cf5fc9        4.95        4.95        3.45        3.45  ...        7.99        6.81        5.42        6.77
0003caab-9ff5-5d1a-bc06-976dd0457f19        0.24        0.27        0.07        0.38  ...        0.25        0.04        0.13        0.36
0003d249-25d9-5223-af1e-1130f09622a7        0.30        0.30        0.75        0.75  ...        0.15        0.20        0.04        0.25
0004a4fb-92c6-59b2-bdbe-ceb584a9e401        0.27        0.14        0.19        0.10  ...        0.10        0.05        0.19        0.13
...                                          ...         ...         ...         ...  ...         ...         ...         ...         ...
fffa4ccf-733e-513a-98f9-181b9549de62        0.23        0.15        0.21        0.21  ...        0.10        0.21        0.15        0.20
fffb659e-b3fa-5cd8-9423-fe5ac74248b5        0.49        0.49        0.49        0.49  ...        0.35        0.35        0.35        0.20
fffbc95a-c4d1-56aa-8653-8a7c71fe19ce        6.95        6.95        6.95        6.95  ...       10.26       10.26       10.26        6.43
fffc1305-a118-559b-9504-3d7b56ca0bde        0.18        0.18        0.18        0.18  ...        0.04        0.04        0.04        0.04
fffdd333-3789-5104-a8be-37be199a2cb1        0.87        0.73        0.99        0.99  ...        0.49        0.45        0.45        0.15

[50568 rows x 75 columns]
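Calling json_normalize once per card works, but the same wide uuid-by-date table can also be built without it: collect each card's paper/cardmarket/retail/normal dict and pass them all to DataFrame.from_dict with orient='index'. A minimal sketch on a made-up sample; the variable names and values are assumptions:

```python
import pandas as pd

# Hypothetical miniature of prices["data"] from the parsed AllPrices.json
cards = {
    "u1": {"paper": {"cardmarket": {"retail": {"normal": {"2021-11-06": 0.45, "2021-11-07": 0.5}}}}},
    "u2": {"paper": {"cardmarket": {"retail": {"normal": {"2021-11-07": 0.25}}}}},
    "u3": {"paper": {"cardkingdom": {"retail": {"normal": {"2021-11-07": 0.3}}}}},
}

# uuid -> {date: price}, skipping cards with no Cardmarket normal retail prices
normal_prices = {}
for uuid, record in cards.items():
    normal = record.get("paper", {}).get("cardmarket", {}).get("retail", {}).get("normal")
    if normal:
        normal_prices[uuid] = normal

# orient="index": outer keys (uuids) become rows, inner keys (dates) become columns
wide = pd.DataFrame.from_dict(normal_prices, orient="index")

# Picking one date and renaming gives the two-column layout from the question
date = "2021-11-07"
one_day = (wide[[date]]
           .rename(columns={date: "paper.cardmarket.retail.normal." + date})
           .rename_axis("uuid")
           .reset_index())
print(one_day)
```

Dates missing for a given card simply show up as NaN in the wide table.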

Answer 2:

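You can define a function that pulls the value out of each dictionary and apply it to df['data']:

import pandas as pd

df = pd.read_json('AllPrices.json')

def extract_from_json(cell):
    # "meta" rows hold NaN here; card rows hold {'paper': {...}, 'mtgo': {...}}
    if not isinstance(cell, dict):
        return cell
    retail = cell.get("paper", {}).get("cardmarket", {}).get("retail", {})
    if "normal" in retail:
        # Price for the first date listed under retail/normal
        return list(retail["normal"].values())[0]
    return cell

df["data"] = df["data"].apply(extract_from_json)
print(df)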

If you only want the price for a specific date, have the function return this instead:

return cell["paper"]["cardmarket"]["retail"]["normal"].get("2021-11-07")
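The apply approach can also drop the "meta" rows up front (their "data" cell is NaN) and take the date as a parameter, producing the uuid/price layout from the question directly. A minimal sketch on a made-up two-card sample; the helper name cardmarket_price is hypothetical:

```python
from io import StringIO

import pandas as pd

# Hypothetical miniature of AllPrices.json with made-up prices
sample = """
{"meta": {"date": "2021-11-07", "version": "5.1.0 20211107"},
 "data": {"u1": {"paper": {"cardmarket": {"retail": {"normal": {"2021-11-07": 0.5}}}}},
          "u2": {"paper": {"cardkingdom": {"retail": {"normal": {"2021-11-07": 0.3}}}}}}}
"""
df = pd.read_json(StringIO(sample))

def cardmarket_price(cell, date):
    # Price of a normal paper card on Cardmarket for `date`, or None if absent
    normal = cell.get("paper", {}).get("cardmarket", {}).get("retail", {}).get("normal", {})
    return normal.get(date)

date = "2021-11-07"
# dropna() removes the "date"/"version" rows, whose "data" cell is NaN
prices = df["data"].dropna().apply(cardmarket_price, args=(date,))
result = prices.rename("paper.cardmarket.retail.normal." + date).rename_axis("uuid").reset_index()
print(result)
```

Cards without a Cardmarket price for that date (u2 here) come out as missing values rather than raising KeyError.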

Answer 3:

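The json_normalize function is quite powerful, but it is not a silver bullet. Here, the json module has already converted your file into ordinary Python dictionaries, so you only need to iterate over that dictionary to build a list, then use that list to feed a DataFrame:

import json
import pandas as pd

date = '2021-11-07'
with open('AllPrices.json', 'r') as f:
    data = json.load(f)
rows = []
for k, v in data['data'].items():  # card records live under the top-level "data" key
    normal = v.get('paper', {}).get('cardmarket', {}).get('retail', {}).get('normal', {})
    if date in normal:  # skip cards without a Cardmarket normal price on that date
        rows.append([k, normal[date]])
df = pd.DataFrame(rows, columns=['uuid', 'paper.cardmarket.retail.normal.' + date])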

Source: https://www.uj5u.com/caozuo/355108.html

Tags: python json pandas
