Parsing nested JSON with Pandas

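I'm trying to use pandas to parse this nested JSON, and I'm confused about how to extract data from the "amount" and "items" columns. The data has many rows (hundreds); this is one sample: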

{
    "_id": "62eaa99b014c9bb30203e48a",
    "amount": {
      "product": 291000,
      "shipping": 75000,
      "admin_fee": 4500,
      "order_voucher_deduction": 0,
      "transaction_voucher_deduction": 0,
      "total": 366000,
      "paid": 366000
    },
    "status": 32,
    "items": [
      {
        "_id": "62eaa99b014c9bb30203e48d",
        "earning": 80400,
        "variants": [
          {
            "name": "Color",
            "value": "Black"
          },
          {
            "name": "Size",
            "value": "38"
          }
        ],
        "marketplace_price": 65100,
        "product_price": 62000,
        "reseller_price": 145500,
        "product_id": 227991,
        "name": "Heels",
        "sku_id": 890512,
        "internal_markup": 3100,
        "weight": 500,
        "image": "https://product-asset.s3.ap-southeast-1.amazonaws.com/1659384575578.jpeg",
        "quantity": 1,
        "supplier_price": 60140
      }

I have already tried the following, but it only shows the index:

dfjson=pd.json_normalize(datasetjson)
dfjson.head(3)


## Update

I tried adding pd.DataFrame, and yes, that does give me a DataFrame, but I still don't know how to extract _id, earning, and variants.

A uj5u.com user replied:

Try pd.json_normalize(datasetjson, max_level=0)
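As a rough illustration of what `max_level` changes, the sketch below normalizes a shortened copy of the sample order twice. The names `order`, `shallow`, and `full` are made up for this example, and the dict keeps only a few keys from the sample.

```python
import pandas as pd

# Shortened copy of the sample order (only a few keys kept).
order = {
    "_id": "62eaa99b014c9bb30203e48a",
    "amount": {"product": 291000, "shipping": 75000, "total": 366000},
    "status": 32,
    "items": [{"_id": "62eaa99b014c9bb30203e48d", "earning": 80400}],
}

# max_level=0 keeps nested dicts as single object columns.
shallow = pd.json_normalize(order, max_level=0)

# The default flattens nested dicts into dotted column names.
full = pd.json_normalize(order)

print(sorted(shallow.columns))
print(sorted(full.columns))
```

With `max_level=0` the `amount` column still holds the raw dict, while the default call expands it into columns such as `amount.product`. In both cases the `items` list stays a list inside one cell, so it needs separate handling.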

A uj5u.com user replied:

I think you are confusing a dictionary with the JSON format.
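To make the distinction concrete, here is a minimal sketch: JSON is text, and `json.loads` turns that text into a Python dict that can be indexed by key. The string `raw` is a shortened version of the sample, and the variable names are only illustrative.

```python
import json

# JSON is a string; this one is a shortened version of the sample order.
raw = '{"_id": "62eaa99b014c9bb30203e48a", "amount": {"total": 366000, "paid": 366000}, "status": 32}'

# json.loads parses the text into nested Python dicts and lists.
order = json.loads(raw)

print(type(raw).__name__, type(order).__name__)
print(order["amount"]["total"])
```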

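The line below is the same as your sample, except that your sample was missing the closing ]} at the end. I reformatted it to remove whitespace, but the content is identical: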

dfjson = {"_id":"62eaa99b014c9bb30203e48a","amount":{"product":291000,"shipping":75000,"admin_fee":4500,"order_voucher_deduction":0,"transaction_voucher_deduction":0,"total":366000,"paid":366000},"status":32,"items":[{"_id":"62eaa99b014c9bb30203e48d","earning":80400,"variants":[{"name":"Color","value":"Black"},{"name":"Size","value":"38"}],"marketplace_price":65100,"product_price":62000,"reseller_price":145500,"product_id":227991,"name":"Heels","sku_id":890512,"internal_markup":3100,"weight":500,"image":"https://product-asset.s3.ap-southeast-1.amazonaws.com/1659384575578.jpeg","quantity":1,"supplier_price":60140}]}

Now, if you want to call amount:

dfjson['amount']
# Output
{'product': 291000,
 'shipping': 75000,
 'admin_fee': 4500,
 'order_voucher_deduction': 0,
 'transaction_voucher_deduction': 0,
 'total': 366000,
 'paid': 366000}

If you want to call items:

dfjson['items']
# Output
[{'_id': '62eaa99b014c9bb30203e48d',
  'earning': 80400,
  'variants': [{'name': 'Color', 'value': 'Black'},
   {'name': 'Size', 'value': '38'}],
  'marketplace_price': 65100,
  'product_price': 62000,
  'reseller_price': 145500,
  'product_id': 227991,
  'name': 'Heels',
  'sku_id': 890512,
  'internal_markup': 3100,
  'weight': 500,
  'image': 'https://product-asset.s3.ap-southeast-1.amazonaws.com/1659384575578.jpeg',
  'quantity': 1,
  'supplier_price': 60140}]

To get the items, you can build a list:

list_items = []
for i in dfjson['items']:
    list_items.append(i)
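Once the items are in a list of dicts, one possible next step is to pass that list to `pd.DataFrame`, which gives one row per item. This is only a sketch on a trimmed copy of the sample; `items_df` is a name chosen for illustration.

```python
import pandas as pd

# Trimmed copy of the sample order; only a few item keys are kept.
dfjson = {
    "_id": "62eaa99b014c9bb30203e48a",
    "items": [
        {
            "_id": "62eaa99b014c9bb30203e48d",
            "earning": 80400,
            "variants": [
                {"name": "Color", "value": "Black"},
                {"name": "Size", "value": "38"},
            ],
            "name": "Heels",
        }
    ],
}

# Same result as the loop above: a plain list of item dicts.
list_items = list(dfjson["items"])

# Each dict becomes a row; each key becomes a column.
items_df = pd.DataFrame(list_items)
print(items_df[["_id", "earning", "name"]])
```

The `variants` column still contains a list of dicts per row, so it would need a further step to become separate columns.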

A uj5u.com user replied:

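If you want to use a DataFrame, the data must be in a 2D format. Your data is a mix of lists and dictionaries, so you need to split it up first and then convert the pieces to a DataFrame.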
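One way to read "split it up first" is to build two flat tables by hand: one row per order and one row per item, linked by the order id. The sketch below assumes that layout. The second order is invented purely to show more than one row, and names such as `order_rows`, `item_rows`, `order_id`, and `item_id` are hypothetical.

```python
import pandas as pd

orders = [
    # Shortened copy of the sample order.
    {
        "_id": "62eaa99b014c9bb30203e48a",
        "amount": {"product": 291000, "shipping": 75000, "total": 366000},
        "status": 32,
        "items": [
            {"_id": "62eaa99b014c9bb30203e48d", "name": "Heels", "quantity": 1},
        ],
    },
    # Hypothetical second order, invented for this example.
    {
        "_id": "order-2",
        "amount": {"product": 100, "shipping": 10, "total": 110},
        "status": 1,
        "items": [
            {"_id": "item-a", "name": "Item A", "quantity": 2},
            {"_id": "item-b", "name": "Item B", "quantity": 1},
        ],
    },
]

# One flat dict per order: the amount keys are merged into the top level.
order_rows = [
    {"order_id": o["_id"], "status": o["status"], **o["amount"]} for o in orders
]

# One flat dict per item, tagged with the id of the order it belongs to.
item_rows = [
    {
        "order_id": o["_id"],
        "item_id": item["_id"],
        "name": item["name"],
        "quantity": item["quantity"],
    }
    for o in orders
    for item in o["items"]
]

orders_df = pd.DataFrame(order_rows)
items_df = pd.DataFrame(item_rows)
print(orders_df)
print(items_df)
```

Keeping orders and items in separate tables avoids repeating the order amounts on every item row; the two can be merged on `order_id` when a single wide table is needed.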

A uj5u.com user replied:

Given:

data = {
 '_id': '62eaa99b014c9bb30203e48a',
 'amount': {'admin_fee': 4500,
            'order_voucher_deduction': 0,
            'paid': 366000,
            'product': 291000,
            'shipping': 75000,
            'total': 366000,
            'transaction_voucher_deduction': 0},
 'items': [{'_id': '62eaa99b014c9bb30203e48d',
            'earning': 80400,
            'image': 'https://product-asset.s3.ap-southeast-1.amazonaws.com/1659384575578.jpeg',
            'internal_markup': 3100,
            'marketplace_price': 65100,
            'name': 'Heels',
            'product_id': 227991,
            'product_price': 62000,
            'quantity': 1,
            'reseller_price': 145500,
            'sku_id': 890512,
            'supplier_price': 60140,
            'variants': [{'name': 'Color', 'value': 'Black'},
                         {'name': 'Size', 'value': '38'}],
            'weight': 500}],
 'status': 32
}

Doing:

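import pandas as pd

# One row per element of "items"; carry the order's "amount" dict along as a column.
df = pd.json_normalize(data, ['items'], ['amount'])
# Expand the amount dict into one column per key.
df = df.join(df.amount.apply(pd.Series))
# Pivot the variants into columns (Color, Size); [0] only uses the first row's variants.
df = df.join(df.variants.apply(pd.DataFrame)[0].set_index('name').T.reset_index(drop=True))
df = df.drop(['amount', 'variants'], axis=1)
print(df)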

Output:

                        _id  earning  marketplace_price  product_price  reseller_price  product_id   name  sku_id  internal_markup  weight                                              image  quantity  supplier_price  product  shipping  admin_fee  order_voucher_deduction  transaction_voucher_deduction   total    paid  Color Size
0  62eaa99b014c9bb30203e48d    80400              65100          62000          145500      227991  Heels  890512             3100     500  https://product-asset.s3.ap-southeast-1.amazon...         1           60140   291000     75000       4500                        0                              0  366000  366000  Black   38

There may be a better way to do this, but the sample provided isn't even a valid JSON object, so I can't be sure what the real data actually looks like.
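Since the real data reportedly has hundreds of rows, here is one possible generalization, offered only as a sketch. It assumes the input is a list of orders shaped like the sample and that every item has a `variants` list. The helper `flatten_orders` and the second order are hypothetical and were invented for this example.

```python
import pandas as pd


def flatten_orders(orders):
    """Return one row per item, with order amounts and variants as columns."""
    # One row per element of each order's "items" list. The order-level
    # "_id" and "amount" are carried along with an "order." prefix so they
    # do not clash with the item's own "_id".
    items = pd.json_normalize(
        orders, record_path="items", meta=["_id", "amount"], meta_prefix="order."
    )
    # Expand each amount dict into one column per key.
    amounts = pd.DataFrame(items["order.amount"].tolist(), index=items.index)
    # Turn each [{"name": ..., "value": ...}, ...] list into {name: value}.
    variants = pd.DataFrame(
        [{v["name"]: v["value"] for v in vs} for vs in items["variants"]],
        index=items.index,
    )
    base = items.drop(columns=["order.amount", "variants"])
    return pd.concat([base, amounts, variants], axis=1)


orders = [
    # Shortened copy of the sample order.
    {
        "_id": "62eaa99b014c9bb30203e48a",
        "amount": {"product": 291000, "shipping": 75000, "total": 366000},
        "items": [
            {
                "_id": "62eaa99b014c9bb30203e48d",
                "name": "Heels",
                "variants": [
                    {"name": "Color", "value": "Black"},
                    {"name": "Size", "value": "38"},
                ],
            }
        ],
    },
    # Hypothetical second order, invented for this example.
    {
        "_id": "order-2",
        "amount": {"product": 100, "shipping": 10, "total": 110},
        "items": [
            {"_id": "item-a", "name": "Item A",
             "variants": [{"name": "Color", "value": "Red"}]},
            {"_id": "item-b", "name": "Item B",
             "variants": [{"name": "Size", "value": "40"}]},
        ],
    },
]

flat = flatten_orders(orders)
print(flat)
```

Items that lack a given variant name end up with NaN in that column, which is usually acceptable for a wide table but worth checking against the real data.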


Tags: Python, pandas, dataframe
