How to build a JSON file using Pandas

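I am trying to take a CSV file and construct a JSON file from its values. The JSON file needs to be in a very specific format so that it can be imported into Azure.

I'm very new to Python; in fact, this is the first time I've used it properly.

I've started by using Pandas to read the CSV into a dataframe and then doing a small amount of formatting before converting to JSON. This is a good start, but the format isn't quite right. Please see below.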

import pandas as pd

df = pd.read_csv("C:\\Users***Required_data.csv")

# Keep only the rows whose work item type contains "Task"
filtered = df['Work Item Type'].str.contains('Task')

# .copy() avoids a SettingWithCopyWarning when a column is added below
dftest = df[filtered].copy()
dftest = dftest.rename(columns={
    "Work Item Type": "System.WorkItemType",
    "Title": "System.Title",
    "AssignedTo": "System.AssignedTo",
    "State": "System.State",
    "Tags": "System.Tags",
    "Description": "System.Description",
})
dftest["System.AreaPath"] = "**********"

dftest.to_json("C:\\Users****\\Required_datatest.json", indent=4, orient="records")

This gives me the following format in JSON, an array of objects.

Source data:

[Image of the source data, not preserved]

Result of my attempt:

[
    {
        "ID":15898,
        "System.WorkItemType":"Task",
        "System.Title":"TK 1.2.1 -  Example data",
        "System.AssignedTo":null,
        "System.State":"New",
        "System.Tags":null,
        "Parent":15887,
        "System.Description":"Example data",
        "System.AreaPath":"Example data"
    }
]

However, I am trying to build the following structure:

Target data in JSON format:

{
      "count": 36,
      "value": [
        {
          "id": 487,
          "rev": 1,
          "fields": {
            "System.AreaPath": "Example data",
            "System.TeamProject": "Example data",
            "System.IterationPath": "Example data",
            "System.WorkItemType": "Task",
            "System.State": "New",
            "System.Reason": "New",
            "System.CreatedDate": "2021-02-22T19:13:24.81Z",
            "System.CreatedBy": "Example data",
            "System.ChangedDate": "2021-02-22T19:13:24.81Z",
            "System.ChangedBy": "Example data",
            "System.Title": "Example data",
            "Microsoft.VSTS.Scheduling.Effort": 0.0,
            "System.Description": "Example data",
            "System.AssignedTo": null,
            "Microsoft.VSTS.Scheduling.RemainingWork": 0.0,
            "Microsoft.VSTS.Common.Priority": 2.0,
            "System.BoardLane": null,
            "System.Tags": null,
            "Microsoft.VSTS.TCM.Steps": null,
            "Microsoft.VSTS.TCM.Parameters": null,
            "Microsoft.VSTS.TCM.LocalDataSource": null,
            "Microsoft.VSTS.TCM.AutomationStatus": null,
            "System.History": null
          },
          "relations": [
            {
              "rel": "System.LinkTypes.Hierarchy-Reverse",
              "url": "Example data",
              "attributes": {
                "isLocked": "false",
                "name": "Parent"
              }
            }
          ],
          "url": "Example data"
        }
      ]
}

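As you can see, the array is then wrapped in another object with "count" and "value". The data from my dataframe then needs to be stored inside "fields", as in the second example, which is required.

Can anyone offer guidance here? I'm a bit stuck. If Pandas is not the correct tool, please let me know. Please also provide the easiest solution, as I'm still learning and would like to understand it.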

Thank you in advance.

Answer from a uj5u.com user:

You can use a function like transform_data below to do the extra transformation you need.

import pandas as pd


REVISION = 1


def exclude_keys(to_exclude: dict, *excluded_keys) -> dict:
    # Return a copy of the dict without the given keys
    return {
        key: val for key, val in to_exclude.items() if key not in excluded_keys
    }


def transform_data(to_transform: pd.DataFrame) -> dict:
    records = to_transform.to_dict("records")
    values = [
        {
            "id": record["ID"],
            "rev": REVISION,
            "fields": exclude_keys(record, "ID")
        }
        for record in records
    ]
    return {
        "count": len(records),
        "value": values
    }

Calling transform_data should give this result:

>>> transform_data(dftest)
{'count': 1,
 'value': [{'id': 15898,
   'rev': 1,
   'fields': {'System.WorkItemType': 'Task',
    'System.Title': 'TK 1.2.1 -  Example data',
    'System.AssignedTo': None,
    'System.State': 'New',
    'System.Tags': None,
    'Parent': 15887,
    'System.Description': 'Example data',
    'System.AreaPath': 'Example data'}}]}
>>> import json, os
>>> # open() does not expand "~", so expand it explicitly
>>> with open(os.path.expanduser("~/path/to/output.json"), "w") as fd: json.dump(transform_data(dftest), fd)
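
One thing to watch for when writing the file: pandas usually represents empty CSV cells as NaN, and json.dump writes those as the bare token NaN, which is not valid JSON. Depending on your data, the dict above may therefore show nan where the listing shows None. The following is a minimal sketch of one way to handle this, assuming the same column names as in the question; the helper name records_without_nan and the sample frame are hypothetical and only there for illustration.

```python
import json

import pandas as pd


def records_without_nan(frame: pd.DataFrame) -> list:
    """Return one dict per row, with missing values (NaN) replaced by None."""
    # Casting to object first lets every column hold None instead of NaN.
    cleaned = frame.astype(object).where(frame.notna(), None)
    return cleaned.to_dict("records")


# Hypothetical sample that mirrors a few of the question's columns.
sample = pd.DataFrame(
    {
        "ID": [15898],
        "System.Title": ["TK 1.2.1 -  Example data"],
        "System.Tags": [float("nan")],  # an empty cell in the CSV
    }
)

records = records_without_nan(sample)
payload = {
    "count": len(records),
    "value": [
        {
            "id": record["ID"],
            "rev": 1,
            "fields": {k: v for k, v in record.items() if k != "ID"},
        }
        for record in records
    ],
}

# None is serialized as null, so the text is valid JSON.
text = json.dumps(payload, indent=4)
print(text)
```

With the real data, records_without_nan(dftest) could replace the to_transform.to_dict("records") call inside transform_data.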

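You should be able to adapt that code for any transformation or additional information you need to add to the data.

It may be possible to do with Pandas what I did in plain Python, but as far as I know Pandas is best suited to flat tabular data rather than the nested data you need. Marshmallow may also be worth a look, as it handles nested JSON well: https://marshmallow.readthedocs.io/en/stable/quickstart.html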
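
As an example of such an adjustment, the target JSON in the question keeps the parent link under "relations" rather than inside "fields". Here is a rough sketch of how a record's Parent column could be moved there. The helper name to_work_item is hypothetical, the "url" value is left as the same "Example data" placeholder used in the question because the real URL is not given, and whether Azure accepts exactly this shape is an assumption.

```python
import json

# Link type copied from the target JSON in the question.
PARENT_LINK = "System.LinkTypes.Hierarchy-Reverse"


def to_work_item(record: dict, revision: int = 1) -> dict:
    """Build one entry of "value" from a flat record (hypothetical helper)."""
    # Everything except the ID and the parent reference goes under "fields".
    fields = {k: v for k, v in record.items() if k not in ("ID", "Parent")}
    item = {"id": record["ID"], "rev": revision, "fields": fields}

    # Only add "relations" when the record actually has a parent.
    parent = record.get("Parent")
    if parent is not None:
        item["relations"] = [
            {
                "rel": PARENT_LINK,
                # Placeholder: the real parent URL is not given in the question.
                "url": "Example data",
                "attributes": {"isLocked": "false", "name": "Parent"},
            }
        ]
    return item


record = {
    "ID": 15898,
    "System.WorkItemType": "Task",
    "System.State": "New",
    "Parent": 15887,
}
item = to_work_item(record)
print(json.dumps(item, indent=4))
```

Inside transform_data, the dict built for each record could then be replaced by a call to to_work_item(record, REVISION).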
