Flattening a JSON file into a Pandas DataFrame in Python - 有解無憂

I have JSON in this format:

{
    "fields": {
        "tcidte": {
            "mode": "required",
            "type": "date",
            "format": "%Y%m%d"
        },
        "tcmcid": {
            "mode": "required",
            "type": "string"
        },
        "tcacbr": {
            "mode": "required",
            "type": "string"
        }
    }
}

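I would like it in DataFrame format, with each of the three field names as a separate row. Where one row has a column (e.g. "format") that the other rows lack, the missing values should be treated as NULL.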

I tried the flatten_json function I found here, but it did not work as expected. I am including it here anyway:

import pprint

import pandas as pd


def flatten_json(nested_json, exclude=['']):
    """Flatten json object with nested keys into a single level.
        Args:
            nested_json: A nested json object.
            exclude: Keys to exclude from output.
        Returns:
            The flattened json object if successful, None otherwise.
    """
    out = {}

    def flatten(x, name='', exclude=exclude):
        if type(x) is dict:
            for a in x:
                if a not in exclude: flatten(x[a], name + a + '_')
        elif type(x) is list:
            i = 0
            for a in x:
                flatten(a, name + str(i) + '_')
                i += 1
        else:
            out[name[:-1]] = x

    flatten(nested_json)
    return out

flatten_json_file = pd.DataFrame(flatten_json(nested_json))
pprint.pprint(flatten_json_file)

JSON with additional complexity:

{
    "fields": {
        "action": {
            "type": {
                "field_type": "string"
            },
            "mode": "required"
        },
        "upi": {
            "type": {
                "field_type": "string"
            },
            "regex": "^[0-9]{9}$",
            "mode": "required"
        },
        "firstname": {
            "type": {
                "field_type": "string"
            },
            "mode": "required"
        }
    }
}

A uj5u.com user replied:

With

data = {
    "fields": {
        "tcidte": {
            "mode": "required",
            "type": "date",
            "format": "%Y%m%d"
        },
        "tcmcid": {
            "mode": "required",
            "type": "string"
        },
        "tcacbr": {
            "mode": "required",
            "type": "string"
        }
    }
}

this

df = pd.DataFrame(data["fields"].values())

results in

       mode    type  format
0  required    date  %Y%m%d
1  required  string     NaN
2  required  string     NaN

Is that what you are after?

If you want the keys of data["fields"] as the index:

df = pd.DataFrame(data["fields"]).T

or

df = pd.DataFrame.from_dict(data["fields"], orient="index")

Both result in

            mode    type  format
tcidte  required    date  %Y%m%d
tcmcid  required  string     NaN
tcacbr  required  string     NaN
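
Since the question asks for missing values to be treated as NULL: pandas already marks the absent "format" entries as NaN, which isna() reports as missing. The following is a minimal sketch under that reading of the question, using a compact copy of the same data so it runs on its own. The names missing_format and format_values are illustrative only, and the conversion to None is only worth doing if something downstream requires literal None values.

```python
import pandas as pd

# Compact copy of the first JSON from the question.
data = {
    "fields": {
        "tcidte": {"mode": "required", "type": "date", "format": "%Y%m%d"},
        "tcmcid": {"mode": "required", "type": "string"},
        "tcacbr": {"mode": "required", "type": "string"},
    }
}

df = pd.DataFrame.from_dict(data["fields"], orient="index")

# NaN is pandas' marker for a missing value; isna() flags those cells.
missing_format = df["format"].isna()

# If literal None values are needed, convert explicitly per cell.
format_values = {
    name: (None if pd.isna(value) else value)
    for name, value in df["format"].items()
}

print(missing_format)
print(format_values)
```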

With

data = {
    "fields": {
        "action": {
            "type": {
                "field_type": "string"
            },
            "mode": "required"
        },
        "upi": {
            "type": {
                "field_type": "string"
            },
            "regex": "^[0-9]{9}$",
            "mode": "required"
        },
        "firstname": {
            "type": {
                "field_type": "string"
            },
            "mode": "required"
        }
    }
}

you could do

data = {key: {**d, **d["type"]} for key, d in data["fields"].items()}
df = pd.DataFrame.from_dict(data, orient="index").drop(columns="type")

or

df = pd.DataFrame.from_dict(data["fields"], orient="index")
df = pd.concat(
    [df, pd.DataFrame(df.type.to_list(), index=df.index)], axis=1
).drop(columns="type")

Result (column order may differ):

               mode field_type       regex
action     required     string         NaN
upi        required     string  ^[0-9]{9}$
firstname  required     string         NaN
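
For nesting like this, pd.json_normalize is another possibility, since it flattens nested dictionaries without naming the nested key by hand. This is a sketch rather than a drop-in replacement for the two approaches above: with sep="_" the nested column comes out as type_field_type instead of field_type, and the field names have to be attached as the index afterwards. The variable fields is introduced here only for readability.

```python
import pandas as pd

# Compact copy of the second JSON from the question.
data = {
    "fields": {
        "action": {"type": {"field_type": "string"}, "mode": "required"},
        "upi": {
            "type": {"field_type": "string"},
            "regex": "^[0-9]{9}$",
            "mode": "required",
        },
        "firstname": {"type": {"field_type": "string"}, "mode": "required"},
    }
}

fields = data["fields"]

# json_normalize flattens nested dicts; nested keys are joined with `sep`,
# so "type" -> "field_type" becomes the column "type_field_type".
df = pd.json_normalize(list(fields.values()), sep="_")

# Rows come out in the same order as the input records, so the field
# names can be attached as the index afterwards.
df.index = list(fields.keys())

print(df)
```

If the shorter column name is preferred, df.rename(columns={"type_field_type": "field_type"}) would bring it in line with the result shown above.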

A uj5u.com user replied:

import pandas as pd

df = pd.read_json('test.json')
df_fields = pd.DataFrame(df['fields'].values.tolist(), index=df.index)
print(df_fields)

Output:

            mode    type  format
tcacbr  required  string     NaN
tcidte  required    date  %Y%m%d
tcmcid  required  string     NaN
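
A variation on the same idea, sketched here as an assumption rather than as part of the answer above: parse the file with the standard json module and build the frame directly from the "fields" dictionary, which skips the intermediate DataFrame of dictionaries. The sample file is written to a temporary directory only so that the block runs on its own; in practice the path would point at the existing test.json.

```python
import json
import os
import tempfile

import pandas as pd

# Write a small sample file so the sketch is self-contained.
sample = {
    "fields": {
        "tcidte": {"mode": "required", "type": "date", "format": "%Y%m%d"},
        "tcmcid": {"mode": "required", "type": "string"},
        "tcacbr": {"mode": "required", "type": "string"},
    }
}
path = os.path.join(tempfile.mkdtemp(), "test.json")
with open(path, "w", encoding="utf-8") as fh:
    json.dump(sample, fh)

# Parse the file into a plain dict, then build one row per field name.
with open(path, encoding="utf-8") as fh:
    data = json.load(fh)

df_fields = pd.DataFrame.from_dict(data["fields"], orient="index")
print(df_fields)
```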

A uj5u.com user replied:

One option is the jmespath library, which is useful in scenarios like this one:

# pip install jmespath
import jmespath
import pandas as pd

# think of it like a path 
# fields is the first key
# there are sub keys with varying names
# we are only interested in mode, type, format
# hence the * to represent the intermediate key(s)
expression = jmespath.compile('fields.*[mode, type, format]')

pd.DataFrame(expression.search(data), columns = ['mode', 'type', 'format'])

       mode    type  format
0  required    date  %Y%m%d
1  required  string    None
2  required  string    None

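jmespath has many tools; however, this should be enough here, and it covers the case where keys (mode, type, format) are missing from the sub-dictionaries.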
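
If installing jmespath is not an option, roughly the same selection can be sketched in plain Python with dict.get, which returns None for a missing key. This is an assumed equivalent for this particular shape of data, not a general replacement for jmespath expressions. It also keeps the field names as the index, which the expression above does not. The names columns, rows and spec are illustrative.

```python
import pandas as pd

# Compact copy of the first JSON from the question.
data = {
    "fields": {
        "tcidte": {"mode": "required", "type": "date", "format": "%Y%m%d"},
        "tcmcid": {"mode": "required", "type": "string"},
        "tcacbr": {"mode": "required", "type": "string"},
    }
}

columns = ["mode", "type", "format"]

# One row per field; dict.get returns None when a key is absent.
rows = [[spec.get(col) for col in columns] for spec in data["fields"].values()]

# Use the field names as the index so each row stays identifiable.
df = pd.DataFrame(rows, columns=columns, index=list(data["fields"]))
print(df)
```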

Please credit the source when reposting. Link to this article: https://www.uj5u.com/yidong/376144.html

Tags: Python json pandas dataframe json-flattener
