I have the following JSON structure:
{
  "comments_v2": [
    {
      "timestamp": 1196272984,
      "data": [
        {
          "comment": {
            "timestamp": 1196272984,
            "comment": "OSI Beach Party Weekend, CA",
            "author": "xxxx"
          }
        }
      ],
      "title": "xxxx commented on his own photo."
    },
    {
      "timestamp": 1232918783,
      "data": [
        {
          "comment": {
            "timestamp": 1232918783,
            "comment": "We'll see about that.",
            "author": "xxxx"
          }
        }
      ]
    }
  ]
}
I am trying to flatten this JSON into a pandas DataFrame. Here is my solution:
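import codecs
import pandas as pd

# Read file (infile is the path to the JSON file)
df = pd.read_json(codecs.open(infile, "r", "utf-8-sig"))
# Normalize the top-level records
df = pd.json_normalize(df["comments_v2"])
# Expand the nested "data" list into its own columns
child_column = pd.json_normalize(df["data"])
child_column = pd.concat([child_column.drop([0], axis=1), child_column[0].apply(pd.Series)], axis=1)
# Join the expanded columns back and drop the original nested column
df_merge = df.join(child_column)
df_merge.drop(["data"], axis=1, inplace=True)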
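The resulting DataFrame looks like this:
| timestamp | title | comment.timestamp | comment.comment | comment.author | comment.group |
|---|---|---|---|---|---|
| 1196272984 | xxxx commented on his own photo | 1196272984 | OSI Beach Party Weekend, CA | XXXXX | NaN |
Is there a simpler way to flatten the JSON and get the result shown above?
Thanks!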
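A uj5u.com user replied:
Pass record_path='data' as an argument to pd.json_normalize:
import codecs
import json
import pandas as pd

# Load the raw JSON so json_normalize receives the original list of records
with codecs.open(infile, 'r', 'utf-8-sig') as jsonfile:
    data = json.load(jsonfile)
# The second positional argument is record_path
df = pd.json_normalize(data['comments_v2'], 'data')
Output: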
>>> df
   comment.timestamp              comment.comment comment.author
0         1196272984  OSI Beach Party Weekend, CA           xxxx
1         1232918783        We'll see about that.           xxxx
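The output above contains only the nested comment fields, while the table in the question also has the top-level timestamp and title columns. One possible way to keep them is the meta parameter of pd.json_normalize. The following is a minimal sketch, assuming the data looks like the sample in the question (inlined here so the snippet runs on its own); errors='ignore' is used because the second record has no "title" key:

```python
import pandas as pd

# Sample data copied from the question (inlined for a self-contained example).
data = {
    "comments_v2": [
        {
            "timestamp": 1196272984,
            "data": [{"comment": {"timestamp": 1196272984,
                                  "comment": "OSI Beach Party Weekend, CA",
                                  "author": "xxxx"}}],
            "title": "xxxx commented on his own photo.",
        },
        {
            "timestamp": 1232918783,
            "data": [{"comment": {"timestamp": 1232918783,
                                  "comment": "We'll see about that.",
                                  "author": "xxxx"}}],
        },
    ]
}

df = pd.json_normalize(
    data["comments_v2"],
    record_path="data",           # one row per element of each "data" list
    meta=["timestamp", "title"],  # parent-level fields copied onto each row
    errors="ignore",              # a missing meta key (e.g. "title") becomes NaN
)
print(df)
```

The meta columns are appended after the record columns, so the column order differs from the table in the question; reorder with df[[...]] if that matters.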
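A uj5u.com user replied:
Try flatten_json (in this example the parsed JSON is assigned to js):
import pandas as pd
from flatten_json import flatten

# Flatten each top-level record, joining nested keys with '.'
dic_flattened = (flatten(d, '.') for d in js['comments_v2'])
df = pd.DataFrame(dic_flattened)
df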
    timestamp  data.0.comment.timestamp       data.0.comment.comment data.0.comment.author                             title
0  1196272984                1196272984  OSI Beach Party Weekend, CA                  xxxx  xxxx commented on his own photo.
1  1232918783                1232918783        We'll see about that.                  xxxx                               NaN
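To make the column names above easier to read (for example data.0.comment.timestamp, where 0 is the list index), here is a rough pure-Python sketch of this kind of flattening. The helper flatten_sketch is hypothetical and written only for illustration; it is not the flatten_json implementation and may differ from it in edge cases such as empty lists or dicts:

```python
def flatten_sketch(obj, sep=".", prefix=""):
    """Hypothetical helper: flatten nested dicts and lists into one flat dict.

    Dict keys and list indices are joined with `sep` to form the new keys.
    """
    if isinstance(obj, dict):
        items = obj.items()
    elif isinstance(obj, list):
        items = enumerate(obj)
    else:
        # Leaf value: store it under the key path built so far.
        return {prefix: obj}

    flat = {}
    for key, value in items:
        new_key = f"{prefix}{sep}{key}" if prefix else str(key)
        flat.update(flatten_sketch(value, sep, new_key))
    return flat


# First record from the question's sample data.
record = {
    "timestamp": 1196272984,
    "data": [{"comment": {"timestamp": 1196272984,
                          "comment": "OSI Beach Party Weekend, CA",
                          "author": "xxxx"}}],
    "title": "xxxx commented on his own photo.",
}

flat = flatten_sketch(record)
for key, value in flat.items():
    print(key, "=", value)
```

A list of such flat dicts can be passed to pd.DataFrame in the same way as in the answer above.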
Tags: Python json pandas json-flattener
