I'm using the Pandas library in Python to extract a DataFrame column that holds JSON arrays. My data looks like this:
>df
id partnerid payments
5263 org1244 [{"sNo": 1, "amount":"1000"}, {"sNo": 2, "amount":"500"}]
5264 org1245 [{"sNo": 1, "amount":"2000"}, {"sNo": 2, "amount":"600"}]
5265 org1246 [{"sNo": 1, "amount":"3000"}, {"sNo": 2, "amount":"700"}]
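I want to extract the JSON objects from each list and add their fields as columns in the same DataFrame, one row per object, like this: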
>mod_df
id partnerid sNo amount
5263 org1244 1 1000
5263 org1244 2 500
5264 org1245 1 2000
5264 org1245 2 600
5265 org1246 1 3000
5265 org1246 2 700
I have tried this approach:
import pandas as pd
import json as j
df = pd.read_parquet('sample.parquet')
js_loads = df['payments'].apply(j.loads)
js_list = list(js_loads)
j_data = j.dumps(js_list)
df = df.join(pd.read_json(j_data))
df = df.drop(columns=['payments'])
But this only works when each cell holds a single JSON object, not a list of JSON objects. Can someone explain how I can get the output I want?
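To see why the attempt falls short, here is a small sketch on a hypothetical two-row sample. It builds the frame with pd.DataFrame directly instead of the dumps/read_json round trip, on the assumption that read_json with its default orient produces the same frame from the re-dumped list of lists. Each row's list becomes a row whose cells are whole dicts, so join adds columns 0 and 1 full of dicts rather than sNo and amount.

```python
import json
import pandas as pd

# Hypothetical two-row sample shaped like the question's data.
df = pd.DataFrame({
    "id": [5263, 5264],
    "partnerid": ["org1244", "org1245"],
    "payments": [
        '[{"sNo": 1, "amount":"1000"}, {"sNo": 2, "amount":"500"}]',
        '[{"sNo": 1, "amount":"2000"}, {"sNo": 2, "amount":"600"}]',
    ],
})

js_list = list(df["payments"].apply(json.loads))  # a list of lists of dicts
parsed = pd.DataFrame(js_list)  # stands in for the read_json round trip
joined = df.join(parsed).drop(columns=["payments"])
print(joined)
# Columns come out as id, partnerid, 0, 1; each cell in 0 and 1 is a dict.
```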
Reply from a uj5u.com community member:
Convert each string to a list with ast.literal_eval, then use explode() to turn each list element into its own row, copying the other columns.
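The parse-and-explode step on its own, as a minimal sketch on a hypothetical two-row sample: after explode(), each dict sits in its own row, col1 is repeated, and the original index labels are duplicated.

```python
import ast
import pandas as pd

# Hypothetical two-row sample; 'payments' holds strings, as in the question.
df = pd.DataFrame({
    "col1": [0, 1],
    "payments": [
        '[{"sNo": 1, "amount":"1000"}, {"sNo": 2, "amount":"500"}]',
        '[{"sNo": 1, "amount":"2000"}, {"sNo": 2, "amount":"600"}]',
    ],
})

# Step 1: string -> Python list of dicts.
df["payments"] = df["payments"].apply(ast.literal_eval)
# Step 2: one row per list element; col1 is copied and the index repeats.
exploded = df.explode("payments")
print(exploded.index.tolist())       # [0, 0, 1, 1]
print(exploded["payments"].iloc[1])  # {'sNo': 2, 'amount': '500'}
```

ast.literal_eval works here only because this JSON contains no true, false or null; for JSON text in general, json.loads is the safer parser and can be dropped in unchanged.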
Then use .apply(pd.Series) to expand each dict into a Series, which turns its keys (sNo, amount) into columns.
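A sketch of the dict-to-columns step on a hypothetical exploded column, next to pd.json_normalize as an alternative I am swapping in (it is not part of the answer above). json_normalize builds the same columns from a list of dicts, usually more cheaply on large frames since it does not create one Series per row, but it returns a fresh 0..n-1 index, so the original labels are copied back.

```python
import pandas as pd

# Hypothetical exploded column: dicts that share a duplicated index label.
s = pd.Series(
    [{"sNo": 1, "amount": "1000"}, {"sNo": 2, "amount": "500"}],
    index=[0, 0],
)

# Each dict becomes one row; its keys become the columns.
wide = s.apply(pd.Series)

# Alternative: json_normalize on the plain list of dicts,
# then restore the original (duplicated) index labels.
alt = pd.json_normalize(s.tolist())
alt.index = s.index

print(wide)
print(alt)
```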
Finally, put the pieces side by side with pd.concat(..., axis=1).
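Why the concat step is safe despite the duplicated labels, sketched with two hypothetical frames shaped like the exploded data: with axis=1, rows are aligned on index labels, and because both pieces carry the identical index [0, 0, 1, 1] they line up one to one. If the two indexes differed while containing duplicates, pandas would typically raise an error instead. reset_index(drop=True) then renumbers the rows.

```python
import pandas as pd

# Hypothetical pieces: both share the same duplicated index [0, 0, 1, 1],
# as the two halves of the exploded frame do.
left = pd.DataFrame({"col1": [0, 0, 1, 1]}, index=[0, 0, 1, 1])
right = pd.DataFrame(
    {"sNo": [1, 2, 1, 2], "amount": ["1000", "500", "2000", "600"]},
    index=[0, 0, 1, 1],
)

# Side by side, aligned on identical index labels, then renumbered 0..n-1.
out = pd.concat([left, right], axis=1).reset_index(drop=True)
print(out)
```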
Example:
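import ast
import pandas as pd
# sample data: each 'payments' cell is a string holding a list of dicts
d = {'col1': [0, 1, 2],
     'payments': ['[{"sNo": 1, "amount":"1000"}, {"sNo": 2, "amount":"500"}]',
                  '[{"sNo": 1, "amount":"2000"}, {"sNo": 2, "amount":"600"}]',
                  '[{"sNo": 1, "amount":"3000"}, {"sNo": 2, "amount":"700"}]']}
df = pd.DataFrame(data=d, index=[0, 1, 2])
# parse each string into a Python list of dicts
df['payments'] = df['payments'].apply(ast.literal_eval)
# one row per dict; col1 is repeated and the index labels are duplicated
df = df.explode('payments')
# expand the dicts into columns, place them next to col1, renumber the rows
out = pd.concat([df.drop(['payments'], axis=1), df['payments'].apply(pd.Series)], axis=1).reset_index(drop=True)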
Output:
   col1  sNo  amount
0     0    1    1000
1     0    2     500
2     1    1    2000
3     1    2     600
4     2    1    3000
5     2    2     700
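Applied to the question's own columns, the same idea yields mod_df. This is a sketch under a few assumptions: the DataFrame literal stands in for pd.read_parquet('sample.parquet') and assumes 'payments' holds JSON text; it swaps in json.loads for ast.literal_eval and pd.json_normalize for apply(pd.Series); and the final int conversion of amount is optional.

```python
import json
import pandas as pd

# Hypothetical stand-in for pd.read_parquet('sample.parquet');
# assumes 'payments' holds JSON text, as the question's json.loads implies.
df = pd.DataFrame({
    "id": [5263, 5264, 5265],
    "partnerid": ["org1244", "org1245", "org1246"],
    "payments": [
        '[{"sNo": 1, "amount":"1000"}, {"sNo": 2, "amount":"500"}]',
        '[{"sNo": 1, "amount":"2000"}, {"sNo": 2, "amount":"600"}]',
        '[{"sNo": 1, "amount":"3000"}, {"sNo": 2, "amount":"700"}]',
    ],
})

# Parse the JSON, give each payment its own row, renumber rows 0..n-1.
exploded = (
    df.assign(payments=df["payments"].apply(json.loads))
      .explode("payments")
      .reset_index(drop=True)
)
# Flatten the dicts; json_normalize's 0..n-1 index matches `exploded`.
flat = pd.json_normalize(exploded["payments"].tolist())
mod_df = pd.concat([exploded.drop(columns="payments"), flat], axis=1)
mod_df["amount"] = mod_df["amount"].astype(int)  # optional: "1000" -> 1000
print(mod_df)
```

Another option worth trying is pd.json_normalize(records, record_path="payments", meta=["id", "partnerid"]) on the parsed rows as a list of dicts; it puts the meta columns last, so reorder them if the column order matters.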
Please credit the source when reposting. Original link: https://www.uj5u.com/caozuo/437370.html
