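I have a DataFrame in which one column holds JSON as its value. I want to pick one of the JSON fields as the column value and get rid of the rest. I have put the values I need into a Series, but I don't know how to join them back onto the DataFrame in place of the existing column: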
import json
from io import StringIO

import pandas as pd

df = pd.DataFrame({
    'bank_account': [101, 102, 201, 301],
    'data': [
        '{"uid": 100, "account_type": 1, "account_data": {"currency": {"current": 1000, "minimum": -500}, "fees": {"monthly": 13.5}}, "user_name": "Alice"}',
        '{"uid": 100, "account_type": 2, "account_data": {"currency": {"current": 2000, "minimum": 0}, "fees": {"monthly": 0}}, "user_name": "Alice"}',
        '{"uid": 200, "account_type": 1, "account_data": {"currency": {"current": 3000, "minimum": 0}, "fees": {"monthly": 13.5}}, "user_name": "Bob"}',
        '{"uid": 300, "account_type": 1, "account_data": {"currency": {"current": 4000, "minimum": 0}, "fees": {"monthly": 13.5}}, "user_name": "Carol"}'
    ]},
    index=['Alice', 'Alice', 'Bob', 'Carol']
)

# Pull the "uid" field out of each JSON string
lst = []
for d in df['data']:
    # StringIO wrapper: passing a raw JSON string to read_json is deprecated since pandas 2.1
    uid = pd.read_json(StringIO(d), lines=True)['uid'].values[0]
    lst.append(uid)

s = pd.DataFrame(lst)
df['data'] = s  # this assignment is what produces the NaN column
print(s)
print(df)
This prints:
     0
0  100
1  100
2  200
3  300
       bank_account  data
Alice           101   NaN
Alice           102   NaN
Bob             201   NaN
Carol           301   NaN
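At the moment I don't understand why the data column shows only NaN values. Any help is appreciated.
Updated question: some rows contain a list of JSON objects rather than a single one. This is what I have so far: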
import json

import pandas as pd

df = pd.DataFrame({
    'bank_account': [101, 102, 201, 301],
    'data': [
        '[{"uid": 100, "account_type": 1, "account_data": {"currency": {"current": 1000, "minimum": -500}, "fees": {"monthly": 13.5}}, "user_name": "Alice"},{"uid": 150, "account_type": 1, "account_data": {"currency": {"current": 1000, "minimum": -500}, "fees": {"monthly": 13.5}}, "user_name": "jer"}]',
        '{"uid": 100, "account_type": 2, "account_data": {"currency": {"current": 2000, "minimum": 0}, "fees": {"monthly": 0}}, "user_name": "Alice"}',
        '{"uid": 200, "account_type": 1, "account_data": {"currency": {"current": 3000, "minimum": 0}, "fees": {"monthly": 13.5}}, "user_name": "Bob"}',
        '{"uid": 300, "account_type": 1, "account_data": {"currency": {"current": 4000, "minimum": 0}, "fees": {"monthly": 13.5}}, "user_name": "Carol"}'
    ]},
    index=['Alice', 'Alice', 'Bob', 'Carol']
)

# df["data"] = df["data"].apply(lambda x: pd.read_json(x, lines=True)["uid"][0])
# My attempt at the list case; it fails with a TypeError because list() is given two arguments
df["data"] = df["data"].apply(lambda array : (",".join(list(map(lambda x : pd.read_json(x, lines=True)["uid"][0], array),(df['data'])))))
print(df)
uj5u.com community member replied:
This works for me:
from io import StringIO

import pandas as pd

df = pd.DataFrame({
    'bank_account': [101, 102, 201, 301],
    'data': [
        '{"uid": 100, "account_type": 1, "account_data": {"currency": {"current": 1000, "minimum": -500}, "fees": {"monthly": 13.5}}, "user_name": "Alice"}',
        '{"uid": 100, "account_type": 2, "account_data": {"currency": {"current": 2000, "minimum": 0}, "fees": {"monthly": 0}}, "user_name": "Alice"}',
        '{"uid": 200, "account_type": 1, "account_data": {"currency": {"current": 3000, "minimum": 0}, "fees": {"monthly": 13.5}}, "user_name": "Bob"}',
        '{"uid": 300, "account_type": 1, "account_data": {"currency": {"current": 4000, "minimum": 0}, "fees": {"monthly": 13.5}}, "user_name": "Carol"}'
    ]},
    index=['Alice', 'Alice', 'Bob', 'Carol']
)

# Replace each JSON string with its "uid" (StringIO: raw strings are deprecated in read_json since pandas 2.1)
df["data"] = df["data"].apply(lambda x: pd.read_json(StringIO(x), lines=True)["uid"][0])
Your code does not work because df and s have different indexes. If you want to fix your own code, set df['data'] = s[0].values (instead of df['data'] = s) before the two print statements.
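To make the index mismatch concrete, here is a minimal sketch on a cut-down frame. The column names `aligned` and `positional` are made up for illustration, and the frame omits the JSON column from the question. Assigning a Series aligns on index labels, so a default 0..n-1 index finds no match among the name labels; passing the raw values skips alignment altogether.

```python
import pandas as pd

df = pd.DataFrame(
    {"bank_account": [101, 102, 201, 301]},
    index=["Alice", "Alice", "Bob", "Carol"],
)
s = pd.Series([100, 100, 200, 300])  # default RangeIndex 0..3

# Series assignment aligns on labels; none of 0..3 match the names, so every row is NaN
df["aligned"] = s

# A plain NumPy array carries no index, so it is assigned by position
df["positional"] = s.to_numpy()

print(df)
```

The `aligned` column should come out as all NaN while `positional` holds the four uids, which mirrors the behaviour described in the question.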
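uj5u.com community member replied:
As @rachwa pointed out, the problem is an index mismatch: the index of s is numeric, while the index of df consists of names. If you assign lst directly instead of converting it to a DataFrame, you get the desired result, i.e.
df['data'] = lst
works as expected.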
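If you would rather keep a pandas object than a bare list, one option (an illustration, not something the answer above proposes) is to build the Series on the DataFrame's own index so that alignment has matching labels to work with. The frame and the `lst` values below are stand-ins for the question's data, and the `short` column exists only to show the length check.

```python
import pandas as pd

df = pd.DataFrame(
    {"bank_account": [101, 102, 201, 301]},
    index=["Alice", "Alice", "Bob", "Carol"],
)
lst = [100, 100, 200, 300]

# Reusing df.index gives the Series the same labels as the frame,
# so the assignment lines up row for row
df["data"] = pd.Series(lst, index=df.index)

# A list is assigned by position and must have one value per row
try:
    df["short"] = lst[:3]
except ValueError as exc:
    length_error = str(exc)

print(df)
```

Either form avoids the NaN column; the list is shorter to write, while the indexed Series keeps the labels attached if you need them later.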
You can also use json.loads instead of read_json (it should be faster):
import json
df['data'] = [json.loads(d)['uid'] for d in df['data']]
Output:
       bank_account  data
Alice           101   100
Alice           102   100
Bob             201   200
Carol           301   300
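For the updated question, where a cell may hold either a single JSON object or a list of objects, the same json.loads idea can be stretched a little. The sketch below is one possible approach rather than anything from the answers above: `extract_uids` is a hypothetical helper, the JSON strings are shortened stand-ins for the real rows, and joining several uids with a comma is an assumption about the desired output, taken from the attempt in the question.

```python
import json

import pandas as pd


def extract_uids(raw):
    """Return the uid(s) found in a JSON string as a comma-separated string."""
    parsed = json.loads(raw)
    # Normalise a single object to a one-element list
    records = parsed if isinstance(parsed, list) else [parsed]
    return ",".join(str(rec["uid"]) for rec in records)


df = pd.DataFrame(
    {
        "bank_account": [101, 102],
        "data": [
            '[{"uid": 100, "user_name": "Alice"}, {"uid": 150, "user_name": "jer"}]',
            '{"uid": 100, "user_name": "Alice"}',
        ],
    },
    index=["Alice", "Alice"],
)

df["data"] = df["data"].apply(extract_uids)
print(df)
```

Note that the resulting column holds strings, since a cell with several uids cannot be a single integer. If you need one row per uid instead, returning a list from the helper and calling DataFrame.explode on the column may be a better fit.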
