The DataFrame below has columns of mixed types. The column I want to expand is "Info". Each row value in this column is a JSON object.
import pandas as pd

data = {'Code':['001', '002', '003', '004'],
'Info':['{"id":001,"x_cord":[1,1,1,1],"x_y_cord":[4.703978,-39.601876],"neutral":1,"code_h":"S38A46","group":null}','{"id":002,"x_cord":[2,1,3,1],"x_y_cord":[1.703978,-38.601876],"neutral":2,"code_h":"S17A46","group":"New"}','{"id":003,"x_cord":[1,1,4,1],"x_y_cord":[112.703978,-9.601876],"neutral":4,"code_h":"S12A46","group":"Old"}','{"id":004,"x_cord":[2,1,7,1],"x_y_cord":[6.703978,-56.601876],"neutral":1,"code_h":"S12A46","group":null}'],
'Region':['US','Pacific','Africa','Asia']}
df = pd.DataFrame(data)
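I want to expand the headers, i.e. have "Info.id", "Info.x_y_cord", "Info.neutral", etc. as separate columns, with the corresponding values underneath. I tried normalizing them with pd.json_normalize(df["Info"]), but nothing seems to change. Do I need to convert the column to another type first? Can someone point me in the right direction?
The output should look like this: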
import numpy as np

data1 = {'Code':['001', '002', '003', '004'],
'Info.id':['001','002','003','004'],
'Info.x_cord':['[1,1,1,1]','[2,1,3,1]','[1,1,4,1]','[2,1,7,1]'],
'Info.x_y_cord':['[4.703978,-39.601876]','[1.703978,-38.601876]','[112.703978,-9.601876]','[6.703978,-56.601876]'],
'Info.neutral':[1,2,4,1],
'Info.code_h':['S38A46','S17A46','S12A46','S12A46'],
'Info.group':[np.nan,"New","Old",np.nan],
'Region':['US','Pacific','Africa','Asia']}
df_final = pd.DataFrame(data1)
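Answer from a uj5u.com user:
First, your JSON strings appear to be invalid because of the id values. 001 is not a valid JSON number (JSON does not allow leading zeros), so the "id" value has to be passed as a string. Here is one way to do that: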
def id_as_string(match_obj):
    # Wrap the ID value in double quotes
    return f'"id":"{match_obj.group(1)}",'

# The callable receives each regex match and returns its replacement
df["Info"] = df["Info"].str.replace(r'"id":(\d*),', repl=id_as_string, regex=True)
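To see why the quoting step matters, here is a minimal sketch that feeds one shortened string to json.loads before and after the substitution. The helper name quote_id and the shortened sample string are illustrative assumptions, not part of the original code.

```python
import json
import re

# Shortened version of one "Info" value from the question
raw = '{"id":001,"x_cord":[1,1,1,1],"neutral":1,"group":null}'

def quote_id(text):
    # Wrap the digits after "id": in double quotes so the value becomes a string
    return re.sub(r'"id":(\d+),', r'"id":"\1",', text)

try:
    json.loads(raw)
    parsed_raw = True
except json.JSONDecodeError:
    # Numbers with leading zeros are rejected by the JSON parser
    parsed_raw = False

fixed = json.loads(quote_id(raw))
print(parsed_raw, fixed["id"], fixed["group"])
```

After the substitution the id is kept as the string "001", and the JSON null is loaded as Python None.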
Once that is done, you can use pd.json_normalize on the "Info" column after loading the values from the JSON strings with json.loads:
import json
json_part_df = pd.json_normalize(df["Info"].map(json.loads))
After that, just rename the columns and use pd.concat to build the output DataFrame:
# Rename columns
json_part_df.columns = [f"Info.{column}" for column in json_part_df.columns]
# Use pd.concat to create output
df = pd.concat([df[["Code", "Region"]], json_part_df], axis=1)
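Putting the steps together, the following is a sketch on a two-row subset of the data, not a definitive implementation. It assumes a backreference string can stand in for the callable replacement, and it reorders the pieces in pd.concat so that "Region" ends up last, as in the desired output. The variable names info and result are illustrative.

```python
import json
import pandas as pd

df = pd.DataFrame({
    "Code": ["001", "002"],
    "Info": [
        '{"id":001,"x_cord":[1,1,1,1],"neutral":1,"code_h":"S38A46","group":null}',
        '{"id":002,"x_cord":[2,1,3,1],"neutral":2,"code_h":"S17A46","group":"New"}',
    ],
    "Region": ["US", "Pacific"],
})

# Quote the numeric id so each string becomes valid JSON
df["Info"] = df["Info"].str.replace(r'"id":(\d+),', r'"id":"\1",', regex=True)

# Parse each string into a dict, then flatten the dicts into columns
info = pd.json_normalize(df["Info"].map(json.loads).tolist())
info.columns = [f"Info.{c}" for c in info.columns]

# Place the Info.* columns between Code and Region
result = pd.concat([df[["Code"]], info, df[["Region"]]], axis=1)
print(result.columns.tolist())
```

Note that list values such as x_cord stay Python lists in the resulting cells rather than strings like '[1,1,1,1]'. If strings are needed, those columns would have to be converted separately.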
