I have a DataFrame in which one column holds JSON strings. I can parse them correctly as long as one particular key is removed.
id | email | phone no | details
-------------------------------------------------
0 10 | [email protected] | 123 | {"a" : "hello", "b" : {"x": "whatever"....}, "c": "check"}
1 12 | [email protected] | 789 | {"a" : "bye", "b" : {"x": "ignore"....}, "c": "cool"}
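The details column has a key named "b" that contains many key-value pairs, some of which are corrupted by missing commas or quotes. I don't care about it because I don't need it. Can I remove that part of the JSON?
I want it to look like this: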
id | email | phone no | details
-------------------------------------------------
0 10 | [email protected] | 123 | {"a" : "hello", "c": "check"}
1 12 | [email protected] | 789 | {"a" : "bye", "c": "cool"}
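I need to expand the key/value pairs in "details" into rows and columns, which I can do once the corrupted key is removed. I have millions of records, so I need a way to ignore that key in every row of the "details" column.
Thanks.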
uj5u.com user reply:
Try str.replace with a regular expression:
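import re
PAT = re.compile(r',\s*"b"\s*:\s*{.*?}\s*,\s*')
# regex=True is needed because pandas 2.0 and later default to literal matching
df['details'] = df['details'].str.replace(PAT, ', ', regex=True)
print(df)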
# Output:
id email phone no details
0 10 [email protected] 123 {"a" : "hello", "c": "check"}
1 12 [email protected] 789 {"a" : "bye", "c": "cool"}
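The pattern above only matches when "b" sits between two other keys, and its non-greedy match stops at the first closing brace that is followed by a comma. With millions of rows it may be worth checking which strings still fail to parse after the cleanup. The following is a minimal sketch using only the standard library; the helper name `unparseable_after_cleanup` and the sample strings are made up for illustration and are not from the original data.

```python
import json
import re

# Pattern copied from the answer above.
PAT = re.compile(r',\s*"b"\s*:\s*{.*?}\s*,\s*')

def unparseable_after_cleanup(strings):
    """Return the indices of strings that still fail json.loads after the regex cleanup."""
    bad = []
    for i, s in enumerate(strings):
        cleaned = PAT.sub(', ', s)
        try:
            json.loads(cleaned)
        except json.JSONDecodeError:
            bad.append(i)
    return bad

# Hypothetical sample rows (assumptions, not taken from the question).
samples = [
    # "b" is broken but flat and sits between two keys: cleaned correctly
    '{"a": "hello", "b": {"x": "whatever" "y" 1}, "c": "check"}',
    # a nested brace followed by a comma ends the match too early
    '{"a": "bye", "b": {"x": {"n": 1}, "y" 2}, "c": "cool"}',
    # "b" is the last key, so the pattern does not match at all
    '{"a": "hi", "b": {"x" 1}}',
]

print(unparseable_after_cleanup(samples))  # prints [1, 2]
```

On a real DataFrame the same check could be run on `df['details'].tolist()` or on a random sample of it, so that failing rows are inspected before the column is overwritten.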
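uj5u.com user reply:
This is crude and ugly, but if "b" always contains a dictionary, or at least its curly braces are filled with something that is not a closing brace, it might work: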
import re
import json

# Attempt to delete the dictionary associated with the "b" key.
# [^}]+ runs up to the first closing brace, so nested braces are not handled.
fixed = [re.sub(r'"b" ?: ?{[^}]+}, ?', '', s) for s in df['details']]
try:
    # Check that every cleaned string is valid JSON before overwriting the column.
    [json.loads(f) for f in fixed]
    df['details'] = fixed
except json.JSONDecodeError:
    print('whoops, this ugly hack failed')
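If "b" can contain nested dictionaries, counting braces may be more forgiving than either regex. The following is a rough sketch, not a tested solution for the original data: `drop_object_key` is a hypothetical helper, and it assumes the value of "b" is always a brace-delimited object and that no stray `{` or `}` appears inside string values within it.

```python
import json
import re

def drop_object_key(text, key="b"):
    """Remove '"key": {...}' from a JSON-like string by counting braces."""
    m = re.search(r'"%s"\s*:\s*\{' % re.escape(key), text)
    if m is None:
        return text  # key not present, nothing to do
    start = m.start()
    depth = 0
    end = -1
    # Walk forward from the opening brace until the braces balance out.
    for i in range(m.end() - 1, len(text)):
        if text[i] == "{":
            depth += 1
        elif text[i] == "}":
            depth -= 1
            if depth == 0:
                end = i + 1
                break
    if end == -1:
        return text  # unbalanced braces, leave the string untouched
    head = text[:start].rstrip()
    tail = text[end:].lstrip()
    if tail.startswith(","):
        tail = tail[1:].lstrip()   # key was first or in the middle
    elif head.endswith(","):
        head = head[:-1].rstrip()  # key was last
    sep = " " if head.endswith(",") else ""
    return head + sep + tail

# Hypothetical input with a broken, nested "b" value.
raw = '{"a" : "hello", "b" : {"x": "whatever" "y": {"z": 1}, "w" 2}, "c": "check"}'
cleaned = drop_object_key(raw)
print(cleaned)              # prints {"a" : "hello", "c": "check"}
print(json.loads(cleaned))  # prints {'a': 'hello', 'c': 'check'}
```

Applied to the column, this would look like `df['details'] = df['details'].map(drop_object_key)`. It runs in plain Python per row, so on millions of records it will likely be slower than the vectorized `str.replace`.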
