I have a DataFrame (very large, millions of rows). This is what it looks like:
id value
a1 0:0,1:10,2:0,3:0,4:7
b4 0:5,1:0,2:0,3:0,4:1
c5 0:0,1:3,2:2,3:0,4:0
k2 0:0,1:2,2:0,3:4,4:0
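I want to turn these strings into dictionaries, keeping only the key-value pairs in which neither the key nor the value is 0. So the desired result is: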
id value
a1 {1:10, 4:7}
b4 {4:1}
c5 {1:3, 2:2}
k2 {1:2, 3:4}
How can I do this? I tried the dict() function, but it raises KeyError: 0:
df["value"] = dict(df["value"])
So I am already stuck at the first step, turning the strings into dictionaries.
I also tried this:
df["value"] = json.loads(df["value"])
But it gave the same error.
A uj5u.com user replied:
This solves the problem using nothing more than a list comprehension:
import pandas as pd
dt = pd.DataFrame({"id":["a1", "b4", "c5", "k2"],
                   "value":["0:0,1:10,2:0,3:0,4:7","0:5,1:0,2:0,3:0,4:1","0:0,1:3,2:2,3:0,4:0","0:0,1:2,2:0,3:4,4:0"]})
def to_dict1(s):
    return [dict([map(int, y.split(":")) for y in x.split(",") if "0" not in y.split(":")]) for x in s]
dt["dict"] = to_dict1(dt["value"])
Another way to get the same result is to use a regular expression (the pattern (?!0{1})(\d) matches any digit except a single 0):
import re
def to_dict2(s):
    return [dict([map(int, y) for y in re.findall(r"(?!0{1})(\d):(?!0{1})(\d+)", x)]) for x in s]
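The key group in the pattern above matches exactly one digit, which suits the single-digit keys in the sample data. If the real data can contain multi-digit keys, a pair such as 12:5 would be read as key 2. The following is a minimal sketch of a variant that accepts multi-digit keys; the pattern and the name to_dict_regex are illustrative assumptions, not part of the original answer, and it assumes numbers are written without leading zeros.

```python
import re

# Hypothetical pattern (an assumption, not from the answer above):
# a number that starts with 1-9, a colon, then another number that starts with 1-9.
# The lookbehind stops a key from starting in the middle of a longer number.
PAIR = re.compile(r"(?<!\d)([1-9]\d*):([1-9]\d*)")

def to_dict_regex(strings):
    """Return one dict per input string, skipping pairs whose key or value is 0."""
    return [{int(k): int(v) for k, v in PAIR.findall(s)} for s in strings]

print(to_dict_regex(["0:0,1:10,12:5,10:0"]))  # [{1: 10, 12: 5}]
```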
In my tests, to_dict1 was nearly 20% faster.
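The timing difference depends on the machine, the Python version, and the data, so it is worth measuring on your own input. Below is a rough sketch of such a comparison using the standard timeit module. The sample size and repeat count are arbitrary assumptions, and to_dict2 is written with the \d+ value group.

```python
import re
import timeit

def to_dict1(strings):
    # Split-based version: drop any pair whose key or value is the string "0".
    return [dict([map(int, y.split(":")) for y in x.split(",")
                  if "0" not in y.split(":")]) for x in strings]

def to_dict2(strings):
    # Regex-based version: negative lookaheads reject a key or value starting with 0.
    return [dict([map(int, y) for y in
                  re.findall(r"(?!0{1})(\d):(?!0{1})(\d+)", x)]) for x in strings]

# The four sample strings from the question, repeated to get a measurable workload.
sample = ["0:0,1:10,2:0,3:0,4:7", "0:5,1:0,2:0,3:0,4:1",
          "0:0,1:3,2:2,3:0,4:0", "0:0,1:2,2:0,3:4,4:0"] * 1000

t1 = timeit.timeit(lambda: to_dict1(sample), number=5)
t2 = timeit.timeit(lambda: to_dict2(sample), number=5)
print(f"to_dict1: {t1:.3f}s  to_dict2: {t2:.3f}s")
```

Both functions should return identical dictionaries for this input, so any difference is purely in speed.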
A uj5u.com user replied:
This code produces the result you want. I used the sample input you provided and printed the expected result at the end.
import pandas as pd
df = pd.DataFrame(
    {
        'id': ['a1', 'b4', 'c5', 'k2'],
        'value': ['0:0,1:10,2:0,3:0,4:7', '0:5,1:0,2:0,3:0,4:1', '0:0,1:3,2:2,3:0,4:0', '0:0,1:2,2:0,3:4,4:0']
    }
)
value = []  # temporary list holding one dict per row, without the zero pairs
for i, row in df.iterrows():
    pairs = row['value'].split(',')
    d = dict()
    for pair in pairs:
        k, v = pair.split(':')
        k = int(k)
        v = int(v)
        # keep the pair only if both key and value are non-zero
        if (k != 0) and (v != 0):
            d[k] = v
    value.append(d)
df['value'] = pd.Series(value)
print(df)
# id value
#0 a1 {1: 10, 4: 7}
#1 b4 {4: 1}
#2 c5 {1: 3, 2: 2}
#3 k2 {1: 2, 3: 4}
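Since the question mentions millions of rows, it may be worth noting that iterrows builds a Series object for every row, which tends to be slow. A possible alternative is to iterate over the column values directly. The sketch below uses a plain list as a stand-in for df['value'] so that it runs without pandas; the name parse_pairs is a hypothetical helper, not something from the answer above.

```python
def parse_pairs(s):
    """Parse 'k:v,k:v,...' into a dict, keeping pairs where both numbers are non-zero."""
    pairs = (map(int, item.split(":")) for item in s.split(","))
    return {k: v for k, v in pairs if k != 0 and v != 0}

# Stand-in for the 'value' column of the DataFrame.
values = ['0:0,1:10,2:0,3:0,4:7', '0:5,1:0,2:0,3:0,4:1',
          '0:0,1:3,2:2,3:0,4:0', '0:0,1:2,2:0,3:4,4:0']

parsed = [parse_pairs(s) for s in values]
print(parsed)

# With the DataFrame from the answer above, the assignment would look like:
# df['value'] = [parse_pairs(s) for s in df['value']]
```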
A uj5u.com user replied:
def make_dict(row):
    """ Requires a list of strings of the shape
    ["0:0", "1:10", ...]"""
    return {key: val for key, val
            in map(lambda x: map(int, x.split(":")), row)
            if key != 0 and val != 0}
df["value"] = df.value.str.split(",").apply(make_dict)
A uj5u.com user replied:
This is how I would do it:
def string_to_dict(s):
    d = {}
    pairs = s.split(',')  # get each key-value pair
    for pair in pairs:
        key, value = pair.split(':')  # split key from value
        key, value = int(key), int(value)  # store integers rather than strings
        if key and value:  # skip the pairs with a zero key or a zero value
            d[key] = value
    return d
df['value'] = df['value'].apply(string_to_dict)
A uj5u.com user replied:
Use a dict comprehension to exclude items whose key or value equals zero:
import pandas as pd
df = pd.DataFrame({"id":["a1", "b4", "c5", "k2"],
                   "value":["0:0,1:10,2:0,3:0,4:7","0:5,1:0,2:0,3:0,4:1","0:0,1:3,2:2,3:0,4:0","0:0,1:2,2:0,3:4,4:0"]})
# build one dict per row, keeping only pairs whose key and value are both non-zero
df['value'] = [
    {int(k): int(v)
     for k, v in (x.split(':') for x in s.split(','))
     if int(k) != 0 and int(v) != 0}
    for s in df['value']
]
print(df)
Output:
id value
0 a1 {1: 10, 4: 7}
1 b4 {4: 1}
2 c5 {1: 3, 2: 2}
3 k2 {1: 2, 3: 4}
Please credit the source when reposting. Link to this article: https://www.uj5u.com/yidong/358008.html
