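I want to add relations to the "relations" column based on rel_list. Specifically, for each tuple, e.g. ('a', 'b'), I want to replace the empty relations value '' in the row for 'a' with 'b', but without duplicating the relation: the row for 'b' should not have its '' replaced with 'a', since that would be considered a duplicate. The following code does not quite work: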
import pandas as pd
data = {
    "names": ['a', 'b', 'c', 'd'],
    "ages": [50, 40, 45, 20],
    "relations": ['', '', '', '']
}
rel_list = [('a', 'b'), ('a', 'c'), ('c', 'd')]
df = pd.DataFrame(data)
for rel_tuple in rel_list:
    head = rel_tuple[0]
    tail = rel_tuple[1]
    df.loc[df.names == head, 'relations'] = tail
print(df)
The current result of df is:
names ages relations
0 a 50 c
1 b 40
2 c 45 d
3 d 20
However, the correct result is:
names ages relations
0 a 50 b
0 a 50 c
1 b 40
2 c 45 d
3 d 20
New rows need to be added, in this case the second row shown above. How can I do this?
A uj5u.com user replied:
You can build a DataFrame from rel_list and merge it with df:
(df.drop('relations', axis=1)
   .merge(pd.DataFrame(rel_list, columns=['names', 'relations']),
          on='names',
          how='outer'
          )
   # .fillna('')  # uncomment to replace NaN with empty string
)
Output:
names ages relations
0 a 50 b
1 a 50 c
2 b 40 NaN
3 c 45 d
4 d 20 NaN
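For reference, here is a minimal self-contained sketch of the same merge idea. It assumes that how='left' is acceptable in place of how='outer' (every name in rel_list already exists in df here), which keeps the rows in df's original order, and it fills the missing relations with empty strings. The names rel_df and merged are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "names": ['a', 'b', 'c', 'd'],
    "ages": [50, 40, 45, 20],
    "relations": ['', '', '', ''],
})
rel_list = [('a', 'b'), ('a', 'c'), ('c', 'd')]

# Turn the list of (head, tail) tuples into a two-column lookup table.
rel_df = pd.DataFrame(rel_list, columns=['names', 'relations'])

merged = (
    df.drop(columns='relations')            # drop the empty placeholder column
      .merge(rel_df, on='names', how='left')  # one output row per matching tuple
      .fillna({'relations': ''})            # names without a relation get ''
)
print(merged)
```

A left merge repeats a row of df once for each matching tuple, so 'a' appears twice, while 'b' and 'd' appear once with an empty relation.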
A uj5u.com user replied:
Instead of updating df, you can create a new DataFrame and add the relations row by row:
import pandas as pd
data = {
    "names": ['a', 'b', 'c', 'd'],
    "ages": [50, 40, 45, 20],
    "relations": ['', '', '', '']
}
rel_list = [('a', 'b'), ('a', 'c'), ('c', 'd')]
df = pd.DataFrame(data)
new_df = pd.DataFrame(data)
new_df.loc[:, 'relations'] = ''
for head, tail in rel_list:
    # Copy the matching row so the assignment below does not touch df
    new_row = df[df.names == head].copy()
    new_row['relations'] = tail
    # DataFrame.append was removed in pandas 2.0; pd.concat does the same job
    new_df = pd.concat([new_df, new_row])
print(new_df)
Output:
names ages relations
0 a 50
1 b 40
2 c 45
3 d 20
0 a 50 b
0 a 50 c
2 c 45 d
Then, if needed, you can finally drop all rows that have no value in "relations":
new_df = new_df[new_df['relations']!='']
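Note that the filter above also drops rows such as 'b' and 'd', which never had a relation, so the result differs from the expected output in the question. A minimal sketch of one way to keep them, assuming the original index of df can be used to restore the row order (pieces, heads, untouched and result are illustrative names):

```python
import pandas as pd

df = pd.DataFrame({
    "names": ['a', 'b', 'c', 'd'],
    "ages": [50, 40, 45, 20],
    "relations": ['', '', '', ''],
})
rel_list = [('a', 'b'), ('a', 'c'), ('c', 'd')]

# One copy of the matching row per tuple, with 'relations' filled in.
pieces = [df[df.names == head].assign(relations=tail) for head, tail in rel_list]

# Names that received at least one relation lose their empty placeholder row;
# all other rows are kept unchanged.
heads = {head for head, _ in rel_list}
untouched = df[~df.names.isin(heads)]

# mergesort is stable, so rows sharing an index keep their rel_list order.
result = pd.concat(pieces + [untouched]).sort_index(kind="mergesort")
print(result)
```

The repeated index value 0 for the two 'a' rows matches the expected output in the question; call reset_index(drop=True) on the result if a fresh 0..n-1 index is preferred.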
