I'm new to pandas and I have a question.
I have a DataFrame like this:
Code Keywords
A Real estate, loan, building, office, land, warehouse
B Real Estate Lease , Real Estate, building, Office, Warehouse, rental, Tenant, broker advisor, Real Estate Lease , Lease and rent
C Transport Air freight, shift, cargo, truck, insurance, Transport Insurance, Transport
D Transport, shift, cargo, truck, insurance, Transport Insurance
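I need to remove duplicates from the "Keywords" column, whether the duplicates appear in the same row or across several different rows. Case should not matter: whether it is written "Warehouse" or "warehouse", every value that occurs more than once is removed entirely.
The result should look like this: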
Code Keywords
A loan, land
B Real Estate Lease, rental, Tenant, broker advisor, Real Estate Lease , Lease and rent
C Transport Air freight
D
For example, row "D" ends up with no keywords at all, because every one of them is duplicated in another row.
Thanks
A uj5u.com user replied:
One way, using pandas.Series.str.split with explode:
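# Split each cell on commas (ignoring surrounding spaces), one keyword per row
m = df["Keywords"].str.split(r"\s*,\s*").explode()
# Drop every keyword that occurs more than once, compared case-insensitively
m = m[~m.str.lower().duplicated(keep=False)]
# Rejoin the survivors per original row; rows with none left become ""
df["Keywords"] = m.groupby(m.index).apply(", ".join)
df = df.fillna("")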
Output:
Code Keywords
0 A loan, land
1 B rental, Tenant, broker advisor, Lease and rent
2 C Transport Air freight
3 D
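
For anyone who wants to try this end to end, here is a minimal self-contained sketch of the same approach. The sample DataFrame is rebuilt from the question (it assumes "Transport Air freight" is a single keyword in row C), and drop_repeated_keywords is a hypothetical helper name introduced only for illustration, not something from pandas or from the answer above. It also strips leading and trailing whitespace before splitting, which the original answer does not do.

```python
import pandas as pd

# Sample data rebuilt from the question; the exact cell contents are an assumption.
df = pd.DataFrame({
    "Code": ["A", "B", "C", "D"],
    "Keywords": [
        "Real estate, loan, building, office, land, warehouse",
        "Real Estate Lease , Real Estate, building, Office, Warehouse, rental, "
        "Tenant, broker advisor, Real Estate Lease , Lease and rent",
        "Transport Air freight, shift, cargo, truck, insurance, "
        "Transport Insurance, Transport",
        "Transport, shift, cargo, truck, insurance, Transport Insurance",
    ],
})


def drop_repeated_keywords(frame, column="Keywords"):
    """Hypothetical helper: remove every keyword that occurs more than once
    anywhere in the column, comparing case-insensitively."""
    # One keyword per row; the original row label is repeated in the index.
    tokens = frame[column].str.strip().str.split(r"\s*,\s*").explode()
    # keep=False marks all occurrences of a repeated value, not just the later ones.
    repeated = tokens.str.lower().duplicated(keep=False)
    survivors = tokens[~repeated.to_numpy()]
    # Rejoin per original row; rows with no survivors are filled with "".
    joined = survivors.groupby(level=0).apply(", ".join)
    out = frame.copy()
    out[column] = joined.reindex(frame.index, fill_value="")
    return out


result = drop_repeated_keywords(df)
print(result)
```

The helper returns a copy rather than modifying df in place, so the original keywords stay available for comparison. The key design point is keep=False in duplicated: the default (keep="first") would retain one copy of each repeated keyword, which is not what the question asks for.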
