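I have a DataFrame with a single column (Col). I'm trying to drop duplicate records regardless of whether they are written in lowercase or uppercase. For example: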
import pandas as pd
df = pd.DataFrame({'Col': ['Appliance Identification', 'Natural Language', 'Social networks',
                           'natural language', 'Personal robot', 'Social Networks', 'Natural language']})
Output:
Col
0 Appliance Identification
1 Natural Language
2 Social networks
3 natural language
4 Personal robot
5 Social Networks
6 Natural language
Expected output:
Col
0 Appliance Identification
1 Social networks
2 Personal robot
3 Natural language
How can I drop the duplicates in a case-insensitive way?
A uj5u.com user replied:
You can use:
df.groupby(df['Col'].str.lower(), as_index=False, sort=False).first()
Output:
Col
0 Appliance Identification
1 Natural Language
2 Social networks
3 Personal robot
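To make the grouping step easier to follow, here is a minimal self-contained sketch of the same idea. The names `key` and `result` are just illustrative, and renaming the key Series plus the final `reset_index(drop=True).to_frame()` are my own additions, not part of the answer above:

```python
import pandas as pd

df = pd.DataFrame({'Col': ['Appliance Identification', 'Natural Language', 'Social networks',
                           'natural language', 'Personal robot', 'Social Networks', 'Natural language']})

# Lowercased values act as the grouping key, so 'Natural Language'
# and 'natural language' end up in the same group.
# Renaming the key (hypothetical name) avoids a clash with the 'Col' column.
key = df['Col'].str.lower().rename('key')

# Take the first original-cased value in each group;
# sort=False keeps the groups in order of first appearance.
result = (
    df.groupby(key, sort=False)['Col']
      .first()
      .reset_index(drop=True)
      .to_frame()
)

print(result)
```

One thing to keep in mind: by default `groupby` leaves out rows whose key is NaN, so missing values in Col would be dropped by this approach.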
A uj5u.com user replied:
Convert the values to lowercase, then filter out the duplicates using Series.duplicated, with the mask inverted by ~ in boolean indexing:
df = df[~df['Col'].str.lower().duplicated()]
print(df)
Col
0 Appliance Identification
1 Natural Language
2 Social networks
4 Personal robot
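For anyone who wants to see the intermediate mask, here is a rough self-contained sketch of the same approach. The variable names, the explicit `keep` argument, and the `reset_index(drop=True)` call are assumptions added for illustration; they show how to choose which occurrence survives and how to renumber the rows afterwards:

```python
import pandas as pd

df = pd.DataFrame({'Col': ['Appliance Identification', 'Natural Language', 'Social networks',
                           'natural language', 'Personal robot', 'Social Networks', 'Natural language']})

# Lowercased copy used only as the comparison key;
# the original casing in df is left untouched.
key = df['Col'].str.lower()

# True for every row whose key already appeared earlier
# (keep='first' is the default of Series.duplicated).
mask = key.duplicated(keep='first')

# Keep the first occurrence of each key and renumber the rows from 0.
first_kept = df[~mask].reset_index(drop=True)

# Illustrative variation: keep the last occurrence of each key instead.
last_kept = df[~key.duplicated(keep='last')].reset_index(drop=True)

print(first_kept)
print(last_kept)
```

With keep='first' the surviving rows are the ones shown in the answer above; with keep='last' the surviving rows are the original rows 0, 4, 5 and 6.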
