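I am trying to find duplicates, but the result does not show the line number, name, and number, and the output is not what I expect (see below for the expected output).
Here is the sample file (link redacted):
Current result

The code I am currently stuck on: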
https://pastebin.com/ypeGAQLQ
import pandas as pd
import os
df_state=pd.read_csv(r'D:\NP Year 3\CNETF\Labs\Lab 2\A2c Python Exercises\2 Duplicated\names_dup2.csv', quoting=1, header=None)[0] \
.str.split('\t', expand=True) \
.duplicated() \
.to_csv('D:\\NP Year 3\\CNETF\\Labs\\Lab 2\\A2c Python Exercises\\2 Duplicated\\duplicated_data.csv', index=False, header=False)
print(df_state)
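Answer from a uj5u.com user:
This happens because .duplicated returns a boolean Series (True/False), and you are saving that Series directly.
Instead, use it as a mask to subset the data, like this: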
import pandas as pd
import os
df_state = pd.DataFrame(
[["3 Liu Yu,876"],
["4 Koh chong,123"],
["3 Liu Yu,876"]])
df_state = df_state[0].str.split(" ", expand= True)
print(df_state, "\n")
duplicated = df_state.duplicated() # just a boolean series
print(duplicated, "\n")
print(df_state[duplicated], "\n") ## <- subset and save with .to_csv
# as Anders Källmar points out, you can also do this:
all_duplicated = df_state.duplicated(keep= False)
print(df_state[all_duplicated])
Output:
   0    1          2
0  3  Liu     Yu,876
1  4  Koh  chong,123
2  3  Liu     Yu,876

0    False
1    False
2     True
dtype: bool

   0    1       2
2  3  Liu  Yu,876

   0    1       2
0  3  Liu  Yu,876
2  3  Liu  Yu,876
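The "subset and save" step mentioned in the comment above could look roughly like the sketch below. It writes to an in-memory io.StringIO buffer instead of a file so it runs anywhere; the buffer and the names mask and csv_text are assumptions made for illustration, not part of the original answer.

```python
import io
import pandas as pd

df_state = pd.DataFrame(
    [["3 Liu Yu,876"],
     ["4 Koh chong,123"],
     ["3 Liu Yu,876"]])
df_state = df_state[0].str.split(" ", expand=True)

# keep=False marks every member of a duplicate group, not only the repeats
mask = df_state.duplicated(keep=False)

# Assumption: a StringIO buffer stands in for the real output path
buffer = io.StringIO()
df_state[mask].to_csv(buffer, index=False, header=False)
csv_text = buffer.getvalue()
print(csv_text)
```

With a real file, passing a path to to_csv in place of the buffer should behave the same way. Fields that contain a comma (such as Yu,876) are quoted by pandas when written.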
Answer from a uj5u.com user:
Use df.duplicated with keep=False to get a boolean mask of the duplicated rows, then extract those rows:
import pandas as pd
# split name / number from your csv file
df = pd.read_csv('names_dup2.csv', quoting=1, header=None)[0] \
.str.split('\t', expand=True)
# shift the 0-based index by one so it matches the line number in the file
df.index += 1
# keep every row whose name (column 0) appears more than once
out = df[df[0].duplicated(keep=False)]
# export to duplicated_data.csv (the index is written as the first column)
out.to_csv('duplicated_data.csv', header=False)
Contents of the output file:
15,ANDREW ZHAO CHONG,83091746
19,ANDREW ZHAO CHONG,83091746
26,ANDREW ZHAO CHONG,83091746
48,ANDREW ZHAO CHONG,83091746
53,KOH KANG RI,89943392
56,KOH KANG RI,89943392
63,ENOS ZHAO KANG SONG,80746554
66,ENOS ZHAO KANG SONG,80746554
80,ENOS ZHAO KANG SONG,80746554
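The output above lists every occurrence of a duplicated name, including the first one, which is the effect of keep=False. As a small illustration of how the keep argument changes the mask, here is a sketch on a made-up Series; the values are hypothetical and unrelated to the real file.

```python
import pandas as pd

# Hypothetical values, chosen only to show the three keep modes
names = pd.Series(["A", "B", "A", "A"])

first = names.duplicated()             # keep='first' (default): first occurrence is not flagged
last = names.duplicated(keep='last')   # last occurrence is not flagged
every = names.duplicated(keep=False)   # all occurrences are flagged

print(first.tolist())  # [False, False, True, True]
print(last.tolist())   # [True, False, True, False]
print(every.tolist())  # [True, False, True, True]
```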
One-liner version:
pd.read_csv('names_dup2.csv', quoting=1, header=None)[0] \
.str.split('\t', expand=True) \
.assign(index=lambda x: x.index + 1) \
.set_index('index') \
[lambda x: x[0].duplicated(keep=False)] \
.to_csv('duplicated_data.csv', header=False)
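To try the chain without the real file, the sketch below runs the same steps on a small in-memory sample. The sample text is a hypothetical stand-in for names_dup2.csv (one quoted field per line, name and number separated by a tab, as the split on '\t' suggests), and it swaps in parentheses, rename, and .loc for the backslash continuations. It also compares whole rows rather than only the name column, which is a design choice of this sketch rather than of the answer.

```python
import io
import pandas as pd

# Hypothetical stand-in for names_dup2.csv (layout assumed, not confirmed)
sample = (
    '"ANDREW ZHAO CHONG\t83091746"\n'
    '"KOH KANG RI\t89943392"\n'
    '"ANDREW ZHAO CHONG\t83091746"\n'
    '"ENOS ZHAO KANG SONG\t80746554"\n'
)

buffer = io.StringIO()  # stands in for duplicated_data.csv
(
    pd.read_csv(io.StringIO(sample), quoting=1, header=None)[0]
    .str.split('\t', expand=True)
    .rename(index=lambda i: i + 1)            # 1-based, matches file line numbers
    .loc[lambda x: x.duplicated(keep=False)]  # every row that belongs to a duplicate group
    .to_csv(buffer, header=False)
)
lines = buffer.getvalue().splitlines()
print(lines)
```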
