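I'm trying to create a .csv containing the records that differ between an old and a new csv file. I have done this successfully for a single pair of files with the following: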
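import pandas as pd

old_df = 'file1_old.csv'
new_df = 'file1_new.csv'
df1 = pd.read_csv(old_df)
df2 = pd.read_csv(new_df)
# Tag each row with the file it came from
df1['flag'] = 'old'
df2['flag'] = 'new'
df = pd.concat([df1, df2])
# Drop every row that appears in both files, comparing all columns except 'flag'
dups_dropped = df.drop_duplicates(subset=df.columns.difference(['flag']), keep=False)
dups_dropped.to_csv('difference.csv', index=False)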
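Where I'm stuck is how to use a loop (?) to output one csv per pair, given that the old and new input filenames follow the same naming convention, for example:
file1_new, file1_old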
file2_new, file2_old
file3_new, file3_old
so that the output is
file1_difference.csv
file2_difference.csv
file3_difference.csv
Any thoughts? Much appreciated.
Answer from a uj5u.com user:
A simple for loop, with f-strings to format the filenames, should work:
import pandas as pd

for i in range(1, 11):  # replace 11 with the number of files you have + 1
    old_df = f'file{i}_old.csv'
    new_df = f'file{i}_new.csv'
    df1 = pd.read_csv(old_df)
    df2 = pd.read_csv(new_df)
    df1['flag'] = 'old'
    df2['flag'] = 'new'
    df = pd.concat([df1, df2])
    dups_dropped = df.drop_duplicates(subset=df.columns.difference(['flag']), keep=False)
    # Output name follows the file{i}_difference.csv pattern asked for in the question
    dups_dropped.to_csv(f'file{i}_difference.csv', index=False)
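If the number of pairs isn't known in advance, one possible variation is to discover the pairs from the filenames instead of hard-coding the range. The sketch below is only an illustration and assumes the files sit in one folder and end in `_old.csv` / `_new.csv`; the helper names `diff_frames` and `write_differences` are made up for this example and are not part of pandas.

```python
from pathlib import Path

import pandas as pd


def diff_frames(old, new):
    """Return the rows that appear in only one of the two frames, tagged by origin."""
    combined = pd.concat([old.assign(flag="old"), new.assign(flag="new")])
    # Compare on every column except the helper 'flag' column.
    compare_cols = combined.columns.difference(["flag"]).tolist()
    return combined.drop_duplicates(subset=compare_cols, keep=False)


def write_differences(folder="."):
    """Write <prefix>_difference.csv for each <prefix>_old.csv / <prefix>_new.csv pair.

    Hypothetical helper: the suffix convention is an assumption taken from the
    filenames in the question.
    """
    written = []
    for old_path in sorted(Path(folder).glob("*_old.csv")):
        prefix = old_path.name[: -len("_old.csv")]
        new_path = old_path.with_name(f"{prefix}_new.csv")
        if not new_path.exists():
            continue  # skip an old file that has no matching new file
        out_path = old_path.with_name(f"{prefix}_difference.csv")
        result = diff_frames(pd.read_csv(old_path), pd.read_csv(new_path))
        result.to_csv(out_path, index=False)
        written.append(out_path)
    return written


if __name__ == "__main__":
    for path in write_differences("."):
        print(f"wrote {path}")
```

One thing to keep in mind with this approach: `keep=False` drops every duplicated row, so a row that is repeated inside a single file is removed as well, even when the other file doesn't contain it.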
