How can I merge two or more files, for example files that look like these?
First CSV file:
email,joe,@gmail.com
email,doe,@hotmail.com
name,emilly,doe
name,jenny,van
year,talia,19
year,kevin,20
Second CSV file:
email,joe,mr
email,doe,mrs
name,jenny,gogh
year,talia,97
I want to merge these files so that the result looks like this:
email,joe,@gmail.com,mr
email,doe,@hotmail.com,mrs
name,emilly,doe,nan
name,jenny,van,gogh
year,talia,19,97
year,kevin,20,nan
Any help would be much appreciated.
Answer from a uj5u.com user:
Use DataFrame.merge with a left join or the default inner join:
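import pandas as pd

# Read the files into DataFrames; header=None because the files have no header row
# (file1 and file2 are the paths of the two CSV files)
df1 = pd.read_csv(file1, header=None)
df2 = pd.read_csv(file2, header=None)
# Left join on the first two columns: every row of df1 is kept
df = df1.merge(df2, on=[0, 1], how='left')
print(df)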
       0       1           2_x   2_y
0  email     joe    @gmail.com    mr
1  email     doe  @hotmail.com   mrs
2   name  emilly           doe   NaN
3   name   jenny           van  gogh
4   year   talia            19    97
5   year   kevin            20   NaN
If rows that have no match in the second file should be dropped:
# Inner join on the first two columns (the default for merge)
df = df1.merge(df2, on=[0, 1])
print(df)
       0      1           2_x   2_y
0  email    joe    @gmail.com    mr
1  email    doe  @hotmail.com   mrs
2   name  jenny           van  gogh
3   year  talia            19    97
#write to file
df.to_csv(file3, index=False, header=False)
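The left join above can be tried end to end without touching the disk. The following is a minimal sketch, not the answerer's exact code: the `io.StringIO` objects stand in for the two files, and `dtype=str` and `na_rep="nan"` are assumptions added here so that values are written back unchanged and missing cells appear as `nan`, as in the desired output.

```python
import io
import pandas as pd

# In-memory stand-ins for the two CSV files from the question.
csv1 = io.StringIO(
    "email,joe,@gmail.com\n"
    "email,doe,@hotmail.com\n"
    "name,emilly,doe\n"
    "name,jenny,van\n"
    "year,talia,19\n"
    "year,kevin,20\n"
)
csv2 = io.StringIO(
    "email,joe,mr\n"
    "email,doe,mrs\n"
    "name,jenny,gogh\n"
    "year,talia,97\n"
)

# dtype=str (an assumption, not in the original answer) keeps 19, 20 and 97 as text.
df1 = pd.read_csv(csv1, header=None, dtype=str)
df2 = pd.read_csv(csv2, header=None, dtype=str)

# Left join on the first two columns: all rows of df1 are kept, in df1's order.
merged = df1.merge(df2, on=[0, 1], how="left")

# With no path argument, to_csv returns the CSV text; na_rep fills missing cells.
result = merged.to_csv(index=False, header=False, na_rep="nan")
print(result)
```

Because a left join keeps the row order of the first file, this variant reproduces the layout asked for in the question.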
Answer from a uj5u.com user:
Update:
pd.merge(df1, df2, on=[0, 1], how='outer') \
.to_csv('output.csv', index=False, header=False, na_rep='nan')
# Content of file:
email,joe,@gmail.com,mr
email,doe,@hotmail.com,mrs
name,emilly,doe,nan
name,jenny,van,gogh
year,talia,19,97
year,kevin,20,nan
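For the files in the question, `how='outer'` and `how='left'` keep the same rows, because every key in the second file also appears in the first. The difference only shows when the second file has a key the first one lacks. A small sketch with made-up data (the `city,paris,fr` row is hypothetical and not part of the question), using the `indicator` parameter of `merge` to show where each row came from:

```python
import pandas as pd

df1 = pd.DataFrame([["email", "joe", "@gmail.com"],
                    ["name", "emilly", "doe"]])
df2 = pd.DataFrame([["email", "joe", "mr"],
                    ["city", "paris", "fr"]])  # hypothetical row, absent from df1

# Left join: only keys present in df1 survive.
left = df1.merge(df2, on=[0, 1], how="left")

# Outer join: keys from either frame survive; _merge records the origin of each row.
outer = df1.merge(df2, on=[0, 1], how="outer", indicator=True)

print(len(left), len(outer))
print(outer[[0, 1, "_merge"]])
```

The row order of an outer merge can depend on the pandas version, so it is safer not to rely on it matching the order of the first file; a left join keeps that order.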
Update:
"How do I merge more than two CSV files? Can I also use merge() for three CSV files?"
I split your second file into two parts:
# data1.csv
email,joe,@gmail.com
email,doe,@hotmail.com
name,emilly,doe
name,jenny,van
year,talia,19
year,kevin,20
# data2.csv
email,joe,mr
email,doe,mrs
# data3.csv
name,jenny,gogh
year,talia,97
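Use reduce from the functools module:
from functools import reduce
import pandas as pd

filenames = ['data1.csv', 'data2.csv', 'data3.csv']
# Read every file without a header row
dfs = [pd.read_csv(fn, header=None) for fn in filenames]
# Merge the DataFrames pairwise, left to right, on the first two columns
df = reduce(lambda df1, df2: pd.merge(df1, df2, on=[0, 1], how='outer'), dfs)
df.to_csv('output.csv', index=False, header=False, na_rep='nan')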
Output:
email,joe,@gmail.com,mr,nan
email,doe,@hotmail.com,mrs,nan
name,emilly,doe,nan,nan
name,jenny,van,nan,gogh
year,talia,19,nan,97
year,kevin,20,nan,nan
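The reduce approach relies on the automatic `_x`/`_y` suffixes that merge adds to clashing column names, and with four or more files those suffixes can collide. One way around this, sketched below under assumptions, is to give each file's value column its own name before merging. The column labels `kind` and `key` and the `load` helper are hypothetical names chosen for illustration, and the strings stand in for data1.csv, data2.csv and data3.csv.

```python
from functools import reduce
import io
import pandas as pd

# In-memory stand-ins for data1.csv, data2.csv and data3.csv.
sources = {
    "data1": (
        "email,joe,@gmail.com\n"
        "email,doe,@hotmail.com\n"
        "name,emilly,doe\n"
        "name,jenny,van\n"
        "year,talia,19\n"
        "year,kevin,20\n"
    ),
    "data2": "email,joe,mr\nemail,doe,mrs\n",
    "data3": "name,jenny,gogh\nyear,talia,97\n",
}

def load(name, text):
    # "kind" and "key" are hypothetical labels for the two join columns;
    # the value column is named after its file, so names never clash.
    return pd.read_csv(io.StringIO(text), header=None,
                       names=["kind", "key", name], dtype=str)

frames = [load(name, text) for name, text in sources.items()]

# Fold the list into one DataFrame, merging on the two key columns each time.
merged = reduce(
    lambda left, right: left.merge(right, on=["kind", "key"], how="outer"),
    frames,
)

print(merged.to_csv(index=False, header=False, na_rep="nan"))
```

Since no two frames share a non-key column name, no suffixes are needed and the same code works for any number of files.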
Please credit the source when reposting. Link to this article: https://www.uj5u.com/net/376569.html
