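I have 3 dataframes (df1, df2, df3). I want to merge them on one column and add two new columns: one stating which dataframes matched, and a second stating how many of them matched.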
import pandas as pd
# df1
data = {'ID': ["M1", "M2", "M3", "M4"],
        'Movie': ["Top gun", "Thor", "Batman", "MadMax"],
        'Actor' : ["Tom", "Chris", "Bale", "Tom"],
        'type': ["Action", "SciFi", "Comic", "SciFi"]}
df1 = pd.DataFrame(data)
# df2
data = {'ID': ["M1", "M2", "M3"],
        'highlight': ["Flight school", "Love and thunder", "I am Batman"]}
df2 = pd.DataFrame(data)
# df3
data = {'ID': ["M2", "M3"],
        'no of parts': [3, 3],
        'co-star' : ["portman", "neeson"],
        'award': ["yes", "yes"]}
df3 = pd.DataFrame(data)
In the expected output, match and No of match are the new columns.
Thanks for your time.
Any help would be greatly appreciated.
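Reply from a uj5u.com user:
You can merge the three dataframes on ID and use the indicator argument of merge to determine which dataframes have valid data, then use that information to build the match column. After that, count the number of | characters in match to get the No of match column: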
import pandas as pd
data = {'ID': ["M1", "M2", "M3", "M4"], 'Movie': ["Top gun", "Thor", "Batman", "MadMax"], 'Actor' : ["Tom", "Chris", "Bale", "Tom"], 'type': ["Action", "SciFi", "Comic", "SciFi"]}
df1 = pd.DataFrame(data)
data = {'ID': ["M1", "M2", "M3"], 'highlight': ["Flight school", "Love and thunder", "I am Batman"]}
df2 = pd.DataFrame(data)
data = {'ID': ["M2", "M3"], 'no of parts': [3, 3], 'co-star' : ["portman", "neeson"], 'award': ["yes", "yes"]}
df3 = pd.DataFrame(data)
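# Left-merge df2 and df3 onto df1; each indicator column records whether the right frame matched
df = df1.merge(df2, on='ID', how='left', indicator='df1df2').merge(df3, on='ID', how='left', indicator='df3')
# Turn the indicator values into labels and concatenate them
df['match'] = df['df1df2'].map({'both': 'df1|df2', 'left_only': 'df1'}) + df['df3'].map({'both': '|df3', 'left_only': ''})
# Number of matches = number of '|' separators plus one
df['No of match'] = df['match'].str.count(r'\|') + 1
df = df.drop(['df1df2', 'df3'], axis=1)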
Output:
   ID    Movie  Actor    type         highlight  no of parts  co-star  award        match  No of match
0  M1  Top gun    Tom  Action     Flight school          NaN      NaN    NaN      df1|df2            2
1  M2     Thor  Chris   SciFi  Love and thunder          3.0  portman    yes  df1|df2|df3            3
2  M3   Batman   Bale   Comic       I am Batman          3.0   neeson    yes  df1|df2|df3            3
3  M4   MadMax    Tom   SciFi               NaN          NaN      NaN    NaN          df1            1
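To see what the indicator argument contributes on its own, here is a minimal sketch with two toy frames. The names left, right, source and flags are illustrative assumptions and do not come from the answer above.

```python
import pandas as pd

# Toy frames (illustrative only): "right" lacks the key M4
left = pd.DataFrame({"ID": ["M1", "M2", "M3", "M4"]})
right = pd.DataFrame({"ID": ["M1", "M2", "M3"],
                      "highlight": ["a", "b", "c"]})

# indicator="source" adds a categorical column named "source" whose values
# are "both", "left_only" or "right_only" for each merged row
merged = left.merge(right, on="ID", how="left", indicator="source")

# Convert the categorical values to plain strings for inspection
flags = merged["source"].astype(str).tolist()
print(flags)
```

With a left join the value "right_only" can never occur, which is why the dictionaries passed to map in the answer only need entries for "both" and "left_only".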
Reply from a uj5u.com user:
You can also try this: merging with reduce and a lambda.
from functools import reduce
import pandas as pd
dfs = [df1, df2, df3]
# Outer-merge all frames on ID
df_final = reduce(lambda left, right: pd.merge(left, right, on='ID', how='outer'), dfs)
# Use the first non-key column of each frame as a marker of that frame's presence
df_temp = df_final[[df1.columns[1], df2.columns[1], df3.columns[1]]]
df_final["match"] = df_temp.apply(lambda x: "|".join(["df" + str(idx + 1) for idx, i in enumerate(x) if pd.notna(i)]), axis=1)
df_final["No of match"] = df_final["match"].apply(lambda x: x.count("|") + 1)
Output:
   ID    Movie  Actor    type  ...  co-star  award        match  No of match
0  M1  Top gun    Tom  Action  ...      NaN    NaN      df1|df2            2
1  M2     Thor  Chris   SciFi  ...  portman    yes  df1|df2|df3            3
2  M3   Batman   Bale   Comic  ...   neeson    yes  df1|df2|df3            3
3  M4   MadMax    Tom   SciFi  ...      NaN    NaN          df1            1
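One caveat with the reduce approach: it reads the first non-key column of each frame as a presence marker, so a genuine missing value in that column would be counted as "no match". A possible variant, sketched here under the assumption that every frame carries the key column, checks key membership with Series.isin instead. The helper name merge_with_match and the toy frames a and b are hypothetical.

```python
from functools import reduce

import pandas as pd


def merge_with_match(frames, key="ID"):
    """Outer-merge frames on key and label which frames contain each key."""
    merged = reduce(
        lambda left, right: pd.merge(left, right, on=key, how="outer"), frames
    )
    # One boolean column per input frame: does that frame contain this key?
    present = pd.DataFrame(
        {f"df{i}": merged[key].isin(frame[key])
         for i, frame in enumerate(frames, start=1)}
    )
    # Join the names of the frames that contain the key
    merged["match"] = present.apply(
        lambda row: "|".join(name for name, hit in row.items() if hit), axis=1
    )
    merged["No of match"] = present.sum(axis=1)
    return merged


# Toy data (illustrative): b has the key M2 but a missing value in its data column
a = pd.DataFrame({"ID": ["M1", "M2"], "Movie": ["Top gun", "Thor"]})
b = pd.DataFrame({"ID": ["M2"], "highlight": [None]})
result = merge_with_match([a, b]).set_index("ID")
print(result[["match", "No of match"]])
```

Here M2 is still reported as matching both frames even though its highlight value is missing, because the check is made on the key rather than on the data.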
Reply from a uj5u.com user:
You can use pandas.concat on a list of the input DataFrames. This works for any number of input DataFrames (not just 3):
# dataframes will be later named in order: 1->2->3
# you can easily tweak this solution to use a dictionary
# and custom names if desired
dfs = [df1, df2, df3]
out = (pd
   .concat([d.set_index('ID').assign(ID=f'df{i}')
            for i,d in enumerate(dfs, start=1)], axis=1)
   .assign(**{'match': lambda d: d[['ID']].agg(lambda x: '|'.join(x.dropna()),
                                               axis=1),
              'No of matches': lambda d: d[['ID']].notna().sum(axis=1)
             })
   .drop('ID', axis=1).reset_index()
)
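Note: this approach uses a temporary ID column (the original ID has already been moved to the index at that point), so make sure no other column of the input DataFrames uses that name. If needed, you can pick a different name to be safe.
Output: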
   ID    Movie  Actor    type         highlight  no of parts  co-star  award        match  No of matches
0  M1  Top gun    Tom  Action     Flight school          NaN      NaN    NaN      df1|df2              2
1  M2     Thor  Chris   SciFi  Love and thunder          3.0  portman    yes  df1|df2|df3              3
2  M3   Batman   Bale   Comic       I am Batman          3.0   neeson    yes  df1|df2|df3              3
3  M4   MadMax    Tom   SciFi               NaN          NaN      NaN    NaN          df1              1
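The comments in the concat answer mention that it can be adapted to a dictionary with custom names. One way this might look is sketched below; the function name concat_with_match, the temporary column name _source and the toy frames are assumptions made for illustration, not part of the original answer.

```python
import pandas as pd


def concat_with_match(named_frames, key="ID", tmp="_source"):
    """Concatenate frames side by side and label each key with its sources.

    named_frames maps a custom label to a DataFrame; tmp is a temporary
    column name assumed not to appear in any input frame.
    """
    parts = [frame.set_index(key).assign(**{tmp: name})
             for name, frame in named_frames.items()]
    out = pd.concat(parts, axis=1)
    # Selecting with a list returns every duplicated tmp column at once
    labels = out[[tmp]]
    out = out.drop(columns=tmp)
    out["match"] = labels.apply(lambda row: "|".join(row.dropna()), axis=1)
    out["No of matches"] = labels.notna().sum(axis=1)
    return out.rename_axis(key).reset_index()


# Toy data (illustrative)
movies = pd.DataFrame({"ID": ["M1", "M2"], "Movie": ["Top gun", "Thor"]})
extras = pd.DataFrame({"ID": ["M2"], "award": ["yes"]})
named = concat_with_match({"movies": movies, "extras": extras}).set_index("ID")
print(named[["match", "No of matches"]])
```

Using a temporary name that differs from the key avoids having a column and an index level that share the name ID during the intermediate steps.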
Reply from a uj5u.com user:
Probably not the best approach, but you can try this:
import pandas as pd

def f(x):
    # x holds the match list followed by the three df3 columns
    if x.iloc[1:].isna().all():
        # no df3 data on this row: keep the labels found so far
        return x.iloc[0]
    # df3 contributed data: append its label
    return x.iloc[0] + ['df3']

df = df1.merge(df2, how='outer')
# a non-missing highlight means df2 matched
df['match'] = df['highlight'].apply(lambda x: ['df1', 'df2'] if pd.notna(x) else ['df1'])
df = df.merge(df3, how='outer')
df['match'] = df[['match', 'no of parts', 'co-star', 'award']].apply(f, axis=1)
df['No of match'] = df['match'].apply(lambda x: len(x))
df['match'] = df['match'].apply(lambda x: '|'.join(x))
df = df[['ID', 'Movie', 'Actor', 'type', 'highlight', 'no of parts',
         'co-star', 'award', 'match', 'No of match']]
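Reply from a uj5u.com user:
Use DataFrame.merge with a left join and the indicator parameter to see which DataFrames matched. Then use DataFrame.pop to remove each indicator column while mapping it through a dictionary with Series.map, append the mapped df3 column, and finally count the | characters with Series.str.count: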
df = (df1.merge(df2, on='ID', how='left', indicator='df2')
         .merge(df3, on='ID', how='left', indicator='df3'))
# pop removes each indicator column and returns it for mapping
df['match'] = (df.pop('df2').map({'both': 'df1|df2', 'left_only': 'df1'}) +
               df.pop('df3').map({'both': '|df3', 'left_only': ''}))
df['No of match'] = df['match'].str.count(r'\|') + 1
print(df)
   ID    Movie  Actor    type         highlight  no of parts  co-star  award  \
0  M1  Top gun    Tom  Action     Flight school          NaN      NaN    NaN
1  M2     Thor  Chris   SciFi  Love and thunder          3.0  portman    yes
2  M3   Batman   Bale   Comic       I am Batman          3.0   neeson    yes
3  M4   MadMax    Tom   SciFi               NaN          NaN      NaN    NaN

         match  No of match
0      df1|df2            2
1  df1|df2|df3            3
2  df1|df2|df3            3
3          df1            1
Another idea: use concat after assigning helper columns df1-df3 filled with True, and build the match column with DataFrame.dot:
dfs = [df1, df2, df3]
# Add a helper column df1..df3 filled with True to each frame
L = [df_.set_index('ID').assign(**{f'df{i}': True}) for i, df_ in enumerate(dfs, start=1)]
df = pd.concat(L, axis=1)
cols = df.filter(regex=r'^df\d').columns
# Multiply the boolean flags by the labels and strip the trailing '|'
df['match'] = df[cols].fillna(False).dot(cols + '|').str[:-1]
df['No of match'] = df['match'].str.count(r'\|') + 1
df = df.drop(cols, axis=1)
print(df)
      Movie  Actor    type         highlight  no of parts  co-star  award  \
ID
M1  Top gun    Tom  Action     Flight school          NaN      NaN    NaN
M2     Thor  Chris   SciFi  Love and thunder          3.0  portman    yes
M3   Batman   Bale   Comic       I am Batman          3.0   neeson    yes
M4   MadMax    Tom   SciFi               NaN          NaN      NaN    NaN

          match  No of match
ID
M1      df1|df2            2
M2  df1|df2|df3            3
M3  df1|df2|df3            3
M4          df1            1
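The DataFrame.dot step can look opaque, so here is a small sketch of why it produces the joined labels. It relies on Python's rule that True * 'abc' is 'abc' and False * 'abc' is the empty string, so the matrix product concatenates only the labels whose flag is True. The frame flags and its values are made up for illustration.

```python
import pandas as pd

# Hypothetical presence flags: one row per key, one column per source frame
flags = pd.DataFrame(
    {"df1": [True, True], "df2": [True, False], "df3": [False, False]},
    index=["M1", "M4"],
)

# Each row is multiplied element-wise by the labels "df1|", "df2|", "df3|"
# and the products are summed, which for strings means concatenation
joined = flags.dot(flags.columns + "|")

# Strip the trailing separator left by the last contributing label
labels = joined.str[:-1]
print(labels.tolist())
```

The same idea scales to any number of helper columns, which is presumably why the answer selects them with a regular expression instead of listing them by hand.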
