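I have the following DataFrame, in which I am trying to match account codes. Assume the columns Account_Spread_v2 and Account_Codes_v2 have already been merged into the DataFrame. The idea is to match the Account_Codes_v2 column against Account_Codes. The function I currently apply for this is shown further down.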
import pandas as pd

df = pd.DataFrame([[31, 1234567890, 'USD', 3.5, 'D12', 3.5, 'D3'],
                   [10, 7854567890, 'USD', 2.7, 'TT', 2.7, 'TT'],
                   [10, 7854567899, 'AUS', 8, 'D1', 8, 'D1'],
                   [6, 7854567893, 'USD', 2.7, 'D55', 2.7, 'H1'],
                   [10, 7854567893, 'EUR', 2.7, 'JG', 2.7, 'JG'],
                   [31, 9632587415, 'USD', 1.4, 'D55', 1.4, 'D2']],
                  columns=['branch', 'Account', 'Cur', 'Account_Spread', 'Account_Codes',
                           'Account_Spread_v2', 'Account_Codes_v2'])
Output:
   branch     Account  Cur  Account_Spread  Account_Codes  Account_Spread_v2  Account_Codes_v2
0      31  1234567890  USD             3.5            D12                3.5                D3
1      10  7854567890  USD             2.7             TT                2.7                TT
2      10  7854567899  AUS             8.0             D1                8.0                D1
3       6  7854567893  USD             2.7            D55                2.7                H1
4      10  7854567893  EUR             2.7             JG                2.7                JG
5      31  9632587415  USD             1.4            D55                1.4                D2
Function:
def compute_match_codes(row):
    # Codes that may stand in for D12/D55 (not used by the function yet)
    codes = ['D1', 'D2', 'D4', 'D3']
    m = 'NA'
    if row['Account_Codes'] == row['Account_Codes_v2']:
        m = 'MatchOnCodes'
    else:
        m = 'MismatchOnCodes'
    return m

# Append the result as a new column (it ends up named 0)
df = pd.concat([df, df.apply(compute_match_codes, axis=1, result_type='expand')], axis=1)
   branch     Account  Cur  Account_Spread  Account_Codes  Account_Spread_v2  Account_Codes_v2                0
0      31  1234567890  USD             3.5            D12                3.5                D3  MismatchOnCodes
1      10  7854567890  USD             2.7             TT                2.7                TT     MatchOnCodes
2      10  7854567899  AUS             8.0             D1                8.0                D1     MatchOnCodes
3       6  7854567893  USD             2.7            D55                2.7                H1  MismatchOnCodes
4      10  7854567893  EUR             2.7             JG                2.7                JG     MatchOnCodes
5      31  9632587415  USD             1.4            D55                1.4                D2  MismatchOnCodes
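The challenge I face is this: if an account is in USD, belongs to branch 31, and its code in the Account_Codes column is D12 or D55, then that code can be substituted by any code in the list named codes.
With this rule applied, rows 0 and 5 would actually match. I tried using the isin() method, but it did not work. Any ideas on how to edit the function to accommodate this?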
Reply from a uj5u.com user:
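I would use nested np.where() calls: the outer one takes care of all the exact matches first, and the inner one handles the more complex logic you need. I believe this is also a faster solution, because it is vectorized rather than relying on apply with concat and a custom function. The code looks like this: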
import numpy as np

codes = ['D1', 'D2', 'D3', 'D4']
# Outer where: exact matches. Inner where: the USD / branch 31 substitution rule.
df['Match'] = np.where(df['Account_Codes'] == df['Account_Codes_v2'], 'MatchOnCodes',
    np.where((df['Cur'] == 'USD') & (df['branch'] == 31)
             & df['Account_Codes'].isin(['D12', 'D55']) & df['Account_Codes_v2'].isin(codes),
             'MatchOnCodes', 'NoMatchOnCodes'))
This outputs:
   branch     Account  Cur  ...  Account_Spread_v2  Account_Codes_v2           Match
0      31  1234567890  USD  ...                3.5                D3    MatchOnCodes
1      10  7854567890  USD  ...                2.7                TT    MatchOnCodes
2      10  7854567899  AUS  ...                8.0                D1    MatchOnCodes
3       6  7854567893  USD  ...                2.7                H1  NoMatchOnCodes
4      10  7854567893  EUR  ...                2.7                JG    MatchOnCodes
5      31  9632587415  USD  ...                1.4                D2    MatchOnCodes
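
For readers who find the nested call hard to scan, the same rule can be sketched with two named boolean masks and a single np.where(). This is only an illustrative variant, not part of the original answer; the names exact_match and substitute_match are hypothetical, and the DataFrame is rebuilt from the question so the snippet runs on its own.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame(
    [[31, 1234567890, 'USD', 3.5, 'D12', 3.5, 'D3'],
     [10, 7854567890, 'USD', 2.7, 'TT', 2.7, 'TT'],
     [10, 7854567899, 'AUS', 8, 'D1', 8, 'D1'],
     [6, 7854567893, 'USD', 2.7, 'D55', 2.7, 'H1'],
     [10, 7854567893, 'EUR', 2.7, 'JG', 2.7, 'JG'],
     [31, 9632587415, 'USD', 1.4, 'D55', 1.4, 'D2']],
    columns=['branch', 'Account', 'Cur', 'Account_Spread', 'Account_Codes',
             'Account_Spread_v2', 'Account_Codes_v2'])

codes = ['D1', 'D2', 'D3', 'D4']

# Hypothetical mask names, introduced here for readability only.
# Rule 1: the two code columns are identical.
exact_match = df['Account_Codes'] == df['Account_Codes_v2']

# Rule 2: USD account in branch 31 whose code is D12 or D55,
# paired with any code from the `codes` list.
substitute_match = (
    (df['Cur'] == 'USD')
    & (df['branch'] == 31)
    & df['Account_Codes'].isin(['D12', 'D55'])
    & df['Account_Codes_v2'].isin(codes)
)

# A row matches if either rule holds.
df['Match'] = np.where(exact_match | substitute_match,
                       'MatchOnCodes', 'NoMatchOnCodes')

print(df[['branch', 'Cur', 'Account_Codes', 'Account_Codes_v2', 'Match']])
```

Keeping each rule in its own mask makes it easier to add further substitution rules later by OR-ing in another mask.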
Per the OP's comment, here is the same logic as a row-wise function:
codes = ['D1', 'D2', 'D3', 'D4']

def matching_func(row):
    # Inside a row every value is a scalar, so use `and` / `in` here
    if row['Account_Codes'] == row['Account_Codes_v2']:
        return 'MatchOnCodes'
    elif (row['Cur'] == 'USD' and row['branch'] == 31
          and row['Account_Codes'] in ['D12', 'D55']
          and row['Account_Codes_v2'] in codes):
        return 'MatchOnCodes'
    else:
        return 'NoMatchOnCodes'

df['Match'] = df.apply(matching_func, axis=1)
Output:
   branch     Account  Cur  ...  Account_Spread_v2  Account_Codes_v2           Match
0      31  1234567890  USD  ...                3.5                D3    MatchOnCodes
1      10  7854567890  USD  ...                2.7                TT    MatchOnCodes
2      10  7854567899  AUS  ...                8.0                D1    MatchOnCodes
3       6  7854567893  USD  ...                2.7                H1  NoMatchOnCodes
4      10  7854567893  EUR  ...                2.7                JG    MatchOnCodes
5      31  9632587415  USD  ...                1.4                D2    MatchOnCodes
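
The question does not show how isin() was tried, so the following is only a guess at the likely cause. When a function is applied with axis=1, each cell of the row is a plain Python value rather than a Series, and a str has no isin() method. A small sketch of the difference, using made-up sample values:

```python
import pandas as pd

codes = ['D1', 'D2', 'D3', 'D4']

# Column-wise: a Series has .isin(), which returns one boolean per element.
col = pd.Series(['D3', 'TT', 'D2'])  # made-up sample values
mask = col.isin(codes)
print(mask.tolist())

# Row-wise: inside df.apply(..., axis=1) each cell is a scalar such as a str.
value = 'D3'
print(hasattr(value, 'isin'))  # a plain str has no .isin() method
print(value in codes)          # the `in` operator is the scalar equivalent
```

So isin() belongs in the vectorized np.where() version, while the row-wise function needs the in operator.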
