如何在python中使用“包含”條件在串列列上加入資料框-有解無憂

我有兩個看起來像這樣的資料框：

資料框 1：

antecedents consequents
0   (20679) (15056BL)
1   (20675) (20676)
2   (20675) (20677)
3   (20723) (20724)
4   (22356) (20724)
... ... ...
178 (22355, 20724, 22356)   (20719)
179 (20724, 22356, 20719)   (22355)
180 (21212, 84991, 84992)   (21977)
181 (21212, 21977, 84992)   (84991)
182 (84991, 21977, 84992)   (21212)

資料框 2：

Invoice Customer ID StockCode
0   489434  13085.0 [85048, 79323P, 79323W, 22041, 21232, 22064, 2...
1   489435  13085.0 [22350, 22349, 22195, 22353]
2   489436  13078.0 [48173C, 21755, 21754, 84879, 22119, 22142, 22...
3   489437  15362.0 [22143, 22145, 22130, 21364, 21360, 21351, 213...
4   489438  18102.0 [21329, 21252, 21100, 21033, 20711, 21410, 214...

資料幀 1 示例：

pd.DataFrame( {'antecedents' : [('20679'), ('85048'), ('22143'), ('22065','22138'), ('20754','21035','22041')],
               'consequents' : [('20676'), ('20719'), ('22355'), ('20724'), ('212212')]
})

資料幀 2 示例：

pd.DataFrame( {'Customer ID' : [13085, 13078, 15362, 18102, 12682, 18087, 18087, 13635, 14110],
               'StockCode' : [
                           ['85048', '79323P', '79323W', '22041', '21232', '22064'], ['22350', '22349', '22195', '22353'], ['48173C', '21755', '21754', '84879', '22119', '22142'],
                             ['22143', '22145', '22130', '21364', '21360', '21351'], ['21329', '21252', '21100', '21033', '20711', '21410'], ['22065', '22138', '22139', '22352', '85014A', '85014B'],
                              ['22321', '22138', '84029E', '22111'], ['21955', '22111', '22296', '84899E', '22271', '22272'],['20754', '21035', '22041', '82001S', '82580', '85150']
                             ]  
})

我需要在以下條件下加入它們：前件 (df1) 中的所有元素都存在于 StockCode (df2) 中。

因此，如果 customerId 13085 具有產品 29679、20675 和 20723，則應為它們顯示結果 15056BL、20676、20677 和 20724。

將前件轉換為串列串列（ant_list）后，我可以使用 for 回圈獲得結果：

for list2 in ant_list:
    for item, row in df_grouped.iterrows():
        col_name = str(list2)
        if all(elem in row['StockCode'] for elem in list2):
            df_grouped.loc[item,col_name] = 1

要得到這樣的東西：

發票	客戶ID	股票代碼	['20971']	['22064']	['79323P']	['84032A']
489434	13085.0。	[85048，...	南。	1.0。	1.0。	南。

但是，我需要找到一種矢量化方式或任何其他更好的方式，以便我可以將解決方案升級到更大的資料集，但我被卡住了。

uj5u.com熱心網友回復：

嘗試：

df = df1.explode('antecedents').merge(df2.explode('StockCode'), right_on='StockCode', left_on='antecedents', how='left')

df
    antecedents consequents Customer ID StockCode
0   20679       20676       NaN         NaN
1   85048       20719       13085.0     85048
2   22143       22355       18102.0     22143
3   22065       20724       18087.0     22065
4   22138       20724       18087.0     22138
5   22138       20724       18087.0     22138
6   20754       212212      14110.0     20754
7   21035       212212      14110.0     21035
8   22041       212212      13085.0     22041
9   22041       212212      14110.0     22041

a = df.groupby('Customer ID')['StockCode'].apply(list).reset_index()
b= df.groupby('Customer ID')['consequents'].apply(list).reset_index()
c= df.groupby('Customer ID')['antecedents'].apply(list).reset_index()
a.merge(b).merge(c)

    Customer ID StockCode               consequents                 antecedents
0   13085.0     [85048, 22041]          [20719, 212212]             [85048, 22041]
1   14110.0     [20754, 21035, 22041]   [212212, 212212, 212212]    [20754, 21035, 22041]
2   18087.0     [22065, 22138, 22138]   [20724, 20724, 20724]       [22065, 22138, 22138]
3   18102.0     [22143]                 [22355]                     [22143]

轉載請註明出處，本文鏈接：https://www.uj5u.com/yidong/510803.html

標籤：Python熊猫数据框加入

上一篇：按供應商計算訂單導致SQL錯誤

下一篇：基于類似于SQLInner-Join的外鍵操作MSExcel表