Pandas: querying multiple DataFrames

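I have two DataFrames. The first has contract ID numbers and names. The second has contract ID numbers and transaction types. The first DataFrame is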

contract id	first name	last name
1	John	Smith
2	Rob	Brown
3	Rob	Brown

The second DataFrame is

contract id	transaction
1	cash
1	cash
1	cash
2	bank transfer
2	bank transfer
2	bank transfer
3	cash

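I want to count the number of individuals who used only one transaction type. In the example there are two people. The first paid only in cash; the second used both bank transfer and cash. So the answer would be 1.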

The DataFrames are large, and joining them is not feasible. What other options are there?

Data:

df1:

{'contract id': [1, 2, 3],
 'first name': ['John', 'Rob', 'Rob'],
 'last name': ['Smith', 'Brown', 'Brown']}

df2:

{'contract id': [1, 1, 1, 2, 2, 2, 3],
 'transaction': ['cash', 'cash', 'cash', 'bank transfer',
                 'bank transfer', 'bank transfer', 'cash']}
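
For anyone who wants to reproduce the example, the two dictionaries above can be passed straight to `pd.DataFrame`. This is only a small setup sketch; the real data is presumably loaded from files or a database instead.

```python
import pandas as pd

# Contract id -> person (one row per contract)
df1 = pd.DataFrame({
    'contract id': [1, 2, 3],
    'first name': ['John', 'Rob', 'Rob'],
    'last name': ['Smith', 'Brown', 'Brown'],
})

# Contract id -> transaction type (many rows per contract)
df2 = pd.DataFrame({
    'contract id': [1, 1, 1, 2, 2, 2, 3],
    'transaction': ['cash', 'cash', 'cash', 'bank transfer',
                    'bank transfer', 'bank transfer', 'cash'],
})

print(df1.shape, df2.shape)
```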

Answer from a uj5u.com user:

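You can build a single name column in df1, then map those names onto df2. If df2 contains many duplicate values, it is probably worth calling drop_duplicates first. Then use value_counts on the "name" column, followed by eq and sum, to count the people with a single transaction type: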

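# Build a contract id -> full name lookup from df1
mapping = df1.assign(name=df1['first name'] + ' ' + df1['last name']).set_index('contract id')['name']
# Drop repeated (contract id, transaction) rows before mapping
df2 = df2.drop_duplicates().copy()
df2['name'] = df2['contract id'].map(mapping)
# Keep one row per (name, transaction), then count the names that appear once
out = df2.drop_duplicates(['name', 'transaction'])['name'].value_counts().eq(1).sum()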
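
The mapping approach can be sketched as a self-contained function. The helper name `count_single_type_people` is hypothetical, and the sketch assumes contract ids are unique in df1 and that a full name identifies a person. It also deduplicates on (name, transaction) after mapping, so that someone holding several contracts with the same transaction type is still counted as using one type; that step is an assumption about the intended semantics.

```python
import pandas as pd

def count_single_type_people(df1, df2):
    """Count people whose contracts all use a single transaction type."""
    # contract id -> full name lookup (assumes unique contract ids in df1)
    full_name = df1['first name'] + ' ' + df1['last name']
    mapping = pd.Series(full_name.to_numpy(), index=df1['contract id'])

    # Shrink df2 to distinct (contract id, transaction) pairs before mapping
    pairs = df2.drop_duplicates().copy()
    pairs['name'] = pairs['contract id'].map(mapping)

    # One row per (name, transaction); a name appearing once has one type
    pairs = pairs.drop_duplicates(['name', 'transaction'])
    return int(pairs['name'].value_counts().eq(1).sum())

df1 = pd.DataFrame({'contract id': [1, 2, 3],
                    'first name': ['John', 'Rob', 'Rob'],
                    'last name': ['Smith', 'Brown', 'Brown']})
df2 = pd.DataFrame({'contract id': [1, 1, 1, 2, 2, 2, 3],
                    'transaction': ['cash', 'cash', 'cash', 'bank transfer',
                                    'bank transfer', 'bank transfer', 'cash']})

print(count_single_type_people(df1, df2))
```

No merge is involved: only df2 gains one extra column, and the lookup Series has one entry per contract.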

Another option is to groupby the name and build a boolean mask to filter the names (though I suspect this will be slower than the other method).

df2['transaction'].groupby(df2['contract id'].map(mapping)).nunique().eq(1).sum()

Output:

1
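
To make the boolean mask in the groupby variant visible, here is a short sketch that keeps the intermediate results. The variable names `per_name`, `mask`, and `single_type_names` are illustrative and not part of the original answer.

```python
import pandas as pd

df1 = pd.DataFrame({'contract id': [1, 2, 3],
                    'first name': ['John', 'Rob', 'Rob'],
                    'last name': ['Smith', 'Brown', 'Brown']})
df2 = pd.DataFrame({'contract id': [1, 1, 1, 2, 2, 2, 3],
                    'transaction': ['cash', 'cash', 'cash', 'bank transfer',
                                    'bank transfer', 'bank transfer', 'cash']})

# contract id -> full name lookup
mapping = (df1.assign(name=df1['first name'] + ' ' + df1['last name'])
              .set_index('contract id')['name'])

# Number of distinct transaction types per person
per_name = df2['transaction'].groupby(df2['contract id'].map(mapping)).nunique()

# Boolean mask: True where the person used exactly one type
mask = per_name.eq(1)

single_type_names = per_name.index[mask].tolist()
print(single_type_names, int(mask.sum()))
```

Keeping the mask around gives both the count and the names of the matching people.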

Answer from a uj5u.com user:

A solution using merge + groupby:

# merge of the 2 datasets based on the common column to get one table with all the information
# for the real dataset may have to be more precise in the type of merging (left, outer, ...)
data = df1.merge(df2) 
data['name'] = data['first name'] + data['last name']  # to get "unique" full names 

dfg = (data.groupby('name')['transaction']  # group the data by name and provide the column transaction
           .unique()                        # for which we take the list of unique values for each name
           .apply(len)                      # then we get the number of elements in the lists
)
res = dfg.index[dfg.values == 1].tolist()   # list of the names for which the value is 1
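
If a merge turns out to be affordable after all, the same idea can be sketched with the merge spelled out explicitly. This is a variant and not the answerer's exact code: `how='inner'` and `validate='one_to_many'` are choices made here for illustration, the names are joined with a space, and `nunique` replaces the `unique` plus `len` step.

```python
import pandas as pd

df1 = pd.DataFrame({'contract id': [1, 2, 3],
                    'first name': ['John', 'Rob', 'Rob'],
                    'last name': ['Smith', 'Brown', 'Brown']})
df2 = pd.DataFrame({'contract id': [1, 1, 1, 2, 2, 2, 3],
                    'transaction': ['cash', 'cash', 'cash', 'bank transfer',
                                    'bank transfer', 'bank transfer', 'cash']})

# validate raises MergeError if a contract id is repeated in df1
data = df1.merge(df2, on='contract id', how='inner', validate='one_to_many')
data['name'] = data['first name'] + ' ' + data['last name']

# Distinct transaction types per person
types_per_name = data.groupby('name')['transaction'].nunique()

res = types_per_name.index[types_per_name == 1].tolist()  # matching names
count = len(res)                                          # the requested count
print(res, count)
```

The question asks for a count, so `len(res)` is what answers it; `res` itself lists who those people are.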
