Pandas: filter rows by the most frequent values across multiple columns

I have a dataframe like this:

country     data_fingerprint   organization     
US          111                Tesco         
UK          222                IBM          
US          111                Yahoo           
PY          333                Tesco
US          111                Boeing   
CN          333                TCS  
NE          458                Yahoo
UK          678                Tesco

I want the data_fingerprint values for rows where both the organization and the country are among the top 2 by count.

For example, the top 2 organizations are Tesco and Yahoo, and the top 2 countries are US and UK.

So the output, based on data_fingerprint, should be:

data_fingerprint
111
678
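To make the answers below reproducible, here is a minimal sketch that rebuilds the sample table by hand (values copied from the question; storing data_fingerprint as integers is an assumption) and prints the counts that define the "top 2" in each column:

```python
import pandas as pd

# Sample data from the question, rebuilt so the snippets can be run as-is.
df = pd.DataFrame({
    "country": ["US", "UK", "US", "PY", "US", "CN", "NE", "UK"],
    "data_fingerprint": [111, 222, 111, 333, 111, 333, 458, 678],
    "organization": ["Tesco", "IBM", "Yahoo", "Tesco", "Boeing", "TCS", "Yahoo", "Tesco"],
})

# value_counts() sorts by count, descending: Tesco (3), Yahoo (2), then the 1s.
print(df["organization"].value_counts())
# US (3), UK (2), then the 1s.
print(df["country"].value_counts())
```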

Here is what I tried, to get the rows whose organization is in the top 2 within the complete dataframe:

# First find top 2 occurrences of organization
nd = df['organization'].value_counts().groupby(level=0, group_keys=False).head(2)
# Then check if the organization exists in the complete dataframe and filter those rows
new = df["organization"].isin(nd)

But I'm not getting any data here. Once I get the data, I can do the same for country. Can someone help me get the output? My data is small, so I'm using Pandas.
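A minimal sketch of why the attempt above returns no rows, run on the sample data rebuilt by hand. Two things go wrong: `groupby(level=0)` makes one group per organization name, so `.head(2)` keeps every organization, and `isin()` given a Series compares against that Series' values (the integer counts), not its index (the names):

```python
import pandas as pd

df = pd.DataFrame({
    "country": ["US", "UK", "US", "PY", "US", "CN", "NE", "UK"],
    "data_fingerprint": [111, 222, 111, 333, 111, 333, 458, 678],
    "organization": ["Tesco", "IBM", "Yahoo", "Tesco", "Boeing", "TCS", "Yahoo", "Tesco"],
})

# Index = organization names, values = counts (3, 2, 1, 1, 1).
counts = df["organization"].value_counts()

# Each group holds a single row, so head(2) keeps all 5 organizations.
nd = counts.groupby(level=0, group_keys=False).head(2)
print(len(nd), counts.size)  # 5 5

# isin() matches names against the counts, so nothing matches.
print(df["organization"].isin(nd).any())  # False

# Taking the index of the first two counts gives the intended names.
top2 = counts.head(2).index
print(df["organization"].isin(top2).sum())  # 5
```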

Answer:

Here is one way to do it:

df[
    df['organization'].isin(df['organization'].value_counts().head(2).index) &
    df['country'].isin(df['country'].value_counts().head(2).index)
]['data_fingerprint'].unique()

array([111, 678], dtype=int64)

Answer:

Commented code:

# Find the 2 most frequent countries and organizations
i1 = df['country'].value_counts().index[:2]
i2 = df['organization'].value_counts().index[:2]

# Create boolean mask to select the rows having top 2 country and org.
mask = df['country'].isin(i1) & df['organization'].isin(i2)

# filter the rows using the mask and drop dupes in data_fingerprint
df.loc[mask, ['data_fingerprint']].drop_duplicates()

Result:

   data_fingerprint
0               111
7               678
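One caveat with `value_counts().head(2)` and `.index[:2]`: they always cut at exactly two labels, so if the second and third most frequent values tie, only one of them survives. If tied values should all count as "top 2" (an assumption; the question doesn't say), `Series.nlargest(2, keep="all")` keeps every label tied with the second-largest count. A minimal sketch, with `top_labels` as a hypothetical helper name:

```python
import pandas as pd

df = pd.DataFrame({
    "country": ["US", "UK", "US", "PY", "US", "CN", "NE", "UK"],
    "data_fingerprint": [111, 222, 111, 333, 111, 333, 458, 678],
    "organization": ["Tesco", "IBM", "Yahoo", "Tesco", "Boeing", "TCS", "Yahoo", "Tesco"],
})

def top_labels(col: pd.Series, n: int = 2) -> pd.Index:
    # Hypothetical helper: keep="all" retains every label tied with the
    # n-th largest count, so the result can be longer than n.
    return col.value_counts().nlargest(n, keep="all").index

mask = df["country"].isin(top_labels(df["country"])) & df["organization"].isin(
    top_labels(df["organization"])
)
result = df.loc[mask, ["data_fingerprint"]].drop_duplicates()
print(result)
```

With the sample data there is no tie at second place, so the result is the same as above.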

Answer:

You can do this:

# First find top 2 occurrences of organization
nd = df['organization'].value_counts().head(2).index
# Then check if the organization exists in the complete dataframe and filter those rows
new = df["organization"].isin(nd)

Output (only Tesco and Yahoo remain):

df[new]

    country data_fingerprint    organization
0        US              111           Tesco
2        US              111           Yahoo
3        PY              333           Tesco
6        NE              458           Yahoo
7        UK              678           Tesco

You can do the same for country.
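Doing the same for country amounts to AND-ing one `isin` mask per column. A minimal sketch that loops over any list of columns; `filter_top_n` is a hypothetical helper name, and the sample data is rebuilt by hand:

```python
import pandas as pd

df = pd.DataFrame({
    "country": ["US", "UK", "US", "PY", "US", "CN", "NE", "UK"],
    "data_fingerprint": [111, 222, 111, 333, 111, 333, 458, 678],
    "organization": ["Tesco", "IBM", "Yahoo", "Tesco", "Boeing", "TCS", "Yahoo", "Tesco"],
})

def filter_top_n(frame: pd.DataFrame, columns: list[str], n: int = 2) -> pd.DataFrame:
    """Hypothetical helper: keep rows whose value in every listed column
    is among that column's n most frequent values."""
    mask = pd.Series(True, index=frame.index)
    for col in columns:
        top = frame[col].value_counts().head(n).index
        mask &= frame[col].isin(top)
    return frame[mask]

filtered = filter_top_n(df, ["organization", "country"])
print(filtered["data_fingerprint"].unique())  # [111 678]
```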
