Delete rows from a pd.DataFrame based on definitions given in a second DataFrame

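I have two DataFrames, x and y. x contains two grouping variables, S and A, and a value variable V. I want to delete from x the rows that are defined in y. y contains the variable S and the variable A_D, which together define which (S, A) pairs need to be deleted from x.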

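However, each element of y['A_D'] can contain a string of elements from A, separated by commas. Each of these individual elements should be deleted from x (for the particular element of S). In addition, if y['A_D'] contains the value 'all' for a particular element of S, the entire S group should be deleted from x.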
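To make the rule format concrete, here is a minimal sketch of how a single A_D string could be interpreted. The helper names parse_rule and drops_whole_group are hypothetical and not part of the original post; the sketch assumes the literal string 'all' is what marks a whole S group for deletion.

```python
def parse_rule(a_d):
    """Split one A_D string into a set of stripped labels (hypothetical helper)."""
    return {part.strip() for part in str(a_d).split(',')}


def drops_whole_group(a_d):
    """Return True if the rule asks for the entire S group to be removed."""
    return 'all' in parse_rule(a_d)


# The separator spacing does not matter once each part is stripped.
print(parse_rule('a, d') == parse_rule('a ,d'))
print(drops_whole_group('all'), drops_whole_group('c'))
```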

I found a solution that does what I want, but my question is: is there a simpler or more Pythonic way to solve this?

import pandas as pd
import numpy as np

# Define x
x = pd.DataFrame({'S': np.repeat(['s1','s2','s3'], 5),
                  'A': [j for i in range(3) for j in ['a','b','c','d','e']],
                  'V': np.random.uniform(size=15) })

# Define y. Which (S,A) pairs should be deleted from x. For 's1' all rows should be deleted.
# For 's2' 'a' and 'd' rows should be deleted and for 's3' the 'c' row should be deleted.
y = pd.DataFrame({'S':['s1','s2','s3'],
                  'A_D':['all','a, d', 'c']})

# My solution:
# expand y to a new DF z. Comma separated elements in 'A_D' become separate elements. Also strip whitespace.
z = []
for i, r in y.iterrows():
    z.append(pd.DataFrame({'S'  : r['S'],
                           'A_D': [u.strip() for u in str(r['A_D']).split(',')]}))
z = pd.concat(z)

# first delete S-groups defined by `all`
x_d = x.merge(z[z['A_D']=='all'],how='left')
x_d = x_d[x_d['A_D']!='all'].drop(columns= 'A_D')

# then drop (S,A) pairs.
x_d = x_d.merge(z[z['A_D']!='all'],how='left', left_on = ['S','A'], right_on = ['S', 'A_D'])
x_d = x_d[pd.isna(x_d['A_D'])].drop(columns= 'A_D').reset_index(drop=True)

# The required result:
print(x_d)

For clarity, the objects look like this:

x
Out[1]: 
     S  A         V
0   s1  a  0.758516
1   s1  b  0.522200
2   s1  c  0.190511
3   s1  d  0.544617
4   s1  e  0.480378
5   s2  a  0.191016
6   s2  b  0.714625
7   s2  c  0.852788
8   s2  d  0.142410
9   s2  e  0.909382
10  s3  a  0.895031
11  s3  b  0.153444
12  s3  c  0.751675
13  s3  d  0.227501
14  s3  e  0.586467

y
Out[2]: 
    S   A_D
0  s1   all
1  s2  a, d
2  s3     c

z
Out[3]: 
    S  A_D
0  s1  all
0  s2    a
1  s2    d
0  s3    c

x_d
Out[4]: 
    S  A         V
0  s2  b  0.714625
1  s2  c  0.852788
2  s2  e  0.909382
3  s3  a  0.895031
4  s3  b  0.153444
5  s3  d  0.227501
6  s3  e  0.586467

A uj5u.com user replied:

x
###
     S  A         V
0   s1  a  0.490194
1   s1  b  0.875381
2   s1  c  0.384808
3   s1  d  0.063960
4   s1  e  0.003159
5   s2  a  0.188624
6   s2  b  0.400527
7   s2  c  0.137458
8   s2  d  0.162291
9   s2  e  0.337899
10  s3  a  0.101296
11  s3  b  0.464031
12  s3  c  0.407629
13  s3  d  0.222498
14  s3  e  0.802472

It does not matter whether the entries are separated by ', ', ' ,' or ','.

y
###
    S      A_D
0  s1      all
1  s2  a, d ,c
2  s3      c,d

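# Expand 'all' into the full list of A labels.
y['A_D'] = y['A_D'].replace('all', ', '.join(x['A'].unique()))
# One row per (S, A_D) pair, with surrounding whitespace stripped.
y = y.assign(A_D=y['A_D'].str.split(',')).explode('A_D')
y['A_D'] = y['A_D'].str.strip()
# Keep the rows of x whose (S, A) pair is not listed in y.
output = x[~x.set_index(['S','A']).index.isin(y.set_index(['S','A_D']).index)].reset_index(drop=True)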
output
###
    S  A         V
0  s2  b  0.400527
1  s2  e  0.337899
2  s3  a  0.101296
3  s3  b  0.464031
4  s3  e  0.802472
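The index-based isin filter above is effectively an anti-join. As a sketch of the same idea, the deletion step could also be written with a left merge and indicator=True. The small frames below are stand-ins for x and y (with integer V values so the result is reproducible), and the names all_labels, pairs and merged are introduced here for illustration only.

```python
import pandas as pd

# Stand-in data with the same shape as in the answer above.
x = pd.DataFrame({'S': ['s1'] * 5 + ['s2'] * 5 + ['s3'] * 5,
                  'A': list('abcde') * 3,
                  'V': list(range(15))})
y = pd.DataFrame({'S': ['s1', 's2', 's3'],
                  'A_D': ['all', 'a, d ,c', 'c,d']})

# Expand 'all' into every label of A, then build one row per (S, A) pair to delete.
all_labels = ','.join(x['A'].unique())
pairs = (y.assign(A=y['A_D'].replace('all', all_labels).str.split(','))
           .explode('A', ignore_index=True)
           .assign(A=lambda d: d['A'].str.strip())
           [['S', 'A']]
           .drop_duplicates())

# Left merge with an indicator column: 'left_only' rows have no deletion rule.
merged = x.merge(pairs, on=['S', 'A'], how='left', indicator=True)
output = (merged[merged['_merge'] == 'left_only']
          .drop(columns='_merge')
          .reset_index(drop=True))
print(output)
```

Dropping duplicate pairs before the merge keeps a repeated label in A_D from multiplying rows of x.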

A uj5u.com user replied:

Here is my solution; it is at least shorter :)

def filter_group(group, filter_rule):
    # 'all' drops the whole group: pd.concat silently skips None entries.
    return (None if filter_rule == 'all'
            else group[~group["A"].isin(filter_rule.replace(' ', '').split(','))])

# Only the S groups listed in y are visited.
x_d = pd.concat(filter_group(x.groupby('S').get_group(grp), filter_rule)
                for grp, filter_rule in dict(zip(y["S"], y["A_D"])).items())
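One caveat worth noting: because the generator above only iterates over the S values listed in y, an S group of x that has no row in y would not appear in the result. The following is a sketch of a variant that iterates over the groups of x instead and keeps groups without a rule. The data is a reduced stand-in, and the names rules and kept are assumptions made for this example.

```python
import pandas as pd

x = pd.DataFrame({'S': ['s1'] * 3 + ['s2'] * 3 + ['s3'] * 3,
                  'A': list('abc') * 3,
                  'V': list(range(9))})
# 's3' deliberately has no rule in this stand-in y.
y = pd.DataFrame({'S': ['s1', 's2'], 'A_D': ['all', 'a, c']})

# Map each S to the set of labels to delete (or {'all'}).
rules = {s: {part.strip() for part in a_d.split(',')}
         for s, a_d in zip(y['S'], y['A_D'])}

kept = []
for s, group in x.groupby('S', sort=False):
    rule = rules.get(s)
    if rule is None:
        kept.append(group)                          # no rule: keep the whole group
    elif 'all' not in rule:
        kept.append(group[~group['A'].isin(rule)])  # drop the listed labels only
    # a rule containing 'all' removes the group, so nothing is appended

x_d = pd.concat(kept).reset_index(drop=True)
print(x_d)
```

This assumes at least one group survives; pd.concat raises an error when given an empty list.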

Please credit the source when reposting. Link to this article: https://www.uj5u.com/qiye/505753.html

Tags: Python, pandas, dataframe
