I have a dataframe like the one shown below:
ID,Region,Supplier,year,output
1,ANZ,AB,2021,1
2,ANZ,ABC,2022,1
3,ANZ,ABC,2022,1
4,ASEAN,ABQ,2021,1
5,ASEAN,ABE,2021,2
6,ASEAN,ABQ,2021,3
7,UK,ABW,2021,8
8,UK,ABO,2020,1
9,UK,ABR,2019,0
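I would like to do the following:
a) Filter the dataframe by Region = UK and (Supplier = ABW or output >= 1 or year = 2021)
b) Filter the dataframe by Region = ANZ and (Supplier = ABC or output > 1 or year = 2021)
c) Filter the dataframe by Region = ASEAN and (Supplier = ABE or output > 1 or year = 2021)
So I tried the following: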
df_ANZ = df[(df['Region']=='ANZ') & ((df['Supplier']=='ABC') | (df['output']>1) | (df['year']==2021))]
df_UK = df[(df['Region']=='UK') & ((df['Supplier']=='ABW') | (df['output']>=1) | (df['year']==2021))]
df_ASEAN = df[(df['Region']=='ASEAN') & ((df['Supplier']=='ABE') | (df['output']>1) | (df['year']==2021))]
# DataFrame.append was removed in pandas 2.0; pd.concat gives the same result
pd.concat([df_ANZ, df_UK, df_ASEAN])
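The problem is that I have similar criteria for about 10 regions, and writing a separate line for each region is not elegant.
For a large dataframe with 5 million rows, is there an efficient and elegant way to do this?
I expect my output to look like this: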
ID,Region,Supplier,year,output
1,ANZ,AB,2021,1
2,ANZ,ABC,2022,1
3,ANZ,ABC,2022,1
4,ASEAN,ABQ,2021,1
5,ASEAN,ABE,2021,2
6,ASEAN,ABQ,2021,3
7,UK,ABW,2021,8
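A uj5u.com user replied:
Create tuples pairing each Region with its Supplier, build one boolean mask per tuple in a list comprehension, then combine the masks with OR using np.logical_or.reduce (the same output > 1 and year == 2021 conditions are applied to every region here):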
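import numpy as np

# (Region, Supplier) pairs, one per region-specific rule
tups = [('ANZ','ABC'),('UK','ABW'),('ASEAN','ABE')]
# one boolean mask per pair
m = [(df['Region']==a) & ((df['Supplier']==b) | (df['output']>1) | (df['year']==2021))
     for a, b in tups]
# keep rows that satisfy at least one of the masks
df = df[np.logical_or.reduce(m)]
print(df)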
   ID Region Supplier  year  output
0   1    ANZ       AB  2021       1
1   2    ANZ      ABC  2022       1
2   3    ANZ      ABC  2022       1
3   4  ASEAN      ABQ  2021       1
4   5  ASEAN      ABE  2021       2
5   6  ASEAN      ABQ  2021       3
6   7     UK      ABW  2021       8
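The answer above uses the same threshold and comparison for every region, while the question asks for output >= 1 in the UK rule. One possible way to carry per-region differences is to put the comparison and the values into the rule tuples themselves. This is only a sketch: the `rules` list, its field order, and the use of the `operator` module are assumptions made for illustration, not part of the original answer.

```python
import operator

import numpy as np
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "ID": range(1, 10),
    "Region": ["ANZ"] * 3 + ["ASEAN"] * 3 + ["UK"] * 3,
    "Supplier": ["AB", "ABC", "ABC", "ABQ", "ABE", "ABQ", "ABW", "ABO", "ABR"],
    "year": [2021, 2022, 2022, 2021, 2021, 2021, 2021, 2020, 2019],
    "output": [1, 1, 1, 1, 2, 3, 8, 1, 0],
})

# Hypothetical rule table: (region, supplier, comparison, output threshold, year)
rules = [
    ("ANZ", "ABC", operator.gt, 1, 2021),
    ("UK", "ABW", operator.ge, 1, 2021),
    ("ASEAN", "ABE", operator.gt, 1, 2021),
]

# One boolean mask per rule; the comparison function is applied to the whole column
masks = [
    (df["Region"] == region)
    & (
        (df["Supplier"] == supplier)
        | compare(df["output"], threshold)
        | (df["year"] == year)
    )
    for region, supplier, compare, threshold, year in rules
]

# A row is kept if it satisfies at least one rule
filtered = df[np.logical_or.reduce(masks)]
print(filtered)
```

With `>=` for the UK rule as written in the question, the row with ID 8 (UK, output 1) also passes, so this result has one more row than the expected output listed in the question; switching that rule to `operator.gt` reproduces the expected output.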
A uj5u.com user replied:
Another approach uses query (you could also create the list of tuples directly instead of using zip):
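regions = ['ANZ', 'UK', 'ASEAN']
suppliers = ['ABC', 'ABW', 'ABE']
min_outputs = [1, 1, 1]
years = [2021, 2021, 2021]
# The inner parentheses matter: without them, & binds tighter than |, so the
# output and year conditions would apply to every region, not just the listed one.
query = ' | '.join([f'(Region=="{reg}" & (Supplier=="{sup}" | output > {o} | year=={y}))'
                    for reg, sup, o, y in zip(regions, suppliers, min_outputs, years)])
df.query(query)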
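Both answers build one mask per region, which means roughly ten passes over the data for ten regions. For a frame with millions of rows, a possible alternative is to look up each row's rule by its Region and evaluate the three conditions once for the whole frame. The sketch below assumes exactly one rule per region; the `rules` dictionary and the `rule_df` column names are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "ID": range(1, 10),
    "Region": ["ANZ"] * 3 + ["ASEAN"] * 3 + ["UK"] * 3,
    "Supplier": ["AB", "ABC", "ABC", "ABQ", "ABE", "ABQ", "ABW", "ABO", "ABR"],
    "year": [2021, 2022, 2022, 2021, 2021, 2021, 2021, 2020, 2019],
    "output": [1, 1, 1, 1, 2, 3, 8, 1, 0],
})

# Hypothetical rules: region -> (supplier, output threshold, year)
rules = {
    "ANZ": ("ABC", 1, 2021),
    "UK": ("ABW", 1, 2021),
    "ASEAN": ("ABE", 1, 2021),
}
rule_df = pd.DataFrame.from_dict(
    rules, orient="index", columns=["supplier", "output", "year"]
)

# Look up each row's rule values by Region; regions without a rule map to NaN,
# and every comparison against NaN is False, so those rows are dropped.
rule_supplier = df["Region"].map(rule_df["supplier"])
rule_output = df["Region"].map(rule_df["output"])
rule_year = df["Region"].map(rule_df["year"])

# The three OR-ed conditions, evaluated once over the whole frame
mask = (
    (df["Supplier"] == rule_supplier)
    | (df["output"] > rule_output)
    | (df["year"] == rule_year)
)
result = df[mask]
print(result)
```

Adding a region then only means adding one entry to `rules`. Whether this is actually faster than the mask-per-region versions on 5 million rows would need to be measured on the real data.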
