I'm not sure what is wrong with my approach.
Here is a sample df:
import pandas as pd

df = pd.DataFrame({'customer': ['A', 'B', 'C', 'D', 'E', 'F'],
                   'Traveled': [1, 1, 1, 0, 1, 0],
                   'Travel_count': [2, 3, 5, 0, 1, 0],
                   'country1': ['UK', 'Italy', 'CA', '0', 'UK', '0'],
                   'country2': ['JP', 'IN', 'CO', '0', 'EG', '0'],
                   'shopping': ['High', 'High', 'High', 'High', 'Medium', 'Medium']
                   })
This gives:
  customer  Traveled  Travel_count country1 country2 shopping
0        A         1             2       UK       JP     High
1        B         1             3    Italy       IN     High
2        C         1             5       CA       CO     High
3        D         0             0        0        0     High
4        E         1             1       UK       EG   Medium
5        F         0             0        0        0   Medium
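I want to write some functions that filter automatically and then build a custom df. Here are two functions that are meant to check customers on the columns Traveled == 1 and shopping == High: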
def travel():
    if (df['Traveled'] == 1):
        return True
    else:
        return False

def shop_high():
    if (df['shopping'] == 'High'):
        return True
    else:
        return False
Here is the nested-if code. If the conditions above are true, it checks whether the customer traveled more or fewer than 3 times:
def select(df):
    if(travel and shop_high):
        if (df['Travel_count'] > 3):
            return (df['customer'], df['shopping'], ('Customer {} traveled more than 3 times').format(df['customer']))
        elif (df['Travel_count'] < 3):
            return (df['customer'], df['shopping'], ('Customer {} traveled less than 3 times').format(df['customer']))
When I apply this function to the original df to filter automatically and check the travel count, I get the wrong result:
pd.DataFrame(list(df.apply(select, axis = 1).dropna()))
Result:
   0       1                                      2
0  A    High  Customer A traveled less than 3 times
1  C    High  Customer C traveled more than 3 times
2  D    High  Customer D traveled less than 3 times
3  E  Medium  Customer E traveled less than 3 times
4  F  Medium  Customer F traveled less than 3 times
Expected:
   0     1                                      2
0  A  High  Customer A traveled less than 3 times
1  C  High  Customer C traveled more than 3 times
Answer from a uj5u.com user:
I would use boolean indexing and numpy.sign:
import numpy as np

# Map the sign of (Travel_count - 3) to a message; a count of exactly 3 maps to NaN
travel = (np.sign(df['Travel_count'].sub(3))
            .map({1: ' traveled more than 3 times',
                  -1: ' traveled less than 3 times'})
         )
m1 = df['Traveled'].eq(1)
m2 = df['shopping'].eq('High')
m3 = travel.notna()
out = (df.loc[m1&m2&m3, ['customer', 'shopping']]
         .assign(new='Customer ' + df['customer'] + travel)
      )
Output:
  customer shopping                                    new
0        A     High  Customer A traveled less than 3 times
2        C     High  Customer C traveled more than 3 times
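To make the numpy.sign step easier to follow, here is a small sketch of the intermediate values on a cut-down copy of the sample data. The names diff_sign and labels are introduced only for this illustration and are not part of the answer above.

```python
import numpy as np
import pandas as pd

# Cut-down copy of the sample data (only the columns needed here)
df = pd.DataFrame({'customer': list('ABCDEF'),
                   'Travel_count': [2, 3, 5, 0, 1, 0]})

# -1 where the count is below 3, 0 where it equals 3, 1 where it is above 3
diff_sign = np.sign(df['Travel_count'].sub(3))

# Signs with no entry in the dict (here 0) become NaN, which is what
# the notna() mask later uses to drop customers with exactly 3 trips
labels = diff_sign.map({1: 'more', -1: 'less'})

print(pd.DataFrame({'customer': df['customer'],
                    'sign': diff_sign,
                    'label': labels}))
```

Customer B, with exactly 3 trips, is the only row that ends up with a missing label.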
Answer from a uj5u.com user:
Use isin:
import numpy as np

new_df = (df[df[['Traveled', 'shopping']].isin(['High', 1]).all(axis=1)
             & df['Travel_count'].ne(3)].reset_index(drop=True))
new_df['new'] = ('Customer ' + new_df['customer'] + ' traveled '
                 + pd.Series(np.where(new_df['Travel_count'].lt(3), 'less', 'more'))
                 + ' than 3 times')
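One thing to keep in mind with isin(['High', 1]) is that it tests both columns against both values. A possibly stricter variant, sketched here as an assumption about what is wanted, passes a dict to DataFrame.isin so that each column is checked only against its own allowed values. The names cols, allowed, mask and selected are made up for this example.

```python
import pandas as pd

df = pd.DataFrame({'customer': ['A', 'B', 'C', 'D', 'E', 'F'],
                   'Traveled': [1, 1, 1, 0, 1, 0],
                   'Travel_count': [2, 3, 5, 0, 1, 0],
                   'shopping': ['High', 'High', 'High', 'High', 'Medium', 'Medium']})

cols = ['Traveled', 'shopping']
# With a dict, each key is a column name and each value lists what that column may contain
allowed = {'Traveled': [1], 'shopping': ['High']}

# Both column checks must pass, and the count must differ from 3
mask = df[cols].isin(allowed).all(axis=1) & df['Travel_count'].ne(3)
selected = df.loc[mask, 'customer'].tolist()
print(selected)  # ['A', 'C']
```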
Answer from a uj5u.com user:
You can filter the DataFrame by the three conditions and apply a simple function to build the message:
des = lambda row: f'Customer {row["customer"]} traveled {"more" if row["Travel_count"] > 3 else "less"} than 3 times'
# .copy() so that adding a column below does not trigger SettingWithCopyWarning
df = df.loc[(df['Traveled'] == 1) & (df['shopping'] == 'High') & (df['Travel_count'] != 3)].copy()
df['description'] = df.apply(des, axis=1)
df = df[['customer', 'shopping', 'description']]
Output:
  customer shopping                            description
0        A     High  Customer A traveled less than 3 times
2        C     High  Customer C traveled more than 3 times
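As for why the code in the question lets every row through: in select, the condition travel and shop_high refers to the two function objects without calling them, and a function object is always truthy, so the check never filters anything. Calling them as written would not help either, because they compare whole columns and an if on a Series raises a ValueError. A minimal sketch of a row-wise fix, assuming the helper functions are changed to take the row as an argument, could look like this:

```python
import pandas as pd

df = pd.DataFrame({'customer': ['A', 'B', 'C', 'D', 'E', 'F'],
                   'Traveled': [1, 1, 1, 0, 1, 0],
                   'Travel_count': [2, 3, 5, 0, 1, 0],
                   'shopping': ['High', 'High', 'High', 'High', 'Medium', 'Medium']})

def travel(row):
    # True when this single row's customer has traveled
    return row['Traveled'] == 1

def shop_high(row):
    # True when this single row's customer has high shopping
    return row['shopping'] == 'High'

def select(row):
    # The helpers are now actually called, with the current row
    if travel(row) and shop_high(row):
        if row['Travel_count'] > 3:
            return (row['customer'], row['shopping'],
                    'Customer {} traveled more than 3 times'.format(row['customer']))
        elif row['Travel_count'] < 3:
            return (row['customer'], row['shopping'],
                    'Customer {} traveled less than 3 times'.format(row['customer']))
    return None  # rows that fail a check, or have exactly 3 trips, are dropped later

result = pd.DataFrame(list(df.apply(select, axis=1).dropna()))
print(result)
```

This keeps the structure of the original code; the vectorized answers above are likely to be faster on large frames.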