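I have a dataset in which I want to flag each row: if the latest st_1 or st_2 is greater than the earlier st_1 or st_2, put True in another column, otherwise False. How can I do this based on date and id?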
id date st_1 st_2
1 2022-02-28 00:00:00 00:00 60.0 6.0
2 2021-10-31 00:00:00 00:00 70.0 0.0
2 2021-12-31 00:00:00 00:00 70.0 4.0
3 2021-10-31 00:00:00 00:00 60.0 0.0
4 2021-06-30 00:00:00 00:00 63.3 2.66
4 2021-08-31 00:00:00 00:00 60.0 3.0
4 2022-02-28 00:00:00 00:00 70.0 2.0
5 2021-06-30 00:00:00 00:00 70.0 3.0
4 2022-02-28 00:00:00 00:00 70.0 2.0
5 2021-06-30 00:00:00 00:00 70.0 3.0
5 2021-08-31 00:00:00 00:00 80.0 2.0
5 2021-10-31 00:00:00 00:00 70.0 3.5
My expected result:
id date st_1 st_2 outcome
1 2022-02-28 00:00:00 00:00 60.0 6.0 false
2 2021-10-31 00:00:00 00:00 70.0 0.0 false
2 2021-12-31 00:00:00 00:00 70.0 4.0 true
3 2021-10-31 00:00:00 00:00 60.0 0.0 false
4 2021-06-30 00:00:00 00:00 63.3 2.66 false
4 2021-08-31 00:00:00 00:00 60.0 3.0 true
4 2022-02-28 00:00:00 00:00 70.0 2.0 true
5 2021-06-30 00:00:00 00:00 70.0 3.0 false
5 2021-08-31 00:00:00 00:00 80.0 2.0 true
5 2021-10-31 00:00:00 00:00 70.0 3.5 true
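Answer from a uj5u.com user:
IIUC, you want to check the condition using the dates, but without testing one date against the previous date. The logic in C Pappy's answer is better than this one; this version only checks within each date group, so it produces fewer "True" values. Please let us know which one is correct.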
import numpy as np  # needed for np.where below

df.sort_values(['date', 'id'], inplace=True)
df['st_1_check'] = False
df['st_2_check'] = False

def test_conditions(x):
    # Within one date group, subtract the previous row's value from each row
    if x.shape[0] > 1:
        x.loc[:, 'st_1_check'] = x['st_1'] - x['st_1'].shift(1)
        x.loc[:, 'st_2_check'] = x['st_2'] - x['st_2'].shift(1)
    return x

# group_keys=False keeps the original row index, as in the output below
dfnew = df.groupby(['date'], group_keys=False).apply(test_conditions)
dfnew.fillna(False, inplace=True)  # first row of each group has no predecessor
# Positive difference -> True, zero or negative difference -> False
dfnew['st_1_check'] = np.where(dfnew['st_1_check'] > 0, True, dfnew['st_1_check'])
dfnew['st_2_check'] = np.where(dfnew['st_2_check'] > 0, True, dfnew['st_2_check'])
dfnew['st_1_check'] = np.where(dfnew['st_1_check'] <= 0, False, dfnew['st_1_check'])
dfnew['st_2_check'] = np.where(dfnew['st_2_check'] <= 0, False, dfnew['st_2_check'])
dfnew
id date st_1 st_2 st_1_check st_2_check
4 4 2021-06-30 00:00:00 00:00 63.30000 2.66000 False False
7 5 2021-06-30 00:00:00 00:00 70.00000 3.00000 True True
9 5 2021-06-30 00:00:00 00:00 70.00000 3.00000 False False
5 4 2021-08-31 00:00:00 00:00 60.00000 3.00000 False False
10 5 2021-08-31 00:00:00 00:00 80.00000 2.00000 True False
1 2 2021-10-31 00:00:00 00:00 70.00000 0.00000 False False
3 3 2021-10-31 00:00:00 00:00 60.00000 0.00000 False False
11 5 2021-10-31 00:00:00 00:00 70.00000 3.50000 True True
2 2 2021-12-31 00:00:00 00:00 70.00000 4.00000 False False
0 1 2022-02-28 00:00:00 00:00 60.00000 6.00000 False False
6 4 2022-02-28 00:00:00 00:00 70.00000 2.00000 True False
8 4 2022-02-28 00:00:00 00:00 70.00000 2.00000 False False
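The same per-date check can probably be written more compactly with `groupby(...).diff()`. The following is only a sketch, not the answerer's code: it uses a four-row subset of the question's data, and the shortened date strings are an assumption made for brevity.

```python
import pandas as pd

# Four rows taken from the question's data (dates shortened for brevity).
df = pd.DataFrame({
    "id": [4, 5, 4, 5],
    "date": ["2021-06-30", "2021-06-30", "2021-08-31", "2021-08-31"],
    "st_1": [63.3, 70.0, 60.0, 80.0],
    "st_2": [2.66, 3.0, 3.0, 2.0],
})

df = df.sort_values(["date", "id"])

# diff() inside each date group: the first row of every group is NaN,
# and NaN > 0 evaluates to False, so no fillna step is needed.
delta = df.groupby("date")[["st_1", "st_2"]].diff()
df["st_1_check"] = delta["st_1"] > 0
df["st_2_check"] = delta["st_2"] > 0
print(df)
```

Because the comparison yields booleans directly, the check columns never hold a mix of floats and booleans, which is what the `np.where` passes in the answer are cleaning up.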
Answer from a uj5u.com user:
Try this:
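def func_date(x):
    # x is a row label; this assumes df has the default 0..n-1 integer index
    if x == 0:
        return 0  # the first row has no previous row to compare with
    elif df.at[x-1, 'st_1'] > df.at[x, 'st_1']:
        return 'higher'  # the previous row's st_1 is higher than this row's
    elif df.at[x-1, 'st_1'] == df.at[x, 'st_1']:
        return '='
    else:
        return 'less'  # the previous row's st_1 is lower than this row's

df['result'] = df.index.map(func_date)
print(df)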
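A vectorized variant of the same row-by-row comparison might look like the sketch below. It is an illustration rather than the answerer's method: the four-value frame is made up, and the `"first row"` label replaces the `0` returned above so that every label is a string. As in the answer, the labels describe the previous row relative to the current one, and rows are not grouped by id.

```python
import numpy as np
import pandas as pd

# Made-up values for illustration only.
df = pd.DataFrame({"st_1": [60.0, 70.0, 70.0, 60.0]})

prev = df["st_1"].shift(1)  # previous row's st_1; NaN for the first row

# Each condition is checked in order; rows matching none get the default.
df["result"] = np.select(
    [prev > df["st_1"], prev == df["st_1"], prev < df["st_1"]],
    ["higher", "=", "less"],
    default="first row",
)
print(df)
```

Using `shift(1)` instead of `df.at[x-1, ...]` means the comparison does not depend on the index being a contiguous range starting at 0.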
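Answer from a uj5u.com user:
Update #2: I fixed the sorting so that it sorts by id first and then by date, and added a lag_id column, which is now used to make sure comparisons are only made within the same id.
Update: I just noticed that the spec says "if the latest st_1 or st_2 is greater than the earlier st_1 or st_2", which means the correct answer uses "|" rather than the "&" of the original answer. Corrected.
Code: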
import io
import pandas as pd
string = """id date st_1 st_2
1 "2022-02-28 00:00:00 00:00" 60.0 6.0
2 "2021-10-31 00:00:00 00:00" 70.0 0.0
2 "2021-12-31 00:00:00 00:00" 70.0 4.0
3 "2021-10-31 00:00:00 00:00" 60.0 0.0
4 "2021-06-30 00:00:00 00:00" 63.3 2.66
4 "2021-08-31 00:00:00 00:00" 60.0 3.0
4 "2022-02-28 00:00:00 00:00" 70.0 2.0
5 "2021-06-30 00:00:00 00:00" 70.0 3.0
4 "2022-02-28 00:00:00 00:00" 70.0 2.0
5 "2021-06-30 00:00:00 00:00" 70.0 3.0
5 "2021-08-31 00:00:00 00:00" 80.0 2.0
5 "2021-10-31 00:00:00 00:00" 70.0 3.5
"""
data = io.StringIO(string)
df = pd.read_csv(data, sep=r"\s+") # Load df from the data string, splitting on runs of whitespace
df.sort_values(['id', 'date'], inplace=True) # Sort according to the spec
print(df)
df['lag_id'] = df['id'].shift(1) # Lag the id column
df['lag_st_1'] = df['st_1'].shift(1) # Create column lag_st_1 with the st_1 data lagged by 1 row
df['lag_st_2'] = df['st_2'].shift(1) # Ditto for st_2
print(df)
# Create result column with True values where the right conditions are met
df.loc[(df['id'] == df['lag_id'])
       & (
           (df['st_1'] > df['lag_st_1'])
           | (df['st_2'] > df['lag_st_2'])
       ), 'result'] = True
# The previous operation fills the rest of the rows with NAs.
# Here we change the NAs to "False"
df['result'] = df['result'].fillna(False)
print(df)
Updated output:
id date st_1 st_2
0 1 2022-02-28 00:00:00 00:00 60.0 6.00
1 2 2021-10-31 00:00:00 00:00 70.0 0.00
2 2 2021-12-31 00:00:00 00:00 70.0 4.00
3 3 2021-10-31 00:00:00 00:00 60.0 0.00
4 4 2021-06-30 00:00:00 00:00 63.3 2.66
5 4 2021-08-31 00:00:00 00:00 60.0 3.00
6 4 2022-02-28 00:00:00 00:00 70.0 2.00
8 4 2022-02-28 00:00:00 00:00 70.0 2.00
7 5 2021-06-30 00:00:00 00:00 70.0 3.00
9 5 2021-06-30 00:00:00 00:00 70.0 3.00
10 5 2021-08-31 00:00:00 00:00 80.0 2.00
11 5 2021-10-31 00:00:00 00:00 70.0 3.50
id date st_1 st_2 lag_id lag_st_1 lag_st_2
0 1 2022-02-28 00:00:00 00:00 60.0 6.00 NaN NaN NaN
1 2 2021-10-31 00:00:00 00:00 70.0 0.00 1.0 60.0 6.00
2 2 2021-12-31 00:00:00 00:00 70.0 4.00 2.0 70.0 0.00
3 3 2021-10-31 00:00:00 00:00 60.0 0.00 2.0 70.0 4.00
4 4 2021-06-30 00:00:00 00:00 63.3 2.66 3.0 60.0 0.00
5 4 2021-08-31 00:00:00 00:00 60.0 3.00 4.0 63.3 2.66
6 4 2022-02-28 00:00:00 00:00 70.0 2.00 4.0 60.0 3.00
8 4 2022-02-28 00:00:00 00:00 70.0 2.00 4.0 70.0 2.00
7 5 2021-06-30 00:00:00 00:00 70.0 3.00 4.0 70.0 2.00
9 5 2021-06-30 00:00:00 00:00 70.0 3.00 5.0 70.0 3.00
10 5 2021-08-31 00:00:00 00:00 80.0 2.00 5.0 70.0 3.00
11 5 2021-10-31 00:00:00 00:00 70.0 3.50 5.0 80.0 2.00
id date st_1 st_2 lag_id lag_st_1 lag_st_2 result
0 1 2022-02-28 00:00:00 00:00 60.0 6.00 NaN NaN NaN False
1 2 2021-10-31 00:00:00 00:00 70.0 0.00 1.0 60.0 6.00 False
2 2 2021-12-31 00:00:00 00:00 70.0 4.00 2.0 70.0 0.00 True
3 3 2021-10-31 00:00:00 00:00 60.0 0.00 2.0 70.0 4.00 False
4 4 2021-06-30 00:00:00 00:00 63.3 2.66 3.0 60.0 0.00 False
5 4 2021-08-31 00:00:00 00:00 60.0 3.00 4.0 63.3 2.66 True
6 4 2022-02-28 00:00:00 00:00 70.0 2.00 4.0 60.0 3.00 True
8 4 2022-02-28 00:00:00 00:00 70.0 2.00 4.0 70.0 2.00 False
7 5 2021-06-30 00:00:00 00:00 70.0 3.00 4.0 70.0 2.00 False
9 5 2021-06-30 00:00:00 00:00 70.0 3.00 5.0 70.0 3.00 False
10 5 2021-08-31 00:00:00 00:00 80.0 2.00 5.0 70.0 3.00 True
11 5 2021-10-31 00:00:00 00:00 70.0 3.50 5.0 80.0 2.00 True
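The lag-column approach above can also be sketched with `groupby('id').diff()`, which compares each row only with the previous row of the same id and so needs no `lag_id` column. This is a hedged sketch, not the answerer's code. It assumes the two repeated rows in the question's input have already been removed (for example with `drop_duplicates()`), that the date strings can be shortened to plain dates, and that the result column is named `outcome` as in the question's expected result.

```python
import pandas as pd

# The ten distinct rows from the question (dates shortened for brevity).
df = pd.DataFrame({
    "id": [1, 2, 2, 3, 4, 4, 4, 5, 5, 5],
    "date": ["2022-02-28", "2021-10-31", "2021-12-31", "2021-10-31",
             "2021-06-30", "2021-08-31", "2022-02-28",
             "2021-06-30", "2021-08-31", "2021-10-31"],
    "st_1": [60.0, 70.0, 70.0, 60.0, 63.3, 60.0, 70.0, 70.0, 80.0, 70.0],
    "st_2": [6.0, 0.0, 4.0, 0.0, 2.66, 3.0, 2.0, 3.0, 2.0, 3.5],
})
df["date"] = pd.to_datetime(df["date"])
df = df.sort_values(["id", "date"]).reset_index(drop=True)

# Change versus the previous row of the same id (NaN for each id's first row).
delta = df.groupby("id")[["st_1", "st_2"]].diff()

# True when st_1 OR st_2 increased; NaN compares as False, so the first
# row of every id comes out False.
df["outcome"] = delta.gt(0).any(axis=1)
print(df)
```

On these ten rows the `outcome` column agrees with the expected result posted in the question.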
Please credit the source when reposting. Link to this article: https://www.uj5u.com/qiye/429603.html
