I have the following DataFrame in pandas:
df = pd.DataFrame({
"Name": [ "N1", "N2", "N3", "N1", "N1", "N2", "N3", "N2" ],
"Date": [ "31-10-2021", "31-10-2021" , "31-10-2021", "15-10-2021", "14-10-2021", "13-10-2021", "12-10-2021", "11-10-2021" ],
"Feature": [ 4, 5, 6, 3, 1, 6, 3, 3 ]
})
  Name        Date  Feature
0   N1  31-10-2021        4
1   N2  31-10-2021        5
2   N3  31-10-2021        6
3   N1  15-10-2021        3
4   N1  14-10-2021        1
5   N2  13-10-2021        6
6   N3  12-10-2021        3
7   N2  11-10-2021        3
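I want to create a new column holding the difference between the current Feature value for a given Name and the Feature value from the previous (most recent earlier) occurrence of that Name in the DataFrame, or zero if there is no earlier occurrence.
So, given the table above, it should be: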
  Name        Date  Feature  New_column
0   N1  31-10-2021        4           1
1   N2  31-10-2021        5          -1
2   N3  31-10-2021        6           3
3   N1  15-10-2021        3           2
4   N1  14-10-2021        1           0
5   N2  13-10-2021        6           3
6   N3  12-10-2021        3           0
7   N2  11-10-2021        3           0
Is there a vectorized/efficient way to do this? Thanks in advance.
uj5u.com user reply:
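We can do:
# Rows are sorted newest-first, so diff(-1) subtracts the next row in that order,
# which is the previous (older) entry for the same Name.
result_df = df.assign(New_column=df.sort_values('Date', ascending=False)
                      .groupby('Name')['Feature'].diff(-1).fillna(0))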
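One caveat with sorting on Date as it stands: the column holds strings, and a string sort of day-month-year values is only chronological here because every date falls in the same month and year. A minimal sketch of the same idea with the dates parsed first, assuming the format is always DD-MM-YYYY (the `_parsed` helper column and the `astype(int)` cast are additions for illustration, not part of the answer above):

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["N1", "N2", "N3", "N1", "N1", "N2", "N3", "N2"],
    "Date": ["31-10-2021", "31-10-2021", "31-10-2021", "15-10-2021",
             "14-10-2021", "13-10-2021", "12-10-2021", "11-10-2021"],
    "Feature": [4, 5, 6, 3, 1, 6, 3, 3],
})

# Assumption: dates are day-month-year; parse them so sorting is chronological.
parsed = pd.to_datetime(df["Date"], format="%d-%m-%Y")

# Sort a copy of the frame oldest-first, using the parsed dates as the sort key.
ordered = df.assign(_parsed=parsed).sort_values("_parsed")

# Within each Name, subtract the previous (older) Feature.
# The first row of each Name has no predecessor and becomes NaN, then 0.
new_col = ordered.groupby("Name")["Feature"].diff().fillna(0).astype(int)

# assign() aligns on the index, so the original row order is preserved.
result_df = df.assign(New_column=new_col)
print(result_df)
```

Because the result is aligned back on the index, the original frame does not need to be re-sorted.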
uj5u.com user reply:
You can use shift together with groupby:
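import pandas as pd

df = pd.DataFrame({
    "Name": ["N1", "N2", "N3", "N1", "N1", "N2", "N3", "N2"],
    "Date": ["31-10-2021", "31-10-2021", "31-10-2021", "15-10-2021", "14-10-2021", "13-10-2021", "12-10-2021", "11-10-2021"],
    "Feature": [4, 5, 6, 3, 1, 6, 3, 3]
})
# Sort so that, within each Name, rows run from oldest to newest.
# Date is a string here; this sort is chronological only because all dates share a month and year.
df.sort_values(by=['Name', 'Date'], inplace=True)
# Current Feature minus the previous Feature for the same Name.
df['New_column'] = df['Feature'] - df.groupby('Name')['Feature'].shift()
# Rows with no earlier entry for their Name come out as NaN; set them to 0.
df['New_column'] = df['New_column'].fillna(0)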
The last line of the code is there because the first row for each Name will have a NaN, whereas your example shows that you want a 0 there.
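To make the NaN behaviour visible, here is a small sketch that keeps the shifted values in a separate variable before filling them. It also sorts a copy instead of sorting in place and restores the original row order at the end; the names `ordered`, `prev` and `result` are just illustrative, and the string sort on Date is assumed to be acceptable only for this sample data.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["N1", "N2", "N3", "N1", "N1", "N2", "N3", "N2"],
    "Date": ["31-10-2021", "31-10-2021", "31-10-2021", "15-10-2021",
             "14-10-2021", "13-10-2021", "12-10-2021", "11-10-2021"],
    "Feature": [4, 5, 6, 3, 1, 6, 3, 3],
})

# Oldest-first within each Name (sort_values returns a new frame, df is untouched).
ordered = df.sort_values(["Name", "Date"])

# Previous Feature for the same Name; NaN where there is no earlier row.
prev = ordered.groupby("Name")["Feature"].shift()

# Exactly one NaN per distinct Name is expected at this point.
print(prev.isna().sum(), df["Name"].nunique())

# Treat "no earlier row" as a difference of zero, then go back to the original order.
ordered["New_column"] = (ordered["Feature"] - prev).fillna(0).astype(int)
result = ordered.sort_index()
print(result)
```

Casting back to int is optional; without it the column stays float because NaN forced the subtraction result to float.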
