So I have this DataFrame:
import pandas as pd
data = {'value':[1,1,1,0,1,0,1,0,0,0,0,0,1,1,1,0,0,1,0,1]}
df = pd.DataFrame(data)
| Row | value |
|---|---|
| 0 | 1 |
| 1 | 1 |
| 2 | 1 |
| 3 | 0 |
| 4 | 1 |
| 5 | 0 |
| 6 | 1 |
| 7 | 0 |
| 8 | 0 |
| 9 | 0 |
| 10 | 0 |
| 11 | 0 |
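I want to add another column named "Cumulative" that counts how many times in a row a value has appeared. It should stop counting when the value differs from the previous one, and then start counting again from zero. This would be the resulting DataFrame:
| Row | value | Cumulative |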
|---|---|---|
| 0 | 1 | 0 |
| 1 | 1 | 1 |
| 2 | 1 | 2 |
| 3 | 0 | 0 |
| 4 | 1 | 0 |
| 5 | 0 | 0 |
| 6 | 1 | 0 |
| 7 | 0 | 0 |
| 8 | 0 | 1 |
| 9 | 0 | 2 |
| 10 | 0 | 3 |
| 11 | 0 | 4 |
| 12 | 1 | 0 |
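I've tried several built-in functions such as where, mask, and cumsum, but I'm really lost when it comes to iterating and writing a for loop, and I'm fairly sure that is where the answer lies. Is there a function I don't know about that can do this? Or is there no way to avoid a for loop?
A uj5u.com user replied:
Try: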
df.groupby(df['value'].diff().ne(0).cumsum()).cumcount()
Output:
0 0
1 1
2 2
3 0
4 0
5 0
6 0
7 0
8 1
9 2
10 3
11 4
12 0
13 1
14 2
15 0
16 1
17 0
18 0
19 0
dtype: int64
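To make the one-liner above easier to follow, here is a minimal sketch that splits it into named steps and stores the result in a column. The intermediate names `changed` and `run_id` are mine, chosen for illustration only; the calls themselves are the same ones used in the answer.

```python
import pandas as pd

data = {'value': [1, 1, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 0, 1]}
df = pd.DataFrame(data)

# True wherever the value differs from the previous row.
# The first row is also True, because diff() yields NaN there and NaN != 0.
changed = df['value'].diff().ne(0)

# Running total of the changes: each run of equal values gets its own label.
run_id = changed.cumsum()

# Position of each row inside its run, starting from 0.
df['Cumulative'] = df.groupby(run_id).cumcount()
print(df)
```

Since `cumcount()` starts at 0, this should give the 0-based numbering shown in the question's expected table. Note that `diff()` needs numeric data, so this particular form assumes the column holds numbers.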
A uj5u.com user replied:
This code adds a "Cumulative" column and counts consecutive occurrences of 1:
# add a column named "Cumulative", initialized to 0
df['Cumulative'] = 0
last = 0  # length of the current run of 1s
for i in range(len(df)):
    if df['value'][i] == 1:
        # value is 1: extend the current run
        df.loc[i, 'Cumulative'] = last + 1
        last += 1
    else:
        # value is not 1: leave the 0 in place and reset the run
        last = 0
print(df)
The output looks like this:
value Cumulative
0 1 1
1 1 2
2 1 3
3 0 0
4 1 1
5 0 0
6 1 1
7 0 0
8 0 0
9 0 0
10 0 0
11 0 0
12 1 1
13 1 2
14 1 3
15 0 0
16 0 0
17 1 1
18 0 0
19 1 1
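For what it's worth, the loop above can probably be avoided for this 0/1 case. The sketch below is my own variation, not the answerer's code: it assumes the column only contains 0 and 1, labels each run of equal values, and takes a cumulative sum inside each run, so runs of 1 count up and runs of 0 stay at 0. The name `run_id` is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({'value': [1, 1, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 0, 1]})

# Label each run of consecutive equal values with its own integer.
run_id = df['value'].ne(df['value'].shift()).cumsum()

# Summing within a run counts 1, 2, 3, ... for runs of 1 and stays 0 for runs of 0.
# This relies on the assumption that the values are only 0 and 1.
df['Cumulative'] = df['value'].groupby(run_id).cumsum()
print(df)
```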
Edit
This code now works for any values, not only 1.
import pandas as pd
data = {'value': [1, 1, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 0, 1]}
df = pd.DataFrame(data)
# add a column named "Cumulative", initialized to 0
df['Cumulative'] = 0
last_count = 0  # length of the current run so far
last_num = df['value'][0]  # value of the current run
for i in range(len(df)):
    if df['value'][i] == last_num:
        # same value as the previous row: extend the run
        last_count += 1
    else:
        # value changed: start a new run at 1
        last_num = df['value'][i]
        last_count = 1
    df.loc[i, 'Cumulative'] = last_count
print(df)
Result:
value Cumulative
0 1 1
1 1 2
2 1 3
3 0 1
4 1 1
5 0 1
6 1 1
7 0 1
8 0 2
9 0 3
10 0 4
11 0 5
12 1 1
13 1 2
14 1 3
15 0 1
16 0 2
17 1 1
18 0 1
19 1 1
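As a rough loop-free counterpart to the code above, here is a sketch that compares each value with the previous one via `shift()` instead of `diff()`, so it should also work for non-numeric values. The helper name `run_position` and its `start` parameter are hypothetical, introduced only for this example. With `start=1` it reproduces the 1-based result above; with the default `start=0` it gives the 0-based numbering the question asked for.

```python
import pandas as pd

def run_position(values: pd.Series, start: int = 0) -> pd.Series:
    """Position of each element within its run of consecutive equal values."""
    # A new run begins wherever a value differs from the one before it.
    run_id = values.ne(values.shift()).cumsum()
    # cumcount() numbers the rows of each run from 0; shift by `start` if wanted.
    return values.groupby(run_id).cumcount() + start

df = pd.DataFrame({'value': [1, 1, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 0, 1]})
df['Cumulative'] = run_position(df['value'], start=1)
print(df)

# The same helper applied to strings.
letters = pd.Series(list('aabccc'))
print(run_position(letters).tolist())  # [0, 1, 0, 0, 1, 2]
```

One caveat: missing values never compare equal to each other, so each NaN in the column would start its own run.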
