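I am trying to find a vectorized solution in Pandas for something that is quite common in spreadsheets: computing a cumsum while skipping values or setting a fixed value depending on the result of the running cumsum itself. I have the following: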
A
1 0
2 -1
3 2
4 3
5 -2
6 -3
7 1
8 -1
9 1
10 -2
11 1
12 2
13 -1
14 -2
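What I need is a second column holding the cumsum of 'A'. If one of those sums gives a positive value, it should be replaced by 0 and the cumsum should continue from that 0. At the same time, if the cumsum gives a negative value that is lower than the lowest value recorded in column A since the last 0 in column B, it needs to be replaced by that lowest value of column A. I know this is quite a big ask, but is there a vectorized solution for this? Maybe using helper columns. The result should look like this:
A B
1 0 0
2 -1 -1 # -1 + 0 = -1
3 2 0 # -1 + 2 = 1 but 1 > 0 so this is 0
4 3 0 # same as previous row
5 -2 -2 # -2 + 0 = -2
6 -3 -3 # -2 - 3 = -5 but the lowest value in column A since last 0 is -3 so this is replaced by -3
7 1 -2 # 1 - 3 = -2
8 -1 -3 # -1 - 2 = -3
9 1 -2 # -3 + 1 = -2
10 -2 -3 # -2 - 2 = -4 but the lowest value in column A since last 0 is -3 so this is replaced by -3
11 1 -2 # -3 + 1 = -2
12 2 0 # -2 + 2 = 0
13 -1 -1 # 0 - 1 = -1
14 -2 -2 # -1 - 2 = -3 but the lowest value in column A since last cap is -2 so this is -2 instead of -3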
So far I have this, but it does not work 100% and it is not efficient:
df['B'] = 0
df['B'][0] = 0
for x in range(len(df)-1):
    A = df['A'][x + 1]
    B = df['B'][x] + A
    if B >= 0:
        df['B'][x + 1] = 0
    elif B < 0 and A < 0 and B < A:
        df['B'][x + 1] = A
    else:
        df['B'][x + 1] = B
Answer from a uj5u.com user:
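Using df['A'].expanding(1).apply(function) I can run my own function, which first receives only one row, then the first 2 rows, then the first 3 rows, and so on. It is not given the result of the previous calculation, so it has to redo all the calculations again and again, but it does not need global variables or a hard-coded df['A'].
Documentation: Series.expanding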
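To make the "one row, then 2 rows, then 3 rows" behaviour concrete, here is a minimal sketch on a short series. The names s, sizes and last_values are only illustrative and are not part of the answer's code.

```python
import pandas as pd

# A short illustrative series (the first four values of column A).
s = pd.Series([0, -1, 2, 3])

# Each call receives the window from the first row up to the current row,
# so the window grows by one row per step.
sizes = s.expanding(min_periods=1).apply(lambda window: len(window))

# The last element of each window is the current row's value.
last_values = s.expanding(min_periods=1).apply(lambda window: window.iloc[-1])

print(sizes.tolist())        # [1.0, 2.0, 3.0, 4.0]
print(last_values.tolist())  # [0.0, -1.0, 2.0, 3.0]
```

Because every call starts again from the first row, a function that loops over its window does work proportional to the window length each time, which is why this approach repeats the whole calculation for every row.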
import pandas as pd

A = [0, -1, 2, 3, -2, -3, 1, -1, 1, -2, 1, 2, -1, -2]
df = pd.DataFrame({"A": A})

def function(values):
    # `values` holds every row from the first one up to the current row
    #print(values)
    #print(type(values))
    #print(len(values))
    result = 0
    last_zero = 0  # position of the most recent reset to 0
    for index, value in enumerate(values):
        result += value
        if result >= 0:
            # positive sums are capped at 0
            result = 0
            last_zero = index
        else:
            # never go below the lowest value seen since the last reset
            minimal = min(values[last_zero:])
            #print(index, last_zero, minimal)
            #if result < minimal:
            #    result = minimal
            result = max(result, minimal)
        #print('result:', result)
    return result

df['B'] = df['A'].expanding(1).apply(function)
df['B'] = df['B'].astype(int)
print(df)
Result:
A B
0 0 0
1 -1 -1
2 2 0
3 3 0
4 -2 -2
5 -3 -3
6 1 -2
7 -1 -3
8 1 -2
9 -2 -3
10 1 -2
11 2 0
12 -1 -1
13 -2 -2
The same with a normal apply(); it needs global variables and a hard-coded df['A']:
import pandas as pd

A = [0, -1, 2, 3, -2, -3, 1, -1, 1, -2, 1, 2, -1, -2]
df = pd.DataFrame({"A": A})

# state shared between calls
result = 0
last_zero = 0
index = 0

def function(value):
    global result
    global last_zero
    global index
    result += value
    if result >= 0:
        # positive sums are capped at 0
        result = 0
        last_zero = index
    else:
        minimal = min(df['A'][last_zero:])
        #print(index, last_zero, minimal)
        #if result < minimal:
        #    result = minimal
        result = max(result, minimal)
    index += 1
    #print('result:', result)
    return result

df['B'] = df['A'].apply(function)
df['B'] = df['B'].astype(int)
print(df)
The same with a plain for-loop:
import pandas as pd

A = [0, -1, 2, 3, -2, -3, 1, -1, 1, -2, 1, 2, -1, -2]
df = pd.DataFrame({"A": A})

all_values = []
result = 0
last_zero = 0
# Series.items() replaces iteritems(), which was removed in pandas 2.0
for index, value in df['A'].items():
    result += value
    if result >= 0:
        # positive sums are capped at 0
        result = 0
        last_zero = index
    else:
        minimal = min(df['A'][last_zero:])
        #print(index, last_zero, minimal)
        #if result < minimal:
        #    result = minimal
        result = max(result, minimal)
    all_values.append(result)

df['B'] = all_values
print(df)
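The loop versions above call min() on a slice for every negative row. A possible variation, sketched here under one assumption, tracks the minimum incrementally so each row is visited once. The assumption is that "lowest value since the last 0" only covers rows up to the current one. The names capped_cumsum and running_min are hypothetical and do not appear in the answer. This is still a plain Python loop, not a vectorized solution.

```python
import pandas as pd

def capped_cumsum(values):
    """Cumulative sum capped at 0 from above and, from below, at the
    lowest input value seen since the last reset to 0 (illustrative helper)."""
    out = []
    total = 0
    running_min = 0  # lowest value seen since the last reset
    for value in values:
        total += value
        if total >= 0:
            # positive sums are replaced by 0 and the minimum starts over
            total = 0
            running_min = 0
        else:
            running_min = min(running_min, value)
            # the sum may not drop below the lowest value seen so far
            total = max(total, running_min)
        out.append(total)
    return out

A = [0, -1, 2, 3, -2, -3, 1, -1, 1, -2, 1, 2, -1, -2]
df = pd.DataFrame({"A": A})
df["B"] = capped_cumsum(df["A"].tolist())
print(df)
```

On the sample data this gives the same column B as the versions above. It can differ on other data, because those versions take the minimum over a slice that runs to the end of the data they are given rather than stopping at the current row.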
Answer from a uj5u.com user:
I hope my code helps you, since I could not find a way to add conditions that modify the values produced by the cumsum() method of a pandas.DataFrame() instance.
# pandas
import pandas as pd
df = pd.DataFrame()
# pre-defined A column
a = [0, -1, 2, 3, -2, -3, 1, -1, 1, -2, 1, 2, -1, -2]
df["A"] = a
# create a new column (array) based on A or df length
b_col = [0 for i in range(len(df))]
# minimal value and last index cap minimal value (or zero 0)
last_val = min(df["A"])
idx = 0
# define first B row (uses last_val; the undefined name min_val raised a NameError)
b_col[0] = max(last_val, min(df["A"][0] + b_col[0], 0))
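In the core of the loop, it records the index of the last row that reset the result to 0, uses that index to recompute the minimum value, and otherwise chooses the smaller of the running sum and 0: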
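for i in range(0, len(df)-1):
    # remember where the last non-negative value of A was seen
    if df["A"][i + 1] >= 0:
        idx = i
    if df["A"][i + 1] + b_col[i] < last_val:
        # the sum fell below the current floor: clamp it, then refresh the floor
        b_col[i + 1] = last_val
        last_val = min(df["A"][idx:])
    else:
        # ordinary step: running sum, capped at 0
        b_col[i + 1] = min(df["A"][i + 1] + b_col[i], 0)
df["B"] = b_col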
Output:
>>> df
A B
0 0 0
1 -1 -1
2 2 0
3 3 0
4 -2 -2
5 -3 -3
6 1 -2
7 -1 -3
8 1 -2
9 -2 -3
10 1 -2
11 2 0
12 -1 -1
13 -2 -2
