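I have a Pandas DataFrame with the structure below. It holds the results of comparisons made earlier, plus a minimum_difference column that names the column of that row holding the smallest absolute difference in the comparison.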
df_test
V | A | B | C | D | minimum_difference
-10 | nan | nan | nan | nan | nan
-9.9 | 10 | 1 | -2200 | 100 | B
-9.8 | 11 | 2 | -2211 | 1 | D
In addition, for each value column (A, B, C, D), I have a target maximum for the minimum difference, as follows:
import pandas as pd

max_difference = pd.Series(dict(
    A=1,
    B=2,
    C=10,
    D=0.5,
))
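To try the answers below, the sample table and the targets can be rebuilt as real objects. This is a minimal sketch, assuming np.nan for the empty cells of the first row and using a column-dict constructor (the question does not show how df_test was actually created):

```python
import numpy as np
import pandas as pd

# Sample data copied from the table in the question; NaN marks the empty first row
df_test = pd.DataFrame({
    'V': [-10, -9.9, -9.8],
    'A': [np.nan, 10, 11],
    'B': [np.nan, 1, 2],
    'C': [np.nan, -2200, -2211],
    'D': [np.nan, 100, 1],
    'minimum_difference': [np.nan, 'B', 'D'],
})

# Target maximum for the minimum difference, per value column
max_difference = pd.Series(dict(A=1, B=2, C=10, D=0.5))
print(df_test)
```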
I want to add a new column to df_test that compares the minimum difference with the target value for that column. For example:
V | A | B | C | D | minimum_difference | is_within_max_target
-10 | nan | nan | nan | nan | nan | nan
-9.9 | 10 | 1 | -2200 | 100 | B | TRUE
-9.8 | 11 | 2 | -2211 | 1 | D | FALSE
Any input and ideas are very welcome!
Answer:
Here is a vectorized (fast) solution:
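import numpy as np
import pandas as pd

# first: the minimum difference
# we use all names defined in the max_difference Series
cols = max_difference.index.tolist()
z = df_test[cols].abs()
# treat NaN as +inf so argmin skips it (idxmin on all-NaN rows warns in recent pandas)
zf = z.fillna(np.inf).to_numpy()
i = zf.argmin(axis=1)
has_value = z.notna().any(axis=1)
names = pd.Series(np.array(cols, dtype=object)[i], index=df_test.index)
df_test['minimum_difference'] = names.where(has_value)
# second: whether that difference is <= the corresponding max_difference
df_test['is_within_max_target'] = zf[np.arange(len(i)), i] <= max_difference.to_numpy()[i]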
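Note that, to keep the last column homogeneous (dtype=bool), we do not carry NaN over into it, so the first row, which has no values at all, gets False instead of NaN.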
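If the existing minimum_difference column should be reused instead of recomputed, one option is the pd.factorize plus NumPy fancy-indexing pattern that the pandas documentation suggests as a replacement for the deprecated DataFrame.lookup. This is a sketch under stated assumptions: it uses the sample data from the question, assumes at least one row has a non-NaN label, and compares absolute values as the answer above does; the names codes, labels and safe are illustrative only.

```python
import numpy as np
import pandas as pd

df_test = pd.DataFrame({
    'V': [-10, -9.9, -9.8],
    'A': [np.nan, 10, 11],
    'B': [np.nan, 1, 2],
    'C': [np.nan, -2200, -2211],
    'D': [np.nan, 100, 1],
    'minimum_difference': [np.nan, 'B', 'D'],
})
max_difference = pd.Series(dict(A=1, B=2, C=10, D=0.5))

# factorize maps each column label to an integer code; NaN labels get code -1
codes, labels = pd.factorize(df_test['minimum_difference'])
rows = np.arange(len(df_test))
safe = np.where(codes >= 0, codes, 0)  # placeholder column for NaN rows

# pick, per row, the value from the column named in minimum_difference
values = df_test.reindex(labels, axis=1).to_numpy(dtype=float)[rows, safe]
targets = max_difference.reindex(labels).to_numpy()[safe]

# compare absolute values, as the vectorized answer does
within = pd.Series(np.abs(values) <= targets, index=df_test.index, dtype=object)
within[codes < 0] = np.nan  # keep NaN where no minimum column is known
df_test['is_within_max_target'] = within
print(df_test)
```

Unlike the answer above, this keeps NaN for rows without a label, at the cost of an object-dtype column.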
Answer:
One way is to apply a function to each row (not vectorized, so it may or may not be fast enough for your full dataset):
import numpy as np
import pandas as pd

def check_diffs(row):
    # name of the column holding this row's smallest difference (may be NaN)
    col = row['minimum_difference']
    if pd.isna(col):
        return np.nan
    return row[col] <= max_difference[col]

df_test['is_within_max_target'] = df_test.apply(check_diffs, axis=1)
print(df_test)
# Output given your example data:
V A B C D minimum_difference is_within_max_target
0 -10.0 NaN NaN NaN NaN NaN NaN
1 -9.9 10.0 1.0 -2200.0 100.0 B True
2 -9.8 11.0 2.0 -2211.0 1.0 D False
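A note on the missing-value test: an identity check such as col is np.nan only succeeds when the cell holds that exact np.nan object, and NaN never compares equal to itself, so pd.isna is the safer test. A small illustration, assuming a NaN created with float('nan') (as a parser or other code might produce):

```python
import numpy as np
import pandas as pd

missing = float('nan')               # a NaN that is not the np.nan object
identity_check = missing is np.nan   # False: `is` compares object identity
equality_check = missing == missing  # False: NaN never equals itself
robust_check = pd.isna(missing)      # True: pandas' missing-value test
print(identity_check, equality_check, robust_check)  # False False True
```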
Answer:
I tried to keep it as simple as possible:
import math
import pandas as pd

data = [[-10 , math.nan, math.nan, math.nan, math.nan, math.nan],
        [-9.9, 10 , 1 , -2200, 100, 'B'],
        [-9.8, 11 , 2 , -2211, 1, 'D']]
df_test = pd.DataFrame(data, columns=['V', 'A', 'B', 'C', 'D', 'minimum_difference'])

max_difference = {
    'A': 1,
    'B': 2,
    'C': 10,
    'D': 0.5}

results = []
for i in range(len(df_test)):
    current_row = df_test.iloc[i].tolist()
    current_diff_column = current_row[5]  # value of 'minimum_difference'
    if pd.isna(current_diff_column):
        current_row.append(math.nan)
    else:
        current_value = df_test[current_diff_column].iloc[i]
        # store a real boolean rather than the string 'True'/'False'
        current_row.append(bool(current_value <= max_difference[current_diff_column]))
    results.append(current_row)

# build the result once instead of growing a DataFrame row by row
df_result = pd.DataFrame(results, columns=list(df_test.columns) + ['is_within_max_target'])
print(df_result)
Output:
V A B C D minimum_difference is_within_max_target
0 -10.0 NaN NaN NaN NaN NaN NaN
1 -9.9 10.0 1.0 -2200.0 100.0 B True
2 -9.8 11.0 2.0 -2211.0 1.0 D False
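The last two answers compare the raw cell value with the target, while the first one calls .abs() first. With the sample data this makes no difference, because the selected values (1 in B and 1 in D) are positive. If the target is meant as a bound on the absolute difference (the question's wording suggests so, but does not state it), a negative value would slip through the signed comparison. A small illustration, using -2200 from column C of the sample purely as a hypothetical selected value:

```python
import pandas as pd

max_difference = pd.Series(dict(A=1, B=2, C=10, D=0.5))
value = -2200  # hypothetical: column C of the second sample row

signed_ok = bool(value <= max_difference['C'])         # True: -2200 is below 10
absolute_ok = bool(abs(value) <= max_difference['C'])  # False: 2200 exceeds 10
print(signed_ok, absolute_ok)  # True False
```

In the row-wise answers, that would mean wrapping the cell value in abs() before comparing.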
Please credit the source when reposting. Link to this article: https://www.uj5u.com/gongcheng/396532.html
