Comparing a Pandas DataFrame against predefined limits

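I have a Pandas DataFrame structured as shown below. It holds the results of comparisons made earlier, plus a minimum_difference column that names which column in that row contains the smallest absolute difference.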

df_test

V       |  A    | B   |  C      |  D    | minimum_difference
-10     |  nan  | nan |  nan    |  nan  | nan
-9.9    |  10   | 1   |  -2200  |  100  | B
-9.8    |  11   | 2   |  -2211  |  1    | D

In addition, for each value column (A, B, C, D) I have a target maximum for that minimum difference, as follows:

max_difference = pd.Series(dict(
    A=1,
    B=2,
    C=10,
    D=0.5,
))

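I want to add a new column to df_test that compares the minimum difference against the target value for the column it came from. For example: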

V       |  A    | B   |  C      |  D    | minimum_difference | is_within_max_target
-10     |  nan  | nan |  nan    |  nan  | nan                | nan
-9.9    |  10   | 1   |  -2200  |  100  | B                  | TRUE
-9.8    |  11   | 2   |  -2211  |  1    | D                  | FALSE

Any input and ideas are very welcome!

Reply from a uj5u.com user:

Here is a vectorized (fast) solution:

import numpy as np

# first: the minimum difference

# we use all names defined in the max_difference Series
cols = max_difference.index.tolist()
z = df_test[cols].abs()
df_test['minimum_difference'] = z.idxmin(axis=1)


# second: whether that difference is <= the corresponding max_difference
# (np.argmin returns the position of a NaN if the row contains one, and any
# comparison with NaN is False, so all-NaN rows end up False)
i = np.argmin(z.values, axis=1)
df_test['is_within_max_target'] = z.values[np.arange(len(i)), i] <= max_difference.values[i]

Note that, to keep the last column homogeneous (dtype=bool), we do not carry NaN over into that column.
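
If you would rather keep the missing rows marked as missing, as in the table in the question, one option is pandas' nullable "boolean" dtype. The following is only a sketch under a few assumptions: the sample frame is rebuilt by hand from the question, NaN cells are replaced by +inf before np.argmin so that a missing cell is never selected, and the names smallest, limit and within are made up for this illustration.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question (assumption: numeric columns only).
df_test = pd.DataFrame(
    {
        "V": [-10.0, -9.9, -9.8],
        "A": [np.nan, 10, 11],
        "B": [np.nan, 1, 2],
        "C": [np.nan, -2200, -2211],
        "D": [np.nan, 100, 1],
    }
)
max_difference = pd.Series(dict(A=1, B=2, C=10, D=0.5))

cols = max_difference.index.tolist()
z = df_test[cols].abs()

# Smallest absolute difference per row; NaN when every value in the row is NaN.
smallest = z.min(axis=1)

# Replace NaN by +inf so argmin skips missing cells instead of selecting them.
i = np.argmin(z.fillna(np.inf).to_numpy(), axis=1)
limit = max_difference.to_numpy()[i]

# Nullable boolean array: True/False where a value exists, <NA> otherwise.
within = pd.array(smallest.to_numpy() <= limit, dtype="boolean")
within[smallest.isna().to_numpy()] = pd.NA
df_test["is_within_max_target"] = within
```

With the sample data this gives <NA>, True, False. The trade-off is that the column is no longer plain bool, so downstream code has to be prepared for pd.NA.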

Reply from a uj5u.com user:

One approach is to apply a function to each row (not vectorized; it may or may not be fast enough for your full dataset):

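import numpy as np
import pandas as pd

def check_diffs(row):
    col = row['minimum_difference']
    # pd.isna catches any missing value; `col is np.nan` only matches that one object
    if pd.isna(col):
        return np.nan
    # compare the absolute difference, since minimum_difference refers to absolute values
    return abs(row[col]) <= max_difference[col]

df_test['is_within_max_target'] = df_test.apply(check_diffs, axis=1)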

print(df_test)
# Output given your example data:
      V     A    B       C      D minimum_difference is_within_max_target
0 -10.0   NaN  NaN     NaN    NaN                NaN                  NaN
1  -9.9  10.0  1.0 -2200.0  100.0                  B                 True
2  -9.8  11.0  2.0 -2211.0    1.0                  D                False
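
If the row-wise apply turns out to be too slow, the same lookup can probably be done without a Python-level loop by reusing the existing minimum_difference column. This is a sketch rather than a drop-in replacement: it assumes every non-missing entry in minimum_difference is one of the keys of max_difference, it compares absolute values, and the helper names pos, valid and result are invented here.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question, including the existing label column.
df_test = pd.DataFrame(
    {
        "V": [-10.0, -9.9, -9.8],
        "A": [np.nan, 10, 11],
        "B": [np.nan, 1, 2],
        "C": [np.nan, -2200, -2211],
        "D": [np.nan, 100, 1],
        "minimum_difference": [np.nan, "B", "D"],
    }
)
max_difference = pd.Series(dict(A=1, B=2, C=10, D=0.5))

cols = max_difference.index
# Position of the column named in each row; -1 where the label is missing.
pos = cols.get_indexer(df_test["minimum_difference"])
valid = pos >= 0

values = df_test[cols].abs().to_numpy()
rows = np.arange(len(df_test))

# Object array so that NaN and booleans can live side by side.
result = np.full(len(df_test), np.nan, dtype=object)
result[valid] = values[rows[valid], pos[valid]] <= max_difference.to_numpy()[pos[valid]]
df_test["is_within_max_target"] = result
```

For the sample data the new column holds NaN, True, False, the same as the apply version. Index.get_indexer returns -1 for labels it cannot find, which is what lets the missing row be skipped.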

Reply from a uj5u.com user:

I tried to keep it as simple as possible:

import math
import pandas as pd

data = [[-10, math.nan, math.nan, math.nan, math.nan, math.nan],
        [-9.9, 10, 1, -2200, 100, 'B'],
        [-9.8, 11, 2, -2211, 1, 'D']]
df_test = pd.DataFrame(data, columns=['V', 'A', 'B', 'C', 'D', 'minimum_difference'])

max_difference = {'A': 1, 'B': 2, 'C': 10, 'D': 0.5}

# Collect one result per row: NaN when no column is named, otherwise a real
# boolean (not the string 'True'/'False').
results = []
for i in range(len(df_test)):
    current_diff_column = df_test['minimum_difference'].iloc[i]
    if pd.isna(current_diff_column):
        results.append(math.nan)
    else:
        # compare the absolute difference against that column's limit
        current_value = abs(df_test[current_diff_column].iloc[i])
        results.append(bool(current_value <= max_difference[current_diff_column]))

df_result = df_test.copy()
df_result['is_within_max_target'] = results

print(df_result)

Output:

      V     A    B       C      D minimum_difference is_within_max_target
0 -10.0   NaN  NaN     NaN    NaN                NaN                  NaN
1  -9.9  10.0  1.0 -2200.0  100.0                  B                 True
2  -9.8  11.0  2.0 -2211.0    1.0                  D                False

Please credit the source when reposting. Link to this article: https://www.uj5u.com/gongcheng/396532.html
