Manual ROC curve does not match sklearn.metrics

import numpy as np
from sklearn.metrics import roc_curve
from sklearn.preprocessing import binarize
import matplotlib.pyplot as plt 
from sklearn.metrics import confusion_matrix
import warnings
warnings.filterwarnings('ignore')
import pandas as pd
from sklearn.metrics import accuracy_score, recall_score, precision_score

Data

y_pred = np.array([0.4, 0.2, 0.3, 0.6, 0.1, 0.3, 0.7, 0.2, 0.3, 0.8, 0.3, 0.9, 0.3, 0.2, 0.2, 
                   0.4, 0.9, 0.4, 0.3, 0.6, 0.7, 0.2, 0.8, 0.2, 0.6, 0.1, 0.1])

y_test =np.array(["No","No","No","Yes","No","No","Yes","No","No","Yes","No","Yes",
                  "No","No","No", "No","Yes","No","No","No","No","Yes",
                  "No","Yes","No","No","No"])

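Main program

I vary the threshold and save the recall and FPR in lists so that I can plot them. I also save the values returned by the sklearn metrics in separate lists, to check that I am getting the right values.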

def recall_fpr(confusion_matrix):
    """Given a confusion matrix, return the recall and the false positive rate."""
    cm = confusion_matrix
    Recall = round(cm[0, 0] / (cm[0, 0] + cm[0, 1]), 3) # TP / (TP + FN)
    Precision = round(cm[0, 0] / (cm[0, 0] + cm[1, 0]), 3) # TP / (TP + FP)
    False_Positive_rate = round((1 - Precision), 3)

    return Recall, False_Positive_rate

list_recall = []
list_fpr = []
list_recall_sk = []
list_fpr_sk = []
for i in range(1, 10):
    y_pred = y_pred.reshape(-1, 1)
    # threshold must be passed by keyword in recent scikit-learn versions
    y_pred2 = binarize(y_pred, threshold=i/10)
    y_pred2 = np.where(y_pred2 == 1, 'Yes', 'No')
    cm = confusion_matrix(y_test, y_pred2, labels=["Yes", "No"])

    Recall, fpr = recall_fpr(cm)
    list_recall.append(Recall)
    list_fpr.append(fpr)

    # Added only to check that I am getting the right results
    recall_sk = round(recall_score(y_test, y_pred2, pos_label="Yes"), 3)
    list_recall_sk.append(recall_sk)

    fpr_sk = round(1 - round(precision_score(y_test, y_pred2, pos_label="Yes"), 3), 3)
    list_fpr_sk.append(fpr_sk)

Plotting the values

df_threshold = pd.DataFrame({"Recall":list_recall, "False_Positives_rate": list_fpr})
df_threshold.plot(x='False_Positives_rate', y='Recall', style='o')

Computing the metrics with the sklearn method:

fpr_2, tpr_2, thresholds_2 = roc_curve(y_test, y_pred, pos_label = "Yes")
plt.plot(fpr_2, tpr_2, linewidth=2)
plt.plot([0, 1], [0, 1], 'k--' )

ax = plt.subplot(1, 1, 1)
ax.scatter(list_fpr, list_recall,  c='red')
plt.show()

Why don't the values I computed match the values from the sklearn metrics?

Reply from a uj5u.com user:

FPR is not 1 - precision. The former is FP / (FP + TN), while the latter is FP / (FP + TP).
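To make the difference concrete, here is a minimal dependency-free sketch that counts the four confusion-matrix cells for the question's data at a single threshold and computes both quantities. The threshold of 0.5 and all variable names are illustrative assumptions, not part of the original post; the strict `>` comparison mirrors how `binarize` treats the threshold.

```python
# Scores and labels copied from the question.
scores = [0.4, 0.2, 0.3, 0.6, 0.1, 0.3, 0.7, 0.2, 0.3, 0.8, 0.3, 0.9, 0.3, 0.2, 0.2,
          0.4, 0.9, 0.4, 0.3, 0.6, 0.7, 0.2, 0.8, 0.2, 0.6, 0.1, 0.1]
labels = ["No", "No", "No", "Yes", "No", "No", "Yes", "No", "No", "Yes", "No", "Yes",
          "No", "No", "No", "No", "Yes", "No", "No", "No", "No", "Yes",
          "No", "Yes", "No", "No", "No"]

threshold = 0.5  # illustrative choice, not from the original post

tp = fp = fn = tn = 0
for score, label in zip(scores, labels):
    predicted_yes = score > threshold  # strict comparison, like binarize
    if predicted_yes and label == "Yes":
        tp += 1
    elif predicted_yes and label == "No":
        fp += 1
    elif not predicted_yes and label == "Yes":
        fn += 1
    else:
        tn += 1

precision = tp / (tp + fp)
one_minus_precision = 1 - precision   # equals FP / (FP + TP)
fpr = fp / (fp + tn)                  # the quantity a ROC curve plots on the x-axis

print(f"TP={tp} FP={fp} FN={fn} TN={tn}")
print(f"1 - precision = {one_minus_precision:.3f}")
print(f"FPR           = {fpr:.3f}")
```

The two numbers share the numerator FP but divide by different totals: 1 - precision divides by everything predicted positive, while FPR divides by everything that is actually negative.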

Fixing the recall_fpr function with

    False_Positive_rate = round(cm[1, 0] / (cm[1, 0] + cm[1, 1]), 3) # FP / (FP + TN)

gives the correct ROC curve.
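As a rough sanity check that does not depend on scikit-learn, the sketch below sweeps the same thresholds (0.1 to 0.9) over the question's data using FP / (FP + TN) for the x-axis. The helper `roc_point` and the list `points` are hypothetical names introduced here for illustration. A valid set of ROC points should move monotonically toward (0, 0) as the threshold rises, which is what this checks.

```python
# Scores and labels copied from the question.
scores = [0.4, 0.2, 0.3, 0.6, 0.1, 0.3, 0.7, 0.2, 0.3, 0.8, 0.3, 0.9, 0.3, 0.2, 0.2,
          0.4, 0.9, 0.4, 0.3, 0.6, 0.7, 0.2, 0.8, 0.2, 0.6, 0.1, 0.1]
labels = ["No", "No", "No", "Yes", "No", "No", "Yes", "No", "No", "Yes", "No", "Yes",
          "No", "No", "No", "No", "Yes", "No", "No", "No", "No", "Yes",
          "No", "Yes", "No", "No", "No"]


def roc_point(scores, labels, threshold, positive="Yes"):
    """Return (FPR, TPR) when scores strictly above `threshold` are predicted positive."""
    tp = fp = fn = tn = 0
    for score, label in zip(scores, labels):
        predicted_positive = score > threshold
        actual_positive = label == positive
        if predicted_positive and actual_positive:
            tp += 1
        elif predicted_positive:
            fp += 1
        elif actual_positive:
            fn += 1
        else:
            tn += 1
    tpr = tp / (tp + fn)   # recall: TP / (TP + FN)
    fpr = fp / (fp + tn)   # FP / (FP + TN), not 1 - precision
    return fpr, tpr


# Same thresholds as the loop in the question: 0.1, 0.2, ..., 0.9.
points = [roc_point(scores, labels, i / 10) for i in range(1, 10)]

for i, (fpr, tpr) in enumerate(points, start=1):
    print(f"threshold={i / 10:.1f}  FPR={fpr:.3f}  TPR={tpr:.3f}")
```

Note that at the highest threshold nothing is predicted positive, so precision would be 0 / 0, whereas FPR and TPR are both well defined and equal to 0.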

Please credit the source when reposting. Link to this article: https://www.uj5u.com/gongcheng/465900.html