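I built a modelPipeline function that runs several classifiers, each inside its own pipeline, and returns the fitted pipelines together with each classifier's scores as a DataFrame.
How can I use GridSearchCV with the modelPipeline below? Can GridSearchCV be used in a Pipeline with multiple classifiers?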
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import BernoulliNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, ExtraTreesClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import Pipeline
from xgboost import XGBClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
import sklearn.metrics as skm
import os
import pandas as pd  # needed for the scores DataFrame built in modelPipeline
rs = {'random_state': 42}
# Train-test Split
X_train, X_test, y_train, y_test = train_test_split(X,
                                                    y,
                                                    test_size=0.33,
                                                    random_state=42)
# Classification - Model Pipeline
def modelPipeline(X_train, X_test, y_train, y_test):
    log_reg = LogisticRegression(**rs)
    nb = BernoulliNB()
    knn = KNeighborsClassifier()
    svm = SVC(**rs)
    mlp = MLPClassifier(max_iter=500, **rs)
    dt = DecisionTreeClassifier(**rs)
    et = ExtraTreesClassifier(**rs)
    rf = RandomForestClassifier(**rs)
    xgb = XGBClassifier(**rs, verbosity=0)
    clfs = [
        ('Logistic Regression', log_reg),
        ('Naive Bayes', nb),
        ('K-Nearest Neighbors', knn),
        ('SVM', svm),
        ('MLP', mlp),
        ('Decision Tree', dt),
        ('Extra Trees', et),
        ('Random Forest', rf),
        ('XGBoost', xgb)
    ]
    pipelines = []
    score_rows = []  # one dict of scores per classifier
    for clf_name, clf in clfs:
        pipeline = Pipeline(steps=[
            ('scaler', StandardScaler()),
            ('classifier', clf)
        ])
        pipeline.fit(X_train, y_train)
        y_pred = pipeline.predict(X_test)
        # F1-Score
        fscore = skm.f1_score(y_test, y_pred)
        # Precision
        pres = skm.precision_score(y_test, y_pred)
        # Recall
        rcall = skm.recall_score(y_test, y_pred)
        # Accuracy
        accu = skm.accuracy_score(y_test, y_pred)
        # ROC_AUC (computed from the hard class predictions, not from probabilities)
        roc_auc = skm.roc_auc_score(y_test, y_pred)
        pipelines.append(pipeline)
        score_rows.append({
            'Model': clf_name,
            'F1_Score': fscore,
            'Precision': pres,
            'Recall': rcall,
            'Accuracy': accu,
            'ROC_AUC': roc_auc
        })
    # DataFrame.append was removed in pandas 2.0, so build the frame from the collected rows
    scores_df = pd.DataFrame(score_rows, columns=['Model', 'F1_Score', 'Precision', 'Recall', 'Accuracy', 'ROC_AUC'])
    return pipelines, scores_df
Answer from a uj5u.com user:
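GridSearchCV can be given a list of classifiers to choose from for the final step of the pipeline. However, it won't do exactly what your code does: most notably, the fitted models are not saved by GridSearchCV, only the scores (plus the finally selected model, refit on all of the data, if refit != False).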
from sklearn.dummy import DummyClassifier
from sklearn.model_selection import GridSearchCV

pipe = Pipeline(steps=[
    ('scaler', StandardScaler()),
    ('classifier', DummyClassifier()),  # doesn't matter, we're going to override this in the search
])
params = {
    # each candidate replaces the whole 'classifier' step
    'classifier': [log_reg, nb, knn, svm, mlp, dt, et, rf, xgb],
}
scoring = ['f1', 'precision', 'recall', 'accuracy', 'roc_auc']
search = GridSearchCV(pipe, params, scoring=scoring, refit=False)
(With multiple metrics, refit must be set to False, to one of the scoring entries, or to a custom callable.)
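To make the idea above concrete, here is a minimal, self-contained sketch of how the search results could be turned back into a scores table like the one in the question. It is only an illustration under stated assumptions: the data is synthetic (make_classification stands in for the asker's X and y), only three of the classifiers are included to keep it fast, and cv=3 and refit='f1' are arbitrary choices rather than anything the question specifies.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Assumption: synthetic binary data standing in for the real X and y.
X_demo, y_demo = make_classification(n_samples=200, n_features=8, random_state=42)

pipe = Pipeline(steps=[
    ('scaler', StandardScaler()),
    ('classifier', DummyClassifier()),  # placeholder, replaced by each candidate
])
params = {
    'classifier': [
        LogisticRegression(random_state=42),
        KNeighborsClassifier(),
        DecisionTreeClassifier(random_state=42),
    ],
}
scoring = ['f1', 'precision', 'recall', 'accuracy', 'roc_auc']

# refit='f1' (an arbitrary choice here) keeps one final model, refit on all the data.
search = GridSearchCV(pipe, params, scoring=scoring, refit='f1', cv=3)
search.fit(X_demo, y_demo)

# cv_results_ holds one entry per candidate; mean_test_<metric> is the CV average.
results = search.cv_results_
scores_df = pd.DataFrame({
    'Model': [type(p['classifier']).__name__ for p in results['params']],
    'F1_Score': results['mean_test_f1'],
    'Precision': results['mean_test_precision'],
    'Recall': results['mean_test_recall'],
    'Accuracy': results['mean_test_accuracy'],
    'ROC_AUC': results['mean_test_roc_auc'],
})
print(scores_df)
print(type(search.best_estimator_.named_steps['classifier']).__name__)
```

Note that these are cross-validated scores on the training data, not scores on a held-out test set as in the question, and that only search.best_estimator_ is kept as a fitted model.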
Answer from a uj5u.com user:
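Judging from your comment on my other answer, maybe you just want to tune each model's hyperparameters? (In that case you should simplify the example to a single classifier, since the different classifiers would be run independently(?).)
So, for example: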
log_reg_params = {'C': [0.1, 1, 10]}
...
xgb_params = {
    'learning_rate': [0.05, 0.1, 0.2],
    'max_depth': [1, 2, 3, 5, 8],
    'reg_lambda': [0, 1, 10],
}
clfs = [
    ('Logistic Regression', log_reg, log_reg_params),
    ('Naive Bayes', nb, nb_params),
    ...
    ('XGBoost', xgb, xgb_params),
]
for clf_name, clf, param_grid in clfs:
    pipeline = Pipeline(steps=[
        ('scaler', StandardScaler()),
        ('classifier', clf),
    ])
    # prefix each parameter name so it is routed to the 'classifier' step
    search = GridSearchCV(
        pipeline,
        {f'classifier__{paramname}': paramvalue for paramname, paramvalue in param_grid.items()},
    )
    search.fit(X_train, y_train)
    ...
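The loop above leaves several parts elided, so the following is a small runnable sketch of the same per-classifier tuning pattern rather than a completion of it. Everything beyond the pattern itself is an assumption: the data is synthetic, only two classifiers are used, the K-Nearest Neighbors grid is a hypothetical example (the answer does not give one), and scoring='f1', cv=3 and the best dictionary are illustrative choices.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: synthetic binary data standing in for X_train and y_train.
X_demo, y_demo = make_classification(n_samples=200, n_features=8, random_state=42)

clfs = [
    ('Logistic Regression', LogisticRegression(random_state=42), {'C': [0.1, 1, 10]}),
    # Hypothetical grid, chosen only for illustration.
    ('K-Nearest Neighbors', KNeighborsClassifier(), {'n_neighbors': [3, 5, 7]}),
]

best = {}  # hypothetical container: name -> (best params, best CV score, fitted pipeline)
for clf_name, clf, param_grid in clfs:
    pipeline = Pipeline(steps=[
        ('scaler', StandardScaler()),
        ('classifier', clf),
    ])
    # 'classifier__<name>' routes each parameter to the 'classifier' step.
    grid = {f'classifier__{name}': values for name, values in param_grid.items()}
    search = GridSearchCV(pipeline, grid, scoring='f1', cv=3)
    search.fit(X_demo, y_demo)
    best[clf_name] = (search.best_params_, search.best_score_, search.best_estimator_)

for clf_name, (best_params, best_score, _) in best.items():
    print(clf_name, best_params, round(best_score, 3))
```

Because each search runs with the default refit=True, every entry keeps one tuned pipeline refit on all of the data, which comes close to the list of fitted pipelines the question's function returns.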
