precision_score error with an XGBoost classifier using RandomizedSearchCV

I am trying to build a classifier with XGBoost and tune it with RandomizedSearchCV.

Here is the code of my function:

def xgboost_classifier_rscv(x,y):
    from scipy import stats
    from xgboost import XGBClassifier
    from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
    from sklearn.metrics import f1_score, fbeta_score, make_scorer, recall_score, accuracy_score, precision_score
    from sklearn.model_selection import StratifiedKFold, GridSearchCV, RandomizedSearchCV, train_test_split

    # Split the dataset into training and test parts
    x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=42)

    # Bag-of-words implementation
    cv = CountVectorizer()
    x_train = cv.fit_transform(x_train).toarray()

    # TF-IDF implementation; the test set reuses the vectorizer and transformer fitted on the training set
    vector = TfidfTransformer()
    x_train = vector.fit_transform(x_train).toarray()
    x_test = vector.transform(cv.transform(x_test)).toarray()
    
    scorers = {
            'f1_score':make_scorer(f1_score),
            'precision_score': make_scorer(precision_score),
            'recall_score': make_scorer(recall_score),
            'accuracy_score': make_scorer(accuracy_score)
          }

    param_dist = {'n_estimators': stats.randint(150, 1000),
                  'learning_rate': stats.uniform(0.01, 0.59),
                  'subsample': stats.uniform(0.3, 0.6),
                  'max_depth': [3, 4, 5, 6, 7, 8, 9],
                  'colsample_bytree': stats.uniform(0.5, 0.4),
                  'min_child_weight': [1, 2, 3, 4]
                 }
    # n_folds = numFolds)
    skf = StratifiedKFold(n_splits=3, shuffle = True)
    gridCV = RandomizedSearchCV(xgb_model, 
                             param_distributions = param_dist,
                             cv = skf,  
                             n_iter = 5,  
                             scoring = scorers, 
                             verbose = 3, 
                             n_jobs = -1,
                             return_train_score=True,
                             refit = precision_score)

    gridCV.fit(x_train,y_train)
    best_pars = gridCV.best_params_
    print("best params : ", best_pars)
    xgb_predict = gridCV.predict(x_test)
    xgb_pred_prob = gridCV.predict_proba(x_test)
    print('best scores : ', gridCV.grid_scores_)
    scores = [x[1] for x in gridCV.grid_scores_]
    print("best scores : ", scores)

    return y_test, xgb_predict, xgb_pred_prob

When I run the code, I get the following error:

TypeError                                 Traceback (most recent call last)
<ipython-input-30-9adf84d48e5c> in <module>
      1 print("********** Xgboost classifier *************")
      2 start_time = time.monotonic()
----> 3 y_test, xgb_predict, xgb_pred_prob = xgboost_classifier_rscv(x,y)
      4 end_time = time.monotonic()
      5 print("the time consumed is : ", timedelta(seconds=end_time - start_time))

<ipython-input-29-e0c6ae026076> in xgboost_classifier_rscv(x, y)
     70 #                                 verbose=3, random_state=1001, refit='precision_score' )
     71 
---> 72     gridCV.fit(x_train,y_train)
     73     best_pars = gridCV.best_params_
     74     print("best params : ", best_pars)

~\anaconda3\lib\site-packages\sklearn\utils\validation.py in inner_f(*args, **kwargs)
     61             extra_args = len(args) - len(all_args)
     62             if extra_args <= 0:
---> 63                 return f(*args, **kwargs)
     64 
     65             # extra_args > 0

~\anaconda3\lib\site-packages\sklearn\model_selection\_search.py in fit(self, X, y, groups, **fit_params)
    858             # parameter set.
    859             if callable(self.refit):
--> 860                 self.best_index_ = self.refit(results)
    861                 if not isinstance(self.best_index_, numbers.Integral):
    862                     raise TypeError('best_index_ returned is not an integer')

~\anaconda3\lib\site-packages\sklearn\utils\validation.py in inner_f(*args, **kwargs)
     61             extra_args = len(args) - len(all_args)
     62             if extra_args <= 0:
---> 63                 return f(*args, **kwargs)
     64 
     65             # extra_args > 0

TypeError: precision_score() missing 1 required positional argument: 'y_pred'

When I do the same thing with GridSearchCV instead of RandomizedSearchCV, the code runs without any problem!

Answer from a uj5u.com user:

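Pass the string 'precision_score' (in quotes), not the function precision_score. With multi-metric scoring, refit expects the key of the scorer to optimize. When refit is a callable, scikit-learn calls it as refit(results) to pick best_index_, so precision_score ended up being called with a single argument, which is the TypeError in the traceback. Like this: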

gridCV = RandomizedSearchCV(xgb_model, 
                         param_distributions = param_dist,
                         cv = skf,  
                         n_iter = 5,  
                         scoring = scorers, 
                         verbose = 3, 
                         n_jobs = -1,
                         return_train_score=True,
                         refit = 'precision_score')
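
The two valid forms of refit under multi-metric scoring can be sketched as follows. This is a minimal illustration, not the asker's pipeline: the synthetic data, the DecisionTreeClassifier stand-in for XGBClassifier, and the helper names make_search and best_precision_index are all assumptions chosen so the example runs with scikit-learn alone.

```python
import numpy as np
from scipy import stats
from sklearn.datasets import make_classification
from sklearn.metrics import make_scorer, precision_score, recall_score
from sklearn.model_selection import RandomizedSearchCV, StratifiedKFold
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary data; stands in for the vectorized text features.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

scorers = {
    "precision_score": make_scorer(precision_score, zero_division=0),
    "recall_score": make_scorer(recall_score, zero_division=0),
}


def make_search(refit):
    """Build the same search each time; only `refit` differs (hypothetical helper)."""
    return RandomizedSearchCV(
        DecisionTreeClassifier(random_state=0),  # stand-in for XGBClassifier
        param_distributions={"max_depth": stats.randint(2, 8)},
        n_iter=4,
        cv=StratifiedKFold(n_splits=3, shuffle=True, random_state=0),
        scoring=scorers,
        refit=refit,
        random_state=0,
    )


# Option 1: name the scorer to optimize (a key of `scorers`).
by_name = make_search("precision_score").fit(X, y)


# Option 2: a callable that receives cv_results_ and returns an integer index.
def best_precision_index(cv_results):
    return int(np.argmax(cv_results["mean_test_precision_score"]))


by_callable = make_search(best_precision_index).fit(X, y)

print(by_name.best_params_, by_name.best_score_)
print(by_callable.best_params_)  # best_score_ is not set when refit is a callable
```

Passing the metric function itself matches neither form: it is callable, so it is treated like option 2, but it expects (y_true, y_pred) rather than the results dict.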

Another error:

grid_scores_ has been removed, so change it to cv_results_ (on the third- and fourth-to-last lines of the function):

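# cv_results_ is a dict of arrays; with multi-metric scoring the keys are 'mean_test_<scorer name>'
print('best scores : ', gridCV.cv_results_)
scores = gridCV.cv_results_['mean_test_precision_score']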
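
Because cv_results_ is a dict of arrays rather than a list of tuples like the old grid_scores_, per-candidate scores are read by key. The sketch below assumes the scorer names from the question ('precision_score', 'accuracy_score') and again uses synthetic data and a DecisionTreeClassifier as a stand-in estimator.

```python
from scipy import stats
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, make_scorer, precision_score
from sklearn.model_selection import RandomizedSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, n_features=10, random_state=0)

search = RandomizedSearchCV(
    DecisionTreeClassifier(random_state=0),  # stand-in estimator
    param_distributions={"max_depth": stats.randint(2, 8)},
    n_iter=4,
    cv=3,
    scoring={
        "precision_score": make_scorer(precision_score, zero_division=0),
        "accuracy_score": make_scorer(accuracy_score),
    },
    refit="precision_score",
    random_state=0,
)
search.fit(X, y)

results = search.cv_results_
# One entry per sampled candidate, in the order the candidates were tried.
precision_per_candidate = results["mean_test_precision_score"]
for params, mean_precision, rank in zip(
    results["params"], precision_per_candidate, results["rank_test_precision_score"]
):
    print(rank, params, round(float(mean_precision), 3))
```

Iterating over the dict directly yields only its keys, so a comprehension such as `[x[1] for x in cv_results_]` returns single characters of the key names, not scores.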

One more error:

You have not defined xgb_model, so add it before the RandomizedSearchCV call:

xgb_model = XGBClassifier(n_jobs = -1, random_state = 42)
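
Putting the three fixes together, one possible arrangement is to define the estimator first and wrap it in a Pipeline with the vectorizers, so the search receives raw text. This is only a sketch under stated assumptions: the toy corpus is hypothetical, and GradientBoostingClassifier is a stand-in so the example runs without xgboost installed; swapping in the XGBClassifier from the answer should work the same way, with its own parameter names.

```python
from scipy import stats
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.metrics import make_scorer, precision_score
from sklearn.model_selection import RandomizedSearchCV, StratifiedKFold
from sklearn.pipeline import Pipeline

# Hypothetical toy corpus; 1 = positive, 0 = negative.
train_texts = [
    "great product works well", "really happy with this", "excellent quality and fast",
    "love it works great", "very good value", "happy with the quality",
    "terrible product broke fast", "really unhappy with this", "poor quality and slow",
    "hate it stopped working", "very bad value", "unhappy with the quality",
]
train_labels = [1] * 6 + [0] * 6
test_texts = ["great quality", "bad and slow"]

# Define the estimator before handing it to the search.
# Stand-in for XGBClassifier(n_jobs=-1, random_state=42).
model = GradientBoostingClassifier(random_state=42)

pipeline = Pipeline([
    ("counts", CountVectorizer()),
    ("tfidf", TfidfTransformer()),
    ("clf", model),
])

search = RandomizedSearchCV(
    pipeline,
    # Parameters of a pipeline step are addressed as '<step name>__<parameter>'.
    param_distributions={
        "clf__n_estimators": stats.randint(10, 50),
        "clf__learning_rate": stats.uniform(0.01, 0.59),
        "clf__max_depth": [2, 3],
    },
    n_iter=3,
    cv=StratifiedKFold(n_splits=3, shuffle=True, random_state=42),
    scoring={"precision_score": make_scorer(precision_score, zero_division=0)},
    refit="precision_score",  # string key, not the metric function
    random_state=42,
)
search.fit(train_texts, train_labels)

# The refitted best pipeline vectorizes the test texts with the fitted vocabulary.
predictions = search.predict(test_texts)
print(search.best_params_)
print(predictions)
```

Keeping the vectorizers inside the pipeline means they are refitted on each training fold, so the vocabulary and IDF weights never see the validation fold.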
