Microsoft ML sentiment analysis printing incorrect prediction results?

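I built a text analysis model using C# and the Microsoft ML (ML.NET) library. The dataset provided by Microsoft is good at predicting the value of some review strings: for "Batteries not included" it prints a negative prediction, and for "No batteries" it also prints a negative prediction. However, when I tested it against values such as "Not bad" and "This is really bad", it printed a Positive prediction for both, which is incorrect.

I implemented the sentiment analysis tutorial from the Microsoft documentation. The dataset used to train the text analysis model is very small (60 KB). It is named yelp_labelled.txt and contains example sentences, each labelled 0 (negative) or 1 (positive). Is there a larger dataset text file I can use to improve the model's accuracy, and where can I find one? The code I am using is as follows: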

```csharp
using AnalysisSentiment; // project namespace holding SentimentData and SentimentPrediction (not shown)
using Microsoft.ML;
using Microsoft.ML.Data;
using static Microsoft.ML.DataOperationsCatalog;

// Path to the training data: one "sentence<TAB>label" row per line
string _dataPath = "yelp_labelled.txt";
// Initialize the ML.NET context
MLContext mlContext = new MLContext();
TrainTestData splitDataView = LoadData(mlContext);
ITransformer model = BuildAndTrainModel(mlContext, splitDataView.TrainSet);
Evaluate(mlContext, model, splitDataView.TestSet);
UseModelWithSingleItem(mlContext, model);

// Load the text file and hold out 20% of the rows as a test set
TrainTestData LoadData(MLContext mlContext)
{
    IDataView dataView = mlContext.Data.LoadFromTextFile<SentimentData>(_dataPath, hasHeader: false);
    TrainTestData splitDataView = mlContext.Data.TrainTestSplit(dataView, testFraction: 0.2);
    return splitDataView;
}

// Turn the text into a numeric feature vector, then train a logistic regression classifier
ITransformer BuildAndTrainModel(MLContext mlContext, IDataView splitTrainSet)
{
    var estimator = mlContext.Transforms.Text.FeaturizeText(outputColumnName: "Features", inputColumnName: nameof(SentimentData.SentimentText))
        .Append(mlContext.BinaryClassification.Trainers.SdcaLogisticRegression(labelColumnName: "Label", featureColumnName: "Features"));
    Console.WriteLine("=============== Create and Train the Model ===============");
    var model = estimator.Fit(splitTrainSet);
    Console.WriteLine("=============== End of training ===============");
    Console.WriteLine();
    return model;
}

// Score the held-out rows and print accuracy, AUC and F1
void Evaluate(MLContext mlContext, ITransformer model, IDataView splitTestSet)
{
    Console.WriteLine("=============== Evaluating Model accuracy with Test data===============");
    IDataView predictions = model.Transform(splitTestSet);
    CalibratedBinaryClassificationMetrics metrics = mlContext.BinaryClassification.Evaluate(predictions, "Label");
    Console.WriteLine();
    Console.WriteLine("Model quality metrics evaluation");
    Console.WriteLine("--------------------------------");
    Console.WriteLine($"Accuracy: {metrics.Accuracy:P2}");
    Console.WriteLine($"Auc: {metrics.AreaUnderRocCurve:P2}");
    Console.WriteLine($"F1Score: {metrics.F1Score:P2}");
    Console.WriteLine("=============== End of model evaluation ===============");
}

// Predict the sentiment of a single hand-written sentence
void UseModelWithSingleItem(MLContext mlContext, ITransformer model)
{
    PredictionEngine<SentimentData, SentimentPrediction> predictionFunction = mlContext.Model.CreatePredictionEngine<SentimentData, SentimentPrediction>(model);
    SentimentData sampleStatement = new SentimentData
    {
        SentimentText = "not bad"
    };
    var resultPrediction = predictionFunction.Predict(sampleStatement);
    Console.WriteLine();
    Console.WriteLine("=============== Prediction Test of model with a single sample and test dataset ===============");

    Console.WriteLine();
    Console.WriteLine($"Sentiment: {resultPrediction.SentimentText} | Prediction: {(Convert.ToBoolean(resultPrediction.Prediction) ? "Positive" : "Negative")} | Probability: {resultPrediction.Probability} ");

    Console.WriteLine("=============== End of Predictions ===============");
    Console.WriteLine();
}
```
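Before looking for a larger corpus, it may be worth checking how many labelled sentences the current file actually holds and how they split between the two classes. The following is a minimal sketch using only the .NET base class library. It assumes the same "sentence<TAB>label" layout that the program loads with LoadFromTextFile (whose default separator is a tab); CountLabels is a hypothetical helper, not part of ML.NET.

```csharp
using System;
using System.IO;

// Hypothetical helper: counts rows in a file where each line is
// "sentence<TAB>label" and the label is 0 (negative) or 1 (positive).
// Lines without a tab, or with any other label, are skipped.
(int Total, int Positive, int Negative) CountLabels(string path)
{
    int positive = 0, negative = 0;
    foreach (string line in File.ReadLines(path))
    {
        // Treat whatever follows the last tab as the label.
        int tab = line.LastIndexOf('\t');
        if (tab < 0) continue;
        string label = line[(tab + 1)..].Trim();
        if (label == "1") positive++;
        else if (label == "0") negative++;
    }
    return (positive + negative, positive, negative);
}

string dataPath = "yelp_labelled.txt"; // assumed location of the training file
if (File.Exists(dataPath))
{
    var (total, pos, neg) = CountLabels(dataPath);
    Console.WriteLine($"{total} labelled sentences: {pos} positive, {neg} negative");
}
```

A total in the low thousands or a strongly lopsided split suggests the model has seen few examples of phrasings such as "not bad".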

Reply from a uj5u.com user:

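Transfer learning: because your dataset is small, the best approach is to pre-train on a sentiment dataset (such as the IMDB movie reviews) and then fine-tune on your own dataset.
However, the model you are using is a simple logistic regression, which does not support pre-training and fine-tuning, so you would have to change the underlying ML model to a deep learning model.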
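Add more similar data: if you cannot change the underlying logistic regression model, you can try adding the IMDB dataset to your dataset, training from scratch, and checking whether the model's test performance improves. This may work because IMDB is a two-class (positive/negative) dataset that looks very similar to yours.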
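A minimal sketch of the "add more similar data" route, using only the .NET base class library. It assumes the extra data has already been converted to the same "sentence<TAB>label" layout as yelp_labelled.txt. The UCI "Sentiment Labelled Sentences" collection that yelp_labelled.txt comes from also contains imdb_labelled.txt and amazon_cells_labelled.txt in that layout; the larger Stanford IMDB movie-review dataset is distributed as one text file per review and would need converting first. MergeLabelledFiles is a hypothetical helper, not an ML.NET API.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical helper: concatenates several "sentence<TAB>label" files into one,
// keeping only rows whose label is 0 or 1 and dropping exact duplicate lines.
// Returns the number of rows written.
int MergeLabelledFiles(IEnumerable<string> inputPaths, string outputPath)
{
    var seen = new HashSet<string>();
    var rows = new List<string>();
    foreach (string path in inputPaths)
    {
        foreach (string line in File.ReadLines(path))
        {
            int tab = line.LastIndexOf('\t');
            if (tab < 0) continue;
            string label = line[(tab + 1)..].Trim();
            if (label != "0" && label != "1") continue;
            // HashSet.Add returns false for a line that was already kept.
            if (seen.Add(line)) rows.Add(line);
        }
    }
    File.WriteAllLines(outputPath, rows);
    return rows.Count;
}

// Assumed file names; each file must already use the tab-separated layout.
string[] sources = { "yelp_labelled.txt", "imdb_labelled.txt", "amazon_cells_labelled.txt" };
string[] existing = sources.Where(File.Exists).ToArray();
if (existing.Length > 0)
{
    int written = MergeLabelledFiles(existing, "combined_labelled.txt");
    Console.WriteLine($"Wrote {written} rows to combined_labelled.txt");
}
```

Pointing _dataPath in the question's code at combined_labelled.txt retrains the same SdcaLogisticRegression pipeline on the larger file; comparing the printed Accuracy, AUC and F1 before and after is the check the answer suggests.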
