我是資料分析的新手,我一直在使用 matplotlib、numpy 和 pandas 進行回歸分析編碼。但是,我遇到了一些麻煩,并盡我所能通過 Stackoverflow 和其他網站找到解決問題的方法,但我還是做不到。
這是代碼
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
Training_Dataset = pd.read_csv("https://raw.githubusercontent.com/kuroisep/Problem-1-Data-Analytic/main/A-train.csv?token=GHSAT0AAAAAABZ7O6G34ZTK4PMQGGZLRVS4Y2LAJSQ")
Training_Dataset = Training_Dataset.dropna()
X_train = np.array(Training_Dataset.iloc[:, :-1].values) # Independent Variable
y_train = np.array(Training_Dataset.iloc[:, 1].values) # Dependent Variable
Testing_Datatset = pd.read_csv("https://raw.githubusercontent.com/kuroisep/Problem-1-Data-Analytic/main/A-test.csv?token=GHSAT0AAAAAABZ7O6G2PRKTCT6YBKLZZSRWY2LAK2Q")
Testing_Dataset = Testing_Dataset.dropna()
X_test = np.array(Testing_Dataset.iloc[:, :-1].values) # Independent Variable
y_test = np.array(Testing_Dataset.iloc[:, 1].values) # Dependent Variable
from sklearn.linear_model import LinearRegression
regressor = LinearRegression()
regressor.fit(X_train, y_train)
accuracy = regressor.score(X_test, y_test)
print('Accuracy = ' str(accuracy))
plt.style.use('seaborn')
plt.scatter(X_test, y_test, color = 'red', marker = 'o', s = 35, alpha = 0.5,
label = 'Test data')
plt.plot(X_train, regressor.predict(X_train), color = 'blue', label='Model Plot')
plt.title('Predicted Values vs Inputs')
plt.xlabel('Inputs')
plt.ylabel('Predicted Values')
plt.legend(loc = 'upper left')
plt.show()
這是語法錯誤詳細資訊
/usr/local/lib/python3.7/dist-packages/matplotlib/axes/_axes.py in scatter(self, x, y, s, c, marker, cmap, norm, vmin, vmax, alpha, linewidths, verts, edgecolors, plotnonfinite, **kwargs)
4389 y = np.ma.ravel(y)
4390 if x.size != y.size:
-> 4391 raise ValueError("x and y must be the same size")
4392
4393 if s is None:
ValueError: x and y must be the same size
我的 X_test 值為 (shape = (6,8))
array([[-2.474000e 01, -1.550000e 00, 9.105000e 01, 2.401980e 03,
-1.520000e 00, 1.360000e 01, 5.660000e 00, 1.059000e 01],
[ 1.075400e 02, -2.869000e 01, -8.259000e 01, 2.578915e 04,
5.290000e 00, -8.560000e 00, 1.490000e 00, -4.730000e 00],
[ 4.508000e 01, 9.662000e 01, 5.185000e 01, 1.280000e 00,
3.580000e 00, 5.200000e-01, -6.860000e 00, -7.800000e-01],
[-1.228100e 02, 1.779000e 01, -1.828500e 02, 2.928970e 03,
-1.210000e 00, -2.060000e 00, 9.680000e 00, -8.590000e 00],
[ 7.761000e 01, -7.230000e 01, 9.728000e 01, 1.917394e 04,
-9.290000e 00, 8.600000e-01, 7.060000e 00, -8.060000e 00],
[-4.401000e 01, 1.316500e 02, 6.988000e 01, 1.778310e 03,
-1.375000e 01, -1.475000e 01, -1.227000e 01, -8.300000e-01]])
我的 y_test 值為 (shape=(6,))
array([ -1.55, -28.69, 96.62, 17.79, -72.3 , 131.65])
如果我的問題看起來很老,我很抱歉。謝謝你的好意
uj5u.com熱心網友回復:

對于第二個圖,regressor.predict(X_train)將給出預測y_train值,因此再次將預測值與實際值進行比較:
plt.plot(y_train, regressor.predict(X_train), color = 'blue', label='Model Plot')
輸出:

順便說一句,您的訓練測驗拆分不正確。您設定x2為目標變數,但x2也包含在training集合中,這就是模型完美擬合并給出100%準確性的原因,但這超出了這個問題的范圍。
轉載請註明出處,本文鏈接:https://www.uj5u.com/houduan/516056.html
上一篇:從找到的極值中獲取全行資料
