Scikit-Learn: Linear Regression and Forecasting with Datetime Values

Below is a sample of the dataset.

Row	datetime	energy
1	2008-03-01 00:00:00	1259.985563
2	2008-03-01 01:00:00	1095.541500
3	2008-03-01 02:00:00	1056.247500
4	2008-03-01 03:00:00	1034.742000
5	2008-03-01 04:00:00	1026.334500

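Each row holds an hourly datetime value and the energy consumption for that hour, with dtypes object and float64 respectively. I want to use the datetime column as the single feature to predict energy.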

I used the following code:

    import pandas as pd

    train['datetime'] = pd.to_datetime(train['datetime'])
    X = train.iloc[:, 0]   # datetime column (a 1-D Series)
    y = train.iloc[:, -1]  # energy column

I can't pass the single feature to `fit` as a Series, because it raises the following error:

ValueError: Expected 2D array, got 1D array instead:
array=['2008-03-01T00:00:00.000000000' '2008-03-01T01:00:00.000000000'
 '2008-03-01T02:00:00.000000000' ... '2018-12-31T21:00:00.000000000'
 '2018-12-31T22:00:00.000000000' '2018-12-31T23:00:00.000000000'].
Reshape your data either using array.reshape(-1, 1) if your data has a single feature or  
array.reshape(1, -1) if it contains a single sample.
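The cause is that `train.iloc[:, 0]` returns a one-dimensional Series, while scikit-learn estimators expect `X` shaped `(n_samples, n_features)`. A small sketch (the two-row frame is made up from the sample above) showing that selecting the column with a list of names keeps a two-dimensional DataFrame:

```python
import pandas as pd

# Two rows taken from the sample data above (illustrative only)
train = pd.DataFrame({
    "datetime": pd.to_datetime(["2008-03-01 00:00:00", "2008-03-01 01:00:00"]),
    "energy": [1259.985563, 1095.541500],
})

x_1d = train.iloc[:, 0]     # a Series: shape (n_samples,)
x_2d = train[["datetime"]]  # a DataFrame: shape (n_samples, 1)

print(x_1d.shape, x_2d.shape)  # (2,) (2, 1)
```

Fixing the shape alone does not make the data usable, though: the column is still datetime64, not numeric.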

So I reshaped them as suggested:

    import numpy as np
    import pandas as pd
    from sklearn.linear_model import LinearRegression

    # Reshape to (n_samples, 1), as the error message suggests
    X = np.array(X).reshape(-1, 1)
    y = np.array(y).reshape(-1, 1)

    model_1 = LinearRegression()
    model_1.fit(X, y)

    test = pd.to_datetime(test['datetime'])
    test = np.array(test).reshape(-1, 1)

    predictions = model_1.predict(test)

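The LinearRegression object fits feature X and target y without raising any error. But when I pass the test data to the `predict` method, it throws the following error: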

TypeError: The DType <class 'numpy.dtype[datetime64]'> could not be promoted by <class 'numpy.dtype[float64]'>. 
This means that no common DType exists for the given inputs. 
For example they cannot be stored in a single array unless the dtype is `object`. 
The full list of DTypes is: (<class 'numpy.dtype[datetime64]'>, <class 'numpy.dtype[float64]'>)
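For context, `predict` computes a dot product between the input and the float64 coefficients, and NumPy has no common dtype for `datetime64` and `float64`, which is what the message above says. The sketch below reproduces that promotion failure with `np.result_type`, then shows one possible workaround (an assumption, not part of the original answer): converting each timestamp to hours elapsed since the first training timestamp, so both `fit` and `predict` receive plain floats. The training values come from the sample table; the two test timestamps and the helper `hours_since_origin` are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# NumPy cannot find a common dtype for datetime64 and float64
try:
    np.result_type(np.dtype("datetime64[ns]"), np.dtype("float64"))
    promotion_failed = False
except TypeError:
    promotion_failed = True
print("promotion failed:", promotion_failed)

train = pd.DataFrame({
    "datetime": pd.to_datetime([
        "2008-03-01 00:00:00", "2008-03-01 01:00:00", "2008-03-01 02:00:00",
        "2008-03-01 03:00:00", "2008-03-01 04:00:00",
    ]),
    "energy": [1259.985563, 1095.541500, 1056.247500, 1034.742000, 1026.334500],
})
# Hypothetical future timestamps to forecast
test = pd.DataFrame({"datetime": pd.to_datetime(["2008-03-01 05:00:00", "2008-03-01 06:00:00"])})

origin = train["datetime"].min()

def hours_since_origin(times: pd.Series) -> np.ndarray:
    """Hypothetical helper: datetime Series -> 2-D float column of hours since `origin`."""
    return ((times - origin) / pd.Timedelta(hours=1)).to_numpy().reshape(-1, 1)

# The same conversion is applied to training and test data
X_train = hours_since_origin(train["datetime"])
X_test = hours_since_origin(test["datetime"])

model = LinearRegression().fit(X_train, train["energy"])
predictions = model.predict(X_test)
print(predictions)
```

A single elapsed-time feature only fits a straight-line trend; it cannot capture daily or weekly seasonality.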

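I can't resolve this error. How can I use the datetime values as a single feature and apply simple linear regression to predict the target for time-series forecasting? Where did I go wrong?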

Answer from a uj5u.com user:

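You can't train on raw datetime values: scikit-learn estimators need numeric input, and a datetime64 column is not converted for you. If you want the model to learn from the timestamp, consider splitting it into day, month, weekday, week of year, hour, and so on, so it can pick up seasonal patterns: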

    import pandas as pd
    from sklearn.model_selection import train_test_split
    from sklearn.linear_model import LinearRegression
    from sklearn.metrics import mean_squared_error

    df = pd.DataFrame(
        data=[["2008-03-01 00:00:00", 1259.985563],
              ["2008-03-01 01:00:00", 1095.541500],
              ["2008-03-01 02:00:00", 1056.247500],
              ["2008-03-01 03:00:00", 1034.742000],
              ["2008-03-01 04:00:00", 1026.334500]],
        columns=["datetime", "energy"],
    )
    df["datetime"] = pd.to_datetime(df["datetime"])

    # Split the timestamp into numeric calendar features
    features = ["year", "month", "day", "hour", "weekday", "weekofyear", "quarter"]
    dt = df["datetime"].dt
    df["year"] = dt.year
    df["month"] = dt.month
    df["day"] = dt.day
    df["hour"] = dt.hour
    df["weekday"] = dt.weekday
    df["weekofyear"] = dt.isocalendar().week.astype(int)  # .dt.weekofyear was removed in pandas 2.0
    df["quarter"] = dt.quarter

    X = df[features]
    y = df[["energy"]]

    # shuffle=False keeps the test rows after the training rows in time
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, shuffle=False)

    model = LinearRegression()
    model.fit(X_train, y_train)

    y_pred = model.predict(X_test)

    print(mean_squared_error(y_test, y_pred))
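The key point for the original question is that the test timestamps must go through exactly the same transformation as the training timestamps before `predict`. A minimal sketch, assuming a hypothetical helper `calendar_features` and made-up test timestamps, that wraps the feature extraction so it can be reused on any datetime column:

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

FEATURES = ["year", "month", "day", "hour", "weekday", "weekofyear", "quarter"]

def calendar_features(times: pd.Series) -> pd.DataFrame:
    """Hypothetical helper: build numeric calendar features from a datetime-like Series."""
    times = pd.to_datetime(times)
    return pd.DataFrame({
        "year": times.dt.year,
        "month": times.dt.month,
        "day": times.dt.day,
        "hour": times.dt.hour,
        "weekday": times.dt.weekday,
        "weekofyear": times.dt.isocalendar().week.astype(int),
        "quarter": times.dt.quarter,
    })

train = pd.DataFrame({
    "datetime": ["2008-03-01 00:00:00", "2008-03-01 01:00:00", "2008-03-01 02:00:00",
                 "2008-03-01 03:00:00", "2008-03-01 04:00:00"],
    "energy": [1259.985563, 1095.541500, 1056.247500, 1034.742000, 1026.334500],
})
# Hypothetical timestamps to forecast
test = pd.DataFrame({"datetime": ["2008-03-01 05:00:00", "2008-03-01 06:00:00"]})

# Same feature function for fit and predict, so the columns always match
model = LinearRegression().fit(calendar_features(train["datetime"]), train["energy"])
predictions = model.predict(calendar_features(test["datetime"]))
print(predictions)
```

With only five sample rows, every column except `hour` is constant; the calendar features only carry signal when trained on the full 2008 to 2018 series.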

Source: https://www.uj5u.com/qukuanlian/537302.html

Tags: Python, pandas, datetime, scikit-learn, forecasting
