I am trying to preprocess my data using normalization.
# preprocessing
import tensorflow as tf
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
from tensorflow.keras import layers
from sklearn.compose import make_column_transformer
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder
from sklearn.model_selection import train_test_split
np.set_printoptions(precision=3, suppress=True)
btc_data = pd.read_csv(
"output.csv",
names=["Time", "Open"])
ct = make_column_transformer(
(MinMaxScaler(), ["Time", "Open"]),
(OneHotEncoder(handle_unknown="ignore"), ["Time", "Open"])
)
X_btc = btc_data["Time"]
y_btc = btc_data["Open"]
X_train, X_test, y_train, y_test = train_test_split(X_btc, y_btc, test_size=0.2, random_state=62)
ct.fit(X_train)
X_train_normal = ct.transform(X_train)
X_test_normal = ct.transform(X_test)
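The code runs in a Colab notebook. The dataset comes from Kaggle and has been modified to contain a column of Unix timestamps and a second column with the Bitcoin opening price at each timestamp. After splitting the data and creating the column transformer, I try to fit the transformer on the training data. However, I get the following error: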
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-44-f73622372111> in <module>()
27 print(X_train.shape)
28
---> 29 ct.fit(X_train)
30 X_train_normal = ct.transform(X_train)
31 X_test_normal = ct.transform(X_test)
3 frames
/usr/local/lib/python3.7/dist-packages/sklearn/utils/__init__.py in _get_column_indices(X, key)
387 :func:`_safe_indexing_column`.
388 """
--> 389 n_columns = X.shape[1]
390
391 key_dtype = _determine_key_type(key)
IndexError: tuple index out of range
I wondered whether this is a shape problem; note that X_train has shape (2020896,).
Is there something I need to change in my data to fix this error?
Answer:
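You are extracting X_btc as a pandas Series, which is one-dimensional, but the column transformer needs a DataFrame (a two-dimensional array/matrix). A Series has no second axis, which is why X.shape[1] raises the IndexError. Replace:
X_btc = btc_data["Time"]
with:
X_btc = btc_data[["Time"]]
The double brackets select a list of columns, so the result is a DataFrame.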
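The difference between the two selections can be checked on a small example. The following is a minimal sketch, assuming a tiny hand-made DataFrame as a stand-in for output.csv (the values are hypothetical) and a transformer with only a MinMaxScaler:

```python
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.preprocessing import MinMaxScaler

# Hypothetical stand-in for output.csv: four made-up rows.
btc_data = pd.DataFrame({
    "Time": [1609459200, 1609459260, 1609459320, 1609459380],
    "Open": [29000.0, 29010.5, 28995.2, 29020.1],
})

series = btc_data["Time"]    # single brackets: 1-D pandas Series
frame = btc_data[["Time"]]   # double brackets: 2-D pandas DataFrame

print(series.shape)  # (4,)   -> no shape[1], so column lookup fails
print(frame.shape)   # (4, 1)

# The column transformer selects columns by name, so it needs the DataFrame.
ct = make_column_transformer((MinMaxScaler(), ["Time"]))
scaled = ct.fit_transform(frame)
print(scaled.shape)  # (4, 1)
```

The same shape check on the real X_train should show (2020896, 1) instead of (2020896,) once the double-bracket selection is used.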
Edit, regarding the new error:
The KeyError occurs because of this transformer:
ct = make_column_transformer(
(MinMaxScaler(), ["Time", "Open"]),
(OneHotEncoder(handle_unknown="ignore"), ["Time", "Open"])
)
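It refers to the columns ["Time", "Open"]. However, X_btc has no "Open" column, because you selected only the "Time" column. "Open" is the target label (y_btc), so it should not be part of X_btc. In this case you can remove "Open" from make_column_transformer, like this: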
ct = make_column_transformer(
(MinMaxScaler(), ["Time"]),
(OneHotEncoder(handle_unknown="ignore"), ["Time"])
)
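Putting both fixes together, the whole flow might look like the sketch below. It assumes a small synthetic DataFrame in place of output.csv (ten made-up rows) and otherwise reuses the names from the question:

```python
import numpy as np
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

# Synthetic stand-in for output.csv: ten rows, one per minute (made-up prices).
btc_data = pd.DataFrame({
    "Time": np.arange(1609459200, 1609459200 + 60 * 10, 60),
    "Open": np.linspace(29000.0, 29090.0, 10),
})

X_btc = btc_data[["Time"]]   # DataFrame (2-D), not a Series
y_btc = btc_data["Open"]     # the target stays out of X_btc

X_train, X_test, y_train, y_test = train_test_split(
    X_btc, y_btc, test_size=0.2, random_state=62
)

# The transformer only references columns that exist in X_btc.
ct = make_column_transformer(
    (MinMaxScaler(), ["Time"]),
    (OneHotEncoder(handle_unknown="ignore"), ["Time"]),
)

ct.fit(X_train)                       # fit on the training split only
X_train_normal = ct.transform(X_train)
X_test_normal = ct.transform(X_test)

print(X_train_normal.shape)
print(X_test_normal.shape)
```

Note that OneHotEncoder produces one output column per distinct value it sees during fit. With timestamps that are almost all unique, this means roughly one column per training row, so on the full dataset it may be worth checking whether the one-hot step is needed at all.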
