I am trying to figure out which variables affect the toAnalyse variable. For this, I am using LogisticRegression. When I run the code below, I get the following error:
Code:
import numpy as np
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt
from matplotlib import rcParams
from sklearn.linear_model import LogisticRegression
rcParams['figure.figsize'] = 14, 7
rcParams['axes.spines.top'] = False
rcParams['axes.spines.right'] = False
data = pd.read_csv('file.txt', sep=",")
df = pd.concat([
    pd.DataFrame(data, columns=data.columns),
    pd.DataFrame(data, columns=['toAnalyse'])
], axis=1)
X = df.drop(['notimportant', 'test', 'toAnalyse'], axis=1)
y = df['toAnalyse']
#y.drop(y.columns[0], axis=1, inplace=True) <----------------- From 2 to 0 variables when running this?
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)
ss = StandardScaler()
X_train_scaled = ss.fit_transform(X_train)
X_test_scaled = ss.transform(X_test)
Error:
ValueError: y should be a 1d array, got an array of shape (258631, 2) instead.
This seems to make sense, because when I print y.info() I get back:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 344842 entries, 0 to 344841
Data columns (total 2 columns):
 #   Column     Non-Null Count   Dtype
---  ------     --------------   -----
 0   toAnalyse  343480 non-null  float64
 1   toAnalyse  343480 non-null  float64
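So the variable toAnalyse appears twice in y. OK, so I wanted to drop the first one (based on its index) so that I am left with a single 1d column. However, when I use y.drop(y.columns[0], axis=1, inplace=True), I get an error message showing that there are no variables left at all: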
ValueError: y should be a 1d array, got an array of shape (258631, 0) instead.
What is happening here, and how can I run this with a 1d array?
Answer:
It looks like after
df = pd.concat([
    pd.DataFrame(data, columns=data.columns),
    pd.DataFrame(data, columns=['toAnalyse'])
], axis=1)
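you have the 'toAnalyse' column twice in your dataframe. That is what gives y the wrong shape in the first place. drop looks up columns by name, and y.columns[0] is simply the label 'toAnalyse', which matches both columns, so after your drop statement there are no columns left.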
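A minimal reproduction of both symptoms, using a tiny hypothetical DataFrame in place of file.txt (the column name feature_a is made up for illustration). It shows that the concat duplicates the 'toAnalyse' label, that selecting that label returns a 2-column DataFrame, and that drop by label removes both copies. The last two steps, selecting by position with iloc or deduplicating the labels with columns.duplicated(), are alternative ways to get a single 1d column; they are not part of the original answer.

```python
import pandas as pd

# Hypothetical stand-in for pd.read_csv('file.txt'): it already has 'toAnalyse'
data = pd.DataFrame({
    "feature_a": [0.1, 0.5, 0.9, 0.3],
    "toAnalyse": [0.0, 1.0, 1.0, 0.0],
})

# Same concat as in the question: the second frame adds another 'toAnalyse'
df = pd.concat([
    pd.DataFrame(data, columns=data.columns),
    pd.DataFrame(data, columns=["toAnalyse"]),
], axis=1)
print(list(df.columns))  # ['feature_a', 'toAnalyse', 'toAnalyse']

# Selecting a duplicated label returns a DataFrame, not a Series
y = df["toAnalyse"]
print(type(y).__name__, y.shape)  # DataFrame (4, 2)

# y.columns[0] is only the label 'toAnalyse'; drop matches by label,
# so both columns with that name are removed
y_dropped = y.drop(y.columns[0], axis=1)
print(y_dropped.shape)  # (4, 0)

# Alternative: select by position, which keeps exactly one column as a Series
y_first = y.iloc[:, 0]
print(type(y_first).__name__, y_first.shape)  # Series (4,)

# Alternative: keep only the first occurrence of each column label
df_unique = df.loc[:, ~df.columns.duplicated()]
print(list(df_unique.columns))  # ['feature_a', 'toAnalyse']
```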
To fix this, I would simply drop the concat that builds df. data already seems to contain everything you need, so
X = data.drop(['notimportant', 'test', 'toAnalyse'], axis=1)
y = data['toAnalyse']
should work.
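Going back to the original goal (which variables affect toAnalyse), here is a hedged end-to-end sketch of the fixed pipeline through to fitting LogisticRegression and reading its coefficients. Assumptions, none of which come from the question: file.txt is replaced by random data, the feature names x1 and x2 are hypothetical, and toAnalyse is assumed to hold binary 0/1 labels. The dropna step is there because y.info() reported 343480 non-null values out of 344842 entries, and LogisticRegression does not accept NaN in y.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in for pd.read_csv('file.txt', sep=","):
# two made-up features plus the column names used in the question
rng = np.random.default_rng(0)
n = 400
data = pd.DataFrame({
    "x1": rng.normal(size=n),
    "x2": rng.normal(size=n),
    "notimportant": rng.normal(size=n),
    "test": rng.normal(size=n),
})
# Assumed binary target that depends on x1 only
data["toAnalyse"] = (data["x1"] + 0.5 * rng.normal(size=n) > 0).astype(float)
# Simulate missing target values like those shown by y.info()
data.loc[:9, "toAnalyse"] = np.nan

# LogisticRegression rejects NaN, so drop rows without a target
data = data.dropna(subset=["toAnalyse"])

# Build X and y straight from data: no concat, so 'toAnalyse' appears once
X = data.drop(["notimportant", "test", "toAnalyse"], axis=1)
y = data["toAnalyse"]  # a 1d Series

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)
ss = StandardScaler()
X_train_scaled = ss.fit_transform(X_train)
X_test_scaled = ss.transform(X_test)

model = LogisticRegression()
model.fit(X_train_scaled, y_train)

# With standardized features, coefficient magnitude is a rough indication
# of how strongly each feature shifts the predicted log-odds
coefs = pd.Series(model.coef_[0], index=X.columns).sort_values(key=abs, ascending=False)
print(coefs)
print("test accuracy:", model.score(X_test_scaled, y_test))
```

Note that LogisticRegression is a classifier, so this only makes sense if toAnalyse holds class labels; for a continuous target a regression model would be the appropriate choice.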
Please credit the source when reposting. Link to this article: https://www.uj5u.com/net/360940.html
