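I need help with my mini project. I have to build a predictive model using a dataset from Kaggle, and I get an error when I try to replace the missing data in the "value" column. The values seem to be treated as strings because they have dots between the digits. Editing the column by hand is not an option, since it has more than 49,000 rows. How can I solve this?
Here is the code and the error:
x['value'].replace(' ',np.NaN).astype(np.float)
ValueError: could not convert string to float: '154.619.063'
Dataset: the Kaggle dataset of multinational companies by industrial sector. Thank you very much for your help.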
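uj5u.com helpful user replied:
Try this:
# regex=False treats '.' as a literal dot rather than a regex wildcard;
# np.float was removed in NumPy 1.24, so the built-in float is used instead
x['value'].str.replace('.', '', regex=False).replace(' ', np.nan).astype(float)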
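A slightly more defensive variant of the one-liner above is sketched here. It is only an illustration: the helper name parse_dotted_number is made up, and it assumes that every dot in the column is a thousands separator (not a decimal point) and that missing entries are blank or whitespace-only strings.

```python
import numpy as np
import pandas as pd


def parse_dotted_number(series: pd.Series) -> pd.Series:
    """Convert strings such as '154.619.063' to floats; blanks become NaN.

    Hypothetical helper: assumes dots are thousands separators only.
    """
    cleaned = (
        series.str.replace(".", "", regex=False)  # drop literal dots
        .str.strip()                              # ' ' becomes ''
    )
    # errors="coerce" turns anything that is still not numeric into NaN
    return pd.to_numeric(cleaned, errors="coerce").astype(float)


values = pd.Series(["154.619.063", " ", "12", None], dtype=object)
result = parse_dotted_number(values)
print(result)
```

Using pd.to_numeric with errors="coerce" means an unexpected entry ends up as NaN instead of raising ValueError, which may or may not be what you want for a target column.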
uj5u.com helpful user replied:
import numpy as np
import pandas as pd
from pandas import read_csv
from pandas.plotting import scatter_matrix
from matplotlib import pyplot
from sklearn.model_selection import train_test_split
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import StratifiedKFold
from sklearn.metrics import classification_report
from sklearn.metrics import confusion_matrix
from sklearn.metrics import accuracy_score
from sklearn.metrics import r2_score
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
import matplotlib.pyplot as plt
%pylab inline
import seaborn as sns
import pandas_profiling as pp
import plotly.graph_objs as go
from plotly.offline import iplot
import plotly.express as px
import tensorflow as tf
df = pd.read_csv("C:\\Users\\Souf win\\Downloads\\multinationals.csv", delimiter = ';')
def preprocessing(df):
    df = df.copy()
    # Drop identifier and metadata columns that are not used as features
    df = df.drop(['partner country','ind', 'var','declaring country','unit code','part', 'cou', 'year','year.1', 'unit', 'power_code code', 'power_code' , 'reference period code', 'reference period' ], axis=1)
    # Parse the target: strip the literal dots, turn blanks into NaN, cast to float
    # (np.float was removed in NumPy 1.24, so the built-in float is used)
    df['value'] = df['value'].str.replace('.', '', regex=False).replace(' ', np.nan).astype(float)
    # Drop rows with a missing target only after parsing, so blank entries are caught too
    df = df.dropna(subset=['value']).reset_index(drop=True)
    # One-hot encode the categorical columns
    for column in ['economic variable', 'industry']:
        dummies = pd.get_dummies(df[column], prefix=column)
        df = pd.concat([df, dummies], axis=1)
        df = df.drop(column, axis=1)
    # Split df into x and y
    y = df['value']
    x = df.drop('value', axis=1)
    # Train/test split
    x_train, x_test, y_train, y_test = train_test_split(x, y, train_size=0.7, shuffle=True, random_state=1)
    # Scale x, fitting the scaler on the training set only
    scaler = StandardScaler()
    scaler.fit(x_train)
    x_train = pd.DataFrame(scaler.transform(x_train), index=x_train.index, columns=x_train.columns)
    x_test = pd.DataFrame(scaler.transform(x_test), index=x_test.index, columns=x_test.columns)
    return x_train, x_test, y_train, y_test
x_train, x_test, y_train, y_test = preprocessing(df)
x_train
y_train
x_train.shape
inputs = tf.keras.Input(shape=(x_train.shape[1],))  # 86 feature columns in the original run
x = tf.keras.layers.Dense(128, activation='relu')(inputs)
x = tf.keras.layers.Dense(128, activation='relu')(x)
outputs=tf.keras.layers.Dense(1, activation='linear')(x)
model = tf.keras.Model(inputs=inputs, outputs=outputs)
model.compile(
    optimizer='adam',
    loss='mse'
)
history = model.fit(
    x_train,
    y_train,
    validation_split=0.2,
    batch_size=32,
    epochs=100,
    callbacks=[
        tf.keras.callbacks.EarlyStopping(
            monitor='val_loss',
            patience=3,
            restore_best_weights=True
        )
    ]
)
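As an alternative to cleaning the column after loading, pandas can be asked to interpret the dots while reading the file. The sketch below uses a tiny in-memory sample rather than the real multinationals.csv, and it assumes a European number format (dot as thousands separator, comma as decimal mark), which has not been checked against the actual dataset.

```python
import io

import pandas as pd

# Made-up sample in the same semicolon-delimited layout as the real file
raw = "industry;value\nMining;154.619.063\nRetail;2.500\n"

# thousands="." strips the dots during parsing; decimal="," is an assumption
frame = pd.read_csv(io.StringIO(raw), sep=";", thousands=".", decimal=",")
print(frame.dtypes)
print(frame)
```

With this approach the "value" column arrives as a numeric dtype, so the str.replace step is not needed. Blank entries would still have to be handled separately, for example with the na_values argument of read_csv.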
