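I'm trying to import all 9 columns of the popular MPG dataset from UCI from a URL. The problem is that, instead of showing the string values, Carname (the 9th column) is filled with NaN.
What is going wrong, and how can I fix it? The link to the repository shows that the original dataset has 9 columns, so this should work.
From the URL we find that the data looks like: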
18.0 8 307.0 130.0 3504. 12.0 70 1 "chevrolet chevelle malibu"
15.0 8 350.0 165.0 3693. 11.5 70 1 "buick skylark 320"
with unique string values in Carname, but when we import it as
import pandas as pd
# Import raw dataset from URL
url = 'http://archive.ics.uci.edu/ml/machine-learning-databases/auto-mpg/auto-mpg.data'
column_names = ['MPG', 'Cylinders', 'Displacement', 'Horsepower',
                'Weight', 'Acceleration', 'Model Year', 'Origin', 'Carname']
data = pd.read_csv(url, names=column_names,
                   na_values='?', comment='\t',
                   sep=' ', skipinitialspace=True)
data.head(3)
it produces (note the NaN values in Carname):
MPG Cylinders Displacement Horsepower Weight Acceleration Model Year Origin Carname
0 18.0 8 307.0 130.0 3504.0 12.0 70 1 NaN
1 15.0 8 350.0 165.0 3693.0 11.5 70 1 NaN
Answer:
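The problem is in your read_csv call: comment='\t'. The only tab on each line is the one right before the Carname field, so everything from that tab to the end of the line is treated as a comment and discarded. In other words, the way you read the file explicitly ignores that column.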
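To see why the tab matters, here is a small sketch that needs no download. The line literal is an assumption about the file layout (runs of spaces between the numeric fields, a single tab before the quoted car name), and the split on the tab only imitates what the comment option does; it is not the parser itself.

```python
# One raw line, written out by hand. Assumed layout: spaces between the
# numeric fields and a single tab before the quoted car name.
line = '18.0   8   307.0      130.0      3504.      12.0   70  1\t"chevrolet chevelle malibu"'

# repr() makes the otherwise invisible tab show up as \t.
print(repr(line))

# With comment='\t', everything from the tab onward is dropped.
# Splitting on the tab imitates that and shows what is left to parse.
kept = line.split('\t')[0]
fields = kept.split()
print(len(fields), fields)
```

Only eight fields survive, so the ninth name in column_names has nothing to map to and is filled with NaN.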
You can remove the comment argument and use the more general separator \s+ to split on any whitespace (one or more spaces, tabs, etc.):
>>> pd.read_csv(url, names=column_names, na_values='?', sep=r'\s+')
MPG Cylinders Displacement Horsepower Weight Acceleration Model Year Origin Carname
0 18.0 8 307.0 130.0 3504.0 12.0 70 1 chevrolet chevelle malibu
1 15.0 8 350.0 165.0 3693.0 11.5 70 1 buick skylark 320
2 18.0 8 318.0 150.0 3436.0 11.0 70 1 plymouth satellite
3 16.0 8 304.0 150.0 3433.0 12.0 70 1 amc rebel sst
4 17.0 8 302.0 140.0 3449.0 10.5 70 1 ford torino
.. ... ... ... ... ... ... ... ... ...
393 27.0 4 140.0 86.0 2790.0 15.6 82 1 ford mustang gl
394 44.0 4 97.0 52.0 2130.0 24.6 82 2 vw pickup
395 32.0 4 135.0 84.0 2295.0 11.6 82 1 dodge rampage
396 28.0 4 120.0 79.0 2625.0 18.6 82 1 ford ranger
397 31.0 4 119.0 82.0 2720.0 19.4 82 1 chevy s-10
[398 rows x 9 columns]
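If you want to check the difference without hitting the network, the following is a minimal sketch that parses two hand-written lines from an in-memory buffer. The sample string and the names broken and fixed are made up for illustration, and the tab before the car name is assumed to match the real file.

```python
from io import StringIO

import pandas as pd

column_names = ['MPG', 'Cylinders', 'Displacement', 'Horsepower',
                'Weight', 'Acceleration', 'Model Year', 'Origin', 'Carname']

# Two hand-written lines imitating the file: spaces between the numeric
# fields, one tab before the quoted car name (assumed layout).
sample = (
    '18.0   8   307.0      130.0      3504.      12.0   70  1\t"chevrolet chevelle malibu"\n'
    '15.0   8   350.0      165.0      3693.      11.5   70  1\t"buick skylark 320"\n'
)

# Original call: the tab starts a "comment", so the car name is discarded.
broken = pd.read_csv(StringIO(sample), names=column_names,
                     na_values='?', comment='\t',
                     sep=' ', skipinitialspace=True)

# Fixed call: no comment character, split on any run of whitespace.
fixed = pd.read_csv(StringIO(sample), names=column_names,
                    na_values='?', sep=r'\s+')

print(broken['Carname'].tolist())
print(fixed['Carname'].tolist())
```

The quoted car names contain spaces, but the default quotechar of read_csv is the double quote, so each name stays in one field even when splitting on whitespace.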
