I read some data from a file. Because of the xxx in the first data row, the first column (total_bill) was assigned the object dtype:
import pandas as pd
tips = pd.read_csv("tips.csv")
print(tips.head())
tips.info()  # info() prints its report itself and returns None
total_bill tip sex smoker day time size
0 xxx 1.01 Female No Sun Dinner 2
1 10.34 1.66 Male No Sun Dinner 3
2 21.01 3.50 Male No Sun Dinner 3
3 23.68 3.31 Male No Sun Dinner 2
4 24.59 3.61 Female No Sun Dinner 4
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 244 entries, 0 to 243
Data columns (total 7 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 total_bill 244 non-null object
1 tip 244 non-null float64
2 sex 244 non-null object
3 smoker 244 non-null object
4 day 244 non-null object
5 time 244 non-null object
6 size 244 non-null int64
So the following fails, because the xxx in the first data row sits where a number should be:
tips['tip_pct'] = tips['tip'] / (tips['total_bill'] - tips['tip'])
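How can I rewrite the line above so that it filters out the bad row, without actually changing the contents of the DataFrame?
A uj5u.com user replied:
You can apply pd.to_numeric with errors='coerce' to the column that contains 'xxx'. Any string that cannot be parsed as a number becomes NaN in the returned Series, so the calculation can run, and the existing columns of your DataFrame stay unchanged.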
tips['tip_pct'] = tips['tip'] / (pd.to_numeric(tips['total_bill'],errors='coerce') - tips['tip'])
total_bill tip sex smoker day time size tip_pct
0 xxx 1.01 Female No Sun Dinner 2 NaN
1 10.34 1.66 Male No Sun Dinner 3 0.191244
2 21.01 3.50 Male No Sun Dinner 3 0.199886
3 23.68 3.31 Male No Sun Dinner 2 0.162494
4 24.59 3.61 Female No Sun Dinner 4 0.172069
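For anyone who wants to try this without the original file, here is a minimal sketch. It assumes a small in-memory CSV (the csv_text string is a made-up stand-in for tips.csv, containing only the first three rows shown above) and checks that total_bill still holds the original 'xxx' afterwards.

```python
import io
import pandas as pd

# Hypothetical stand-in for tips.csv: only the first three rows from the question.
csv_text = """total_bill,tip,sex,smoker,day,time,size
xxx,1.01,Female,No,Sun,Dinner,2
10.34,1.66,Male,No,Sun,Dinner,3
21.01,3.50,Male,No,Sun,Dinner,3
"""
tips = pd.read_csv(io.StringIO(csv_text))

# to_numeric returns a new Series; tips["total_bill"] itself is not modified.
bill = pd.to_numeric(tips["total_bill"], errors="coerce")
tips["tip_pct"] = tips["tip"] / (bill - tips["tip"])

# The bad row gets NaN in tip_pct, while total_bill keeps its original text.
print(tips[["total_bill", "tip", "tip_pct"]])
```

Because the conversion happens on a temporary Series, the only change to tips is the new tip_pct column.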
A uj5u.com user replied:
Another way: build a boolean mask for the good rows and cast total_bill to float only for the calculation.
m = tips['total_bill'] != 'xxx'
tips['tip_pct'] = tips.loc[m, 'tip'] / (tips.loc[m, 'total_bill'].astype(float) - tips.loc[m, 'tip'])
total_bill tip sex smoker day time size tip_pct
0 xxx 1.01 Female No Sun Dinner 2 NaN
1 10.34 1.66 Male No Sun Dinner 3 0.191244
2 21.01 3.50 Male No Sun Dinner 3 0.199886
3 23.68 3.31 Male No Sun Dinner 2 0.162494
4 24.59 3.61 Female No Sun Dinner 4 0.172069
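The reason the masked-out row ends up as NaN is index alignment: the result of the calculation only has the index labels of the good rows, and assigning it to a column fills the missing label with NaN. A small sketch of that, again assuming a made-up in-memory stand-in for tips.csv (csv_text, good and pct are names chosen here for illustration):

```python
import io
import pandas as pd

# Hypothetical stand-in for tips.csv: only the first three rows from the question.
csv_text = """total_bill,tip,sex,smoker,day,time,size
xxx,1.01,Female,No,Sun,Dinner,2
10.34,1.66,Male,No,Sun,Dinner,3
21.01,3.50,Male,No,Sun,Dinner,3
"""
tips = pd.read_csv(io.StringIO(csv_text))

# Keep only rows whose total_bill is not the known bad placeholder.
m = tips["total_bill"] != "xxx"
good = tips.loc[m]

# Cast to float just for this calculation; tips["total_bill"] is left as it was.
pct = good["tip"] / (good["total_bill"].astype(float) - good["tip"])

# pct has fewer rows than tips; assignment aligns on the index,
# so the row that was masked out receives NaN.
tips["tip_pct"] = pct
print(len(pct), len(tips))
print(tips[["total_bill", "tip", "tip_pct"]])
```

Note that this only works when the bad value is known in advance. If other unparseable strings may appear, astype(float) will raise on them.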
A uj5u.com user replied:
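Handle it in read_csv. Passing dtype alone raises a ValueError on 'xxx', so also declare 'xxx' as a missing value with na_values. Note that total_bill then holds NaN instead of 'xxx' in that row:
import numpy as np
tips = pd.read_csv('tips.csv',
                   na_values=['xxx'],
                   dtype={'total_bill': np.float64})
tips['tip_pct'] = tips['tip'] / (tips['total_bill'] - tips['tip'])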
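To see what reading the column as numeric does to the data, here is a sketch that assumes the placeholder is literally 'xxx' and uses the per-column dict form of na_values so that only total_bill is affected. The csv_text string is a made-up stand-in for tips.csv.

```python
import io
import pandas as pd

# Hypothetical stand-in for tips.csv: only the first three rows from the question.
csv_text = """total_bill,tip,sex,smoker,day,time,size
xxx,1.01,Female,No,Sun,Dinner,2
10.34,1.66,Male,No,Sun,Dinner,3
21.01,3.50,Male,No,Sun,Dinner,3
"""

# Treat 'xxx' as missing, but only in the total_bill column.
tips = pd.read_csv(io.StringIO(csv_text), na_values={"total_bill": ["xxx"]})

# total_bill is now numeric, so the original formula works unchanged.
tips["tip_pct"] = tips["tip"] / (tips["total_bill"] - tips["tip"])

print(tips["total_bill"].dtype)
print(tips[["total_bill", "tip", "tip_pct"]])
```

Unlike the two answers above, this one does change what is stored: the 'xxx' text is gone from the DataFrame and replaced by NaN, so it only fits if keeping the original value does not matter.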
