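I am using Python 3.8 with pandas. I have some data in a CSV file. I load it into a pandas DataFrame and try to remove part of the Client_Name strings. Some Client_Name values look like Client_Name = myserv(4234), so I need to strip the (4234) to get Client_Name = myserv. In short, I need to remove the parenthesized part, (4234), from Client_Name.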
df.Client_Name[df.Client_Name.str.contains('\(')] = df.Client_Name.str.split("(")[0]
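I wrote the code above, and it removes the parentheses from Client_Name. My problem is that it raises an error if (4234) is in the first row of the DataFrame. If (4234) is in any other row, there is no problem.
The working data is: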
,time,Client_Name,Minutes
0,2018-10-14T21:01:00Z,myserv1,5
1,2018-10-14T21:01:00Z,myserv2,5
2,2018-10-14T21:01:00Z,myserv3(4234),6
3,2018-10-14T21:01:00Z,myserv4,6
4,2018-10-14T21:02:07Z,myserv5(4234),3
5,2018-10-14T21:02:29Z,myserv6(4234),3
When I run my code, it removes (4234) and the data ends up in the following form:
,time,Client_Name,Minutes
0,2018-10-14T21:01:00Z,myserv1,5
1,2018-10-14T21:01:00Z,myserv2,5
2,2018-10-14T21:01:00Z,myserv3,6
3,2018-10-14T21:01:00Z,myserv4,6
4,2018-10-14T21:02:07Z,myserv5,3
5,2018-10-14T21:02:29Z,myserv6,3
But if (4234) is in the first row, as shown below, my code throws an error:
,time,Client_Name,Minutes
0,2018-10-14T21:01:00Z,myserv1(4234),5
1,2018-10-14T21:01:00Z,myserv2,5
2,2018-10-14T21:01:00Z,myserv3,6
3,2018-10-14T21:01:00Z,myserv4,6
4,2018-10-14T21:02:07Z,myserv5,3
5,2018-10-14T21:02:29Z,myserv6,3
The error is:
test1.py:97: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
df.Client_Name[df.Client_Name.str.contains('\(')] = df.Client_Name.str.split("(")[0]
Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/pandas/core/series.py", line 972, in __setitem__
self._set_with_engine(key, value)
File "/usr/local/lib/python3.8/dist-packages/pandas/core/series.py", line 1005, in _set_with_engine
loc = self.index._engine.get_loc(key)
File "pandas/_libs/index.pyx", line 70, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/index.pyx", line 75, in pandas._libs.index.IndexEngine.get_loc
TypeError: '0 True
1 False
2 False
3 False
4 False
...
116 False
117 False
118 False
119 False
120 False
Name: Client_Name, Length: 121, dtype: bool' is an invalid key
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "test1.py", line 97, in <module>
df.Client_Name[df.Client_Name.str.contains('\(')] = df.Client_Name.str.split("(")[0]
File "/usr/local/lib/python3.8/dist-packages/pandas/core/series.py", line 992, in __setitem__
self._where(~key, value, inplace=True)
File "/usr/local/lib/python3.8/dist-packages/pandas/core/generic.py", line 9129, in _where
new_data = self._mgr.putmask(
File "/usr/local/lib/python3.8/dist-packages/pandas/core/internals/managers.py", line 579, in putmask
return self.apply(
File "/usr/local/lib/python3.8/dist-packages/pandas/core/internals/managers.py", line 427, in apply
applied = getattr(b, f)(**kwargs)
File "/usr/local/lib/python3.8/dist-packages/pandas/core/internals/blocks.py", line 1144, in putmask
raise ValueError("cannot assign mismatch length to masked array")
ValueError: cannot assign mismatch length to masked array
Answer:
Your slicing approach produces a copy, and you are assigning to that copy; that is what the warning is telling you.
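As for why the failure depends on the first row, a likely explanation is that `df.Client_Name.str.split("(")[0]` does not take the first piece of every string. It looks up the row labeled 0 in the Series of lists and returns that single list. The sketch below illustrates this on a small hypothetical Series (the name `s` and its three values are made up for the illustration). When row 0 contains `(4234)`, the list has two elements, which appears consistent with the "mismatch length" ValueError in the traceback. When row 0 has no parenthesis, the list has one element and no length error is raised, though it is worth checking that the assigned values are the ones intended.

```python
import pandas as pd

# Hypothetical three-row sample; only row 0 carries a parenthesized suffix.
s = pd.Series(["myserv1(4234)", "myserv2", "myserv3"], name="Client_Name")

parts = s.str.split("(")       # Series whose elements are lists of pieces

row_zero = parts[0]            # label lookup: the list belonging to row 0 only
first_pieces = parts.str[0]    # element-wise: first piece of every row

print(row_zero)
print(first_pieces.tolist())
```

The element-wise accessor `.str[0]` is the form that yields one cleaned value per row.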
You can use this instead:
df['Client_Name'] = df['Client_Name'].str.replace(r'\(.*?\)', '', regex=True)
Output:
                   time Client_Name  Minutes
0  2018-10-14T21:01:00Z     myserv1        5
1  2018-10-14T21:01:00Z     myserv2        5
2  2018-10-14T21:01:00Z     myserv3        6
3  2018-10-14T21:01:00Z     myserv4        6
4  2018-10-14T21:02:07Z     myserv5        3
5  2018-10-14T21:02:29Z     myserv6        3
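For anyone who wants to reproduce this, here is a minimal self-contained sketch. It assumes the CSV from the question is read with its first column as the index and uses a shortened three-row copy of the failing data; the variable names `csv_text`, `cleaned_regex`, and `cleaned_split` are illustrative only. It shows the regex replacement next to a split-based alternative. Both assign a whole column at once, so no chained indexing is involved.

```python
import io

import pandas as pd

# Shortened copy of the failing sample: the suffix sits in the first row.
csv_text = """,time,Client_Name,Minutes
0,2018-10-14T21:01:00Z,myserv1(4234),5
1,2018-10-14T21:01:00Z,myserv2,5
2,2018-10-14T21:01:00Z,myserv3,6
"""
df = pd.read_csv(io.StringIO(csv_text), index_col=0)

# Option 1: delete every "(...)" group; the raw string keeps the
# backslashes intact for the regex engine.
cleaned_regex = df["Client_Name"].str.replace(r"\(.*?\)", "", regex=True)

# Option 2: keep only the text before the first "(" in each row.
cleaned_split = df["Client_Name"].str.split("(").str[0]

# Assign the full column back in one step.
df["Client_Name"] = cleaned_regex
print(df)
```

The two options differ when text follows the closing parenthesis: the regex keeps that text, while the split drops everything from the first "(" onward.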
