I have a dataset. This is the "Name" column:
0 Braund, Mr. Owen Harris
1 Cumings, Mrs. John Bradley (Florence Briggs Th...
2 Heikkinen, Miss. Laina
3 Futrelle, Mrs. Jacques Heath (Lily May Peel)
4 Allen, Mr. William Henry
...
151 Pears, Mrs. Thomas (Edith Wearne)
152 Meo, Mr. Alfonzo
153 van Billiard, Mr. Austin Blyler
154 Olsen, Mr. Ole Martin
155 Williams, Mr. Charles Duane
I need to extract the first name, status, and second name from it. When I try this on a simple string, it works:
full_name="Braund, Mr. Owen Harris"
first_name=full_name.split(',')[0]
second_name=full_name.split('.')[1]
print('First name:',first_name)
print('Second name:',second_name)
status = full_name.replace(first_name, '').replace(',','').split('.')[0]
print('Status:',status)
>First name: Braund
>Second name: Owen Harris
>Status: Mr
But when I try to do the same thing with pandas, it fails on the status:
df['first_Name'] = df['Name'].str.split(',').str.get(0)  # this is OK, works well
But after this:
status= df['Name'].str.replace(df['first_Name'], '').replace(',','').split('.').str.get(0)
I get an error:
>>TypeError: 'Series' objects are mutable, thus they cannot be hashed
What are the possible solutions?
Edit: thanks for the answers and for the extracted columns. I tried:
def extract_name_data(row):
    row.str.extract('(?P<first_name>[^,]+), (?P<status>\w+.) (?P<second_name>[^(]+\w) ?')
    last_name = row['second_name']
    title = row['status']
    first_name = row['first_name']
    return first_name, second_name, status
and got
AttributeError: 'str' object has no attribute 'str'
What can be done? Here, row means df['Name'].
Answer:
You can use str.extract with named capture groups:
df['Name'].str.extract(r'(?P<first_name>[^,]+), (?P<status>\w+\.) (?P<second_name>[^(]+\w) ?')
Output:
     first_name  status    second_name
0        Braund     Mr.    Owen Harris
1       Cumings    Mrs.   John Bradley
2     Heikkinen   Miss.          Laina
3      Futrelle    Mrs.  Jacques Heath
4         Allen     Mr.  William Henry
5         Pears    Mrs.         Thomas
6           Meo     Mr.        Alfonzo
7  van Billiard     Mr.  Austin Blyler
8         Olsen     Mr.     Ole Martin
9      Williams     Mr.  Charles Duane
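For anyone who wants to run the extraction end to end, here is a minimal, self-contained sketch. The four-row sample frame and the variable names `pattern` and `parts` are assumptions made for illustration. The pattern writes the `+` quantifiers explicitly and escapes the period after the title. Note that `.str.extract` is called on the whole column. Inside a function that receives one name at a time, the value is a plain `str`, which has no `.str` accessor, and that is what raises the AttributeError shown in the question's edit.

```python
import pandas as pd

# Small sample taken from the names shown in the question (illustrative only).
df = pd.DataFrame({"Name": [
    "Braund, Mr. Owen Harris",
    "Futrelle, Mrs. Jacques Heath (Lily May Peel)",
    "Heikkinen, Miss. Laina",
    "van Billiard, Mr. Austin Blyler",
]})

# Named groups become the column names of the extracted DataFrame.
pattern = r"(?P<first_name>[^,]+), (?P<status>\w+\.) (?P<second_name>[^(]+\w) ?"

# Call .str.extract on the column itself, not on a single string.
parts = df["Name"].str.extract(pattern)

# Attach the three new columns to the original frame by index.
df = df.join(parts)
print(df)
```

Joining on the index keeps the original `Name` column next to the extracted parts, so mismatches are easy to spot by eye.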
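Answer:
You can also make your original code work, with only minor changes, by putting it inside pandas .apply(), as follows:
Just replace the Python variable names with the pandas column names. For example, in the lambda function passed to .apply(), replace full_name with x['Name'] and first_name with x['first_Name']: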
df['status'] = df.apply(lambda x: x['Name'].replace(x['first_Name'], '').replace(',','').split('.')[0], axis=1)
This may not be the most efficient approach, but it is an easy way to adapt existing Python code into a version that works in pandas.
Result:
print(df)
Name first_Name status
0 Braund, Mr. Owen Harris Braund Mr
1 Cumings, Mrs. John Bradley (Florence Briggs Th... Cumings Mrs
2 Heikkinen, Miss. Laina Heikkinen Miss
3 Futrelle, Mrs. Jacques Heath (Lily May Peel) Futrelle Mrs
4 Allen, Mr. William Henry Allen Mr
151 Pears, Mrs. Thomas (Edith Wearne) Pears Mrs
152 Meo, Mr. Alfonzo Meo Mr
153 van Billiard, Mr. Austin Blyler van Billiard Mr
154 Olsen, Mr. Ole Martin Olsen Mr
155 Williams, Mr. Charles Duane Williams Mr
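If you would rather avoid the row-wise lambda, the same two columns can probably be built with the `.str` accessor alone. This is only a sketch under assumptions: the three-row sample frame and the intermediate name `parts` are made up for illustration, and it relies on every name containing a comma followed by a title that ends in a period. It does not pass a Series to `str.replace`, which is the call that raised the TypeError in the question.

```python
import pandas as pd

# Small sample taken from the names shown in the question (illustrative only).
df = pd.DataFrame({"Name": [
    "Braund, Mr. Owen Harris",
    "Futrelle, Mrs. Jacques Heath (Lily May Peel)",
    "van Billiard, Mr. Austin Blyler",
]})

# Split once on the first comma: surname on the left, the rest on the right.
parts = df["Name"].str.split(",", n=1, expand=True)
df["first_Name"] = parts[0].str.strip()

# The title is whatever precedes the first period in the remainder.
df["status"] = parts[1].str.split(".", n=1).str.get(0).str.strip()
print(df)
```

Unlike the lambda version, this one strips the space left in front of the title after the comma is removed.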
