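I have a column of strings containing full names. Last names are written entirely in capital letters, while first names are in proper case. Most names are ordered as (Firstname, LASTNAME), but many have the LASTNAME part in the middle or at the start of the string, as in the last entries here.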
0 Manuel JOSE
1 Vincent MUANDUMBA
2 Alejandro DE LORRES
3 Luis FILIPE da Rivera
4 LIM Jock Hoi
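I want to split this column into separate first-name and last-name columns, depending on whether the text in the string is proper case (first name) or all caps (last name).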
new = df["FullName"].str.split(pat=r'(?=[A-Z][a-z])', n=1, expand = True)
df['FirstName'] = new[0]
df['LastName'] = new[1]
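All proper-case or lowercase words should end up together in new[0], and all uppercase words together in new[1].
However, my regular expression is wrong, so I cannot get the desired output. I have also tried pat=r'[A-Z](?:[A-Z]*(?![a-z])|[a-z]*)'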
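To see why the first attempt cannot work, it may help to run the same pattern through Python's re module directly. This is only a sketch: it assumes pandas hands regex splitting to re, and split_once is a hypothetical helper name, not something from the question. The lookahead is zero-width, so it splits before the first capitalized word and never separates the all-caps words from the rest.

```python
import re

# Pattern from the question: a zero-width lookahead for an uppercase
# letter followed by a lowercase letter.
pattern = r'(?=[A-Z][a-z])'

def split_once(full_name):
    # maxsplit=1 mirrors n=1 in the pandas call (hypothetical helper).
    # Splitting on empty matches requires Python 3.7 or later.
    return re.split(pattern, full_name, maxsplit=1)

for name in ['Manuel JOSE', 'LIM Jock Hoi', 'Luis FILIPE da Rivera']:
    print(repr(name), '->', split_once(name))
# 'Manuel JOSE' -> ['', 'Manuel JOSE']
# 'LIM Jock Hoi' -> ['LIM ', 'Jock Hoi']
# 'Luis FILIPE da Rivera' -> ['', 'Luis FILIPE da Rivera']
```

When the name starts with a capitalized word, the only split point used is position 0, so the first piece is empty and the second piece is the whole string.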
Answer:
You can use regular expressions with str.findall:
df['LastName'] = df['FullName'].str.findall(r'\b[A-Z]+(?:\s+[A-Z]+)*\b').str.join(' ')
df['FirstName'] = df['FullName'].str.findall(r"[A-Z]{0,1}[a-z]+").str.join(' ')
Output:
   names                  last_names  first_names
0  Manuel JOSE            JOSE        Manuel
1  Vincent MUANDUMBA      MUANDUMBA   Vincent
2  Alejandro DE LORRES    DE LORRES   Alejandro
3  Luis FILIPE da Rivera  FILIPE      Luis da Rivera
4  LIM Jock Hoi           LIM         Jock Hoi
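The two patterns can also be checked outside pandas. The sketch below uses plain re.findall on the sample names; split_by_regex and the two constant names are hypothetical, introduced only for illustration. The last-name pattern matches runs of all-caps words bounded by \b, and the first-name pattern matches an optional capital followed by lowercase letters.

```python
import re

# Runs of all-caps words separated by whitespace, e.g. "DE LORRES".
LAST_NAME_RE = re.compile(r'\b[A-Z]+(?:\s+[A-Z]+)*\b')
# An optional capital followed by lowercase letters, e.g. "Luis" or "da".
FIRST_NAME_RE = re.compile(r'[A-Z]?[a-z]+')

def split_by_regex(full_name):
    """Return (first_name, last_name) for one string (hypothetical helper)."""
    first = ' '.join(FIRST_NAME_RE.findall(full_name))
    last = ' '.join(LAST_NAME_RE.findall(full_name))
    return first, last

print(split_by_regex('Alejandro DE LORRES'))    # ('Alejandro', 'DE LORRES')
print(split_by_regex('Luis FILIPE da Rivera'))  # ('Luis da Rivera', 'FILIPE')
print(split_by_regex('LIM Jock Hoi'))           # ('Jock Hoi', 'LIM')
```

Both character classes are ASCII-only, so a name containing accented letters would probably be cut at the accented character. That is worth checking against real data before relying on this approach.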
Answer:
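This code is a bit longer than the str pattern approach, but you can be sure that every part of the name string is sent to either FirstName or LastName as required. The trick is to use the .istitle() string method.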
# Split every string in the FullName column into a list of words
new = df["FullName"].str.split(' ')
# Create empty lists to hold the new columns for df
FirstName = []
LastName = []
# Iterate over every split string (sample)
for name in new:
    Proppercase = []  # Words that meet the FirstName condition
    Allcaps = []      # Words that belong to LastName (all caps)
    # Iterate over every word in the sample
    for n in name:
        # Check whether the word is proper case or lowercase ('da')
        if n.istitle() or n.islower():
            Proppercase.append(n)
        # If not, treat it as all caps
        else:
            Allcaps.append(n)
    # Add the proper-case words to the FirstName list
    FirstName.append(' '.join(Proppercase))
    # Add the all-caps words to the LastName list
    LastName.append(' '.join(Allcaps))
# Create the columns
df['FirstName'] = FirstName
df['LastName'] = LastName
Output:
   FullName               FirstName       LastName
0  Manuel JOSE            Manuel          JOSE
1  Vincent MUANDUMBA      Vincent         MUANDUMBA
2  Alejandro DE LORRES    Alejandro       DE LORRES
3  Luis FILIPE da Rivera  Luis da Rivera  FILIPE
4  LIM Jock Hoi           Jock Hoi        LIM
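The same word-by-word rule can be packaged as a small function, which makes it easier to test on single strings. This is a minimal sketch rather than the answer's own code: split_by_case is a hypothetical name, and it uses str.split() with no argument, so repeated spaces are collapsed (the loop above splits on a single ' ').

```python
def split_by_case(full_name):
    """Return (first_name, last_name) using the istitle/islower rule.

    Hypothetical helper: words that are proper case or lowercase go to
    the first name, everything else is treated as part of the last name.
    """
    first_parts, last_parts = [], []
    for word in full_name.split():
        if word.istitle() or word.islower():
            first_parts.append(word)
        else:
            last_parts.append(word)
    return ' '.join(first_parts), ' '.join(last_parts)

print(split_by_case('Luis FILIPE da Rivera'))  # ('Luis da Rivera', 'FILIPE')
print(split_by_case('LIM Jock Hoi'))           # ('Jock Hoi', 'LIM')
# Mixed-case words are neither title case nor lowercase, so they fall
# through to the last-name branch:
print(split_by_case('Ronald McDonald'))        # ('Ronald', 'McDonald')
```

With pandas, something like df['FullName'].map(split_by_case) should give a Series of tuples that can then be unpacked into two columns. Note that the else branch catches anything that is not title case or lowercase, not just all-caps words.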
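If you are sure that the first word of each name is the complete Firstname or the complete Lastname (true in most cultures, but not universal), this is faster: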
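# Split each name once, at the first space (n is keyword-only in pandas 2.0+)
new = df["FullName"].str.split(' ', n=1)
FirstName = []
LastName = []
for name in new:
    # Proper-case first word: it is the first name, the rest is the last name
    if name[0].istitle():
        FirstName.append(name[0])
        LastName.append(name[1])
    # Otherwise the first word is the last name
    else:
        FirstName.append(name[1])
        LastName.append(name[0])
df['FirstName'] = FirstName
df['LastName'] = LastName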
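One thing to watch with the faster version is that a name with no space has no name[1], so the loop would raise an IndexError. A possible guard, sketched with the hypothetical helper split_on_first_word, is to use str.partition, which always returns three parts:

```python
def split_on_first_word(full_name):
    """Return (first_name, last_name), deciding by the first word only.

    Hypothetical helper: str.partition yields an empty remainder when
    there is no space, so single-word names do not raise.
    """
    head, _, rest = full_name.strip().partition(' ')
    if head.istitle():
        return head, rest
    return rest, head

print(split_on_first_word('Manuel JOSE'))   # ('Manuel', 'JOSE')
print(split_on_first_word('LIM Jock Hoi'))  # ('Jock Hoi', 'LIM')
print(split_on_first_word('Madonna'))       # ('Madonna', '')
# Lowercase particles after the surname stay with the last name here:
print(split_on_first_word('Luis FILIPE da Rivera'))  # ('Luis', 'FILIPE da Rivera')
```

The last line shows where this shortcut differs from the word-by-word approach: 'da Rivera' is kept with the last name, because only the first word is inspected.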
