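I have a DataFrame with a column of names, and I want to extract the last names into a new column. However, I have run into a problem.
Here is a toy example of my DataFrame: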
Candidate_Name Party State District Office Year Img_URL
961 Heather Mizeur D Maryland 1 House 2022 https://images.ctfassets.net/00vgtve3ank7/3v1O...
962 Heidi Campbell D Tennessee 5 House 2022 https://images.ctfassets.net/00vgtve3ank7/BbSQ...
963 Helen Brady R Massachusetts 9 House 2020 https://images.ctfassets.net/00vgtve3ank7/6WmS...
964 Henry Cuellar D Texas 28 House 2022 https://images.ctfassets.net/00vgtve3ank7/4GGP...
965 Henry Cuellar D Texas 28 House 2020 https://images.ctfassets.net/00vgtve3ank7/3xNd...
966 Henry Cuellar D Texas 28 House 2018 https://images.ctfassets.net/00vgtve3ank7/uCK7...
967 Henry Martin D Missouri 6 House 2022 https://images.ctfassets.net/00vgtve3ank7/5rfd...
968 Henry Robert Martin D Missouri 6 House 2018 https://images.ctfassets.net/00vgtve3ank7/MvL8...
969 Herb Jones D Virginia 1 House 2022 https://images.ctfassets.net/00vgtve3ank7/47Uy...
970 Herman West Jr. R Georgia 2 House 2018 https://images.ctfassets.net/00vgtve3ank7/534y...
971 Hilary Turner D West Virginia 3 House 2020 https://images.ctfassets.net/00vgtve3ank7/3ZIN...
972 Hillary O'Connor Mueri D Ohio 14 House 2020 https://images.ctfassets.net/00vgtve3ank7/5i5w...
973 Hillary Scholten D Michigan 3 House 2022 https://images.ctfassets.net/00vgtve3ank7/47KO...
974 Hillary Scholten D Michigan 3 House 2020 https://images.ctfassets.net/00vgtve3ank7/3g47...
975 Hiral Tipirneni D Arizona 8 House 2018 https://images.ctfassets.net/00vgtve3ank7/3e9V...
976 Hiral Tipirneni D Arizona 6 House 2020 https://images.ctfassets.net/00vgtve3ank7/1APF...
977 Holden Hoggatt R Louisiana 3 House 2022 https://images.ctfassets.net/00vgtve3ank7/4tQP...
978 Homer Markel D Illinois 12 House 2022 https://images.ctfassets.net/00vgtve3ank7/3XXY...
979 Hosea Cleveland D South Carolina 3 House 2020 https://images.ctfassets.net/00vgtve3ank7/FZKi...
980 Hung Cao R Virginia 10 House 2022 https://images.ctfassets.net/00vgtve3ank7/4Aql...
981 Ian Todd D Minnesota 6 House 2018 https://images.ctfassets.net/00vgtve3ank7/3WkL...
982 Ike McCorkle D Colorado 4 House 2022 https://images.ctfassets.net/00vgtve3ank7/d7UB...
983 Ilhan Omar D Minnesota 5 House 2022 https://images.ctfassets.net/00vgtve3ank7/3TS6...
984 Ilhan Omar D Minnesota 5 House 2020 https://images.ctfassets.net/00vgtve3ank7/4EDC...
985 Ilhan Omar D Minnesota 5 House 2018 https://images.ctfassets.net/00vgtve3ank7/4n9U...
986 Irene Armendariz-Jackson R Texas 16 House 2022 https://images.ctfassets.net/00vgtve3ank7/2nbG...
987 Irene Armendariz-Jackson R Texas 16 House 2020 https://images.ctfassets.net/00vgtve3ank7/6wKS...
988 Iro Omere D Texas 4 House 2022 https://images.ctfassets.net/00vgtve3ank7/gewL...
989 Isaac McCorkle D Colorado 4 House 2020 https://images.ctfassets.net/00vgtve3ank7/4sUc...
990 J. Michael Galbraith D Ohio 5 House 2018 https://images.ctfassets.net/00vgtve3ank7/27nQ...
I wrote a function, which I call names:
def names(df):
    """
    Description: Function to extract last names of candidates
    Parameters: df: pandas DataFrame object
    Depends on pandas and re
    """
    if df["Candidate_Name"].eq('Jr.').all():
        lastName = df["Candidate_Name"].str.split(' ').str.get(-4)
    else:
        lastName = df["Candidate_Name"].str.split(' ').str.get(-2)
    return lastName
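The goal of this function is to get the last name from the Candidate_Name column. However, some people have a "Jr." suffix, which adds a bit of complexity, so I tried to write an if-else statement to handle it. But something went wrong.
When I run the following command: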
df["Last_Name"] = names(df)
I get this:
Candidate_Name Party State District Office Year Img_URL Last_Name
961 Heather Mizeur D Maryland 1 House 2022 https://images.ctfassets.net/00vgtve3ank7/3v1O... Mizeur
962 Heidi Campbell D Tennessee 5 House 2022 https://images.ctfassets.net/00vgtve3ank7/BbSQ... Campbell
963 Helen Brady R Massachusetts 9 House 2020 https://images.ctfassets.net/00vgtve3ank7/6WmS... Brady
964 Henry Cuellar D Texas 28 House 2022 https://images.ctfassets.net/00vgtve3ank7/4GGP... Cuellar
965 Henry Cuellar D Texas 28 House 2020 https://images.ctfassets.net/00vgtve3ank7/3xNd... Cuellar
966 Henry Cuellar D Texas 28 House 2018 https://images.ctfassets.net/00vgtve3ank7/uCK7... Cuellar
967 Henry Martin D Missouri 6 House 2022 https://images.ctfassets.net/00vgtve3ank7/5rfd... Martin
968 Henry Robert Martin D Missouri 6 House 2018 https://images.ctfassets.net/00vgtve3ank7/MvL8... Martin
969 Herb Jones D Virginia 1 House 2022 https://images.ctfassets.net/00vgtve3ank7/47Uy... Jones
970 Herman West Jr. R Georgia 2 House 2018 https://images.ctfassets.net/00vgtve3ank7/534y... Jr.
971 Hilary Turner D West Virginia 3 House 2020 https://images.ctfassets.net/00vgtve3ank7/3ZIN... Turner
972 Hillary O'Connor Mueri D Ohio 14 House 2020 https://images.ctfassets.net/00vgtve3ank7/5i5w... Mueri
973 Hillary Scholten D Michigan 3 House 2022 https://images.ctfassets.net/00vgtve3ank7/47KO... Scholten
974 Hillary Scholten D Michigan 3 House 2020 https://images.ctfassets.net/00vgtve3ank7/3g47... Scholten
975 Hiral Tipirneni D Arizona 8 House 2018 https://images.ctfassets.net/00vgtve3ank7/3e9V... Tipirneni
976 Hiral Tipirneni D Arizona 6 House 2020 https://images.ctfassets.net/00vgtve3ank7/1APF... Tipirneni
977 Holden Hoggatt R Louisiana 3 House 2022 https://images.ctfassets.net/00vgtve3ank7/4tQP... Hoggatt
978 Homer Markel D Illinois 12 House 2022 https://images.ctfassets.net/00vgtve3ank7/3XXY... Markel
979 Hosea Cleveland D South Carolina 3 House 2020 https://images.ctfassets.net/00vgtve3ank7/FZKi... Cleveland
980 Hung Cao R Virginia 10 House 2022 https://images.ctfassets.net/00vgtve3ank7/4Aql... Cao
981 Ian Todd D Minnesota 6 House 2018 https://images.ctfassets.net/00vgtve3ank7/3WkL... Todd
982 Ike McCorkle D Colorado 4 House 2022 https://images.ctfassets.net/00vgtve3ank7/d7UB... McCorkle
983 Ilhan Omar D Minnesota 5 House 2022 https://images.ctfassets.net/00vgtve3ank7/3TS6... Omar
984 Ilhan Omar D Minnesota 5 House 2020 https://images.ctfassets.net/00vgtve3ank7/4EDC... Omar
985 Ilhan Omar D Minnesota 5 House 2018 https://images.ctfassets.net/00vgtve3ank7/4n9U... Omar
986 Irene Armendariz-Jackson R Texas 16 House 2022 https://images.ctfassets.net/00vgtve3ank7/2nbG... Armendariz-Jackson
987 Irene Armendariz-Jackson R Texas 16 House 2020 https://images.ctfassets.net/00vgtve3ank7/6wKS... Armendariz-Jackson
988 Iro Omere D Texas 4 House 2022 https://images.ctfassets.net/00vgtve3ank7/gewL... Omere
989 Isaac McCorkle D Colorado 4 House 2020 https://images.ctfassets.net/00vgtve3ank7/4sUc... McCorkle
990 J. Michael Galbraith D Ohio 5 House 2018 https://images.ctfassets.net/00vgtve3ank7/27nQ... Galbraith
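So it is clearly not skipping the entries that contain Jr. (see row 970, for example).
What am I doing wrong here? I have tried different values for str.get(), but it keeps giving me this result. I also don't want to work from the other direction, because some people have middle names or use initials (see row 990, for example). So what isn't working here? Why doesn't the if-else statement catch it?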
A uj5u.com user replied:
Maybe you can use a regular expression to extract the last name (without Jr., Sr., etc.):
df["Last_Name"] = df["Candidate_Name"].str.extract(r"([^\s]+)\s*(?=Jr\.|Sr\.|$)")
print(df[["Candidate_Name", "Last_Name"]])
Prints:
961 Heather Mizeur Mizeur
962 Heidi Campbell Campbell
963 Helen Brady Brady
964 Henry Cuellar Cuellar
965 Henry Cuellar Cuellar
966 Henry Cuellar Cuellar
967 Henry Martin Martin
968 Henry Robert Martin Martin
969 Herb Jones Jones
970 Herman West Jr. West
971 Hilary Turner Turner
972 Hillary O'Connor Mueri Mueri
973 Hillary Scholten Scholten
974 Hillary Scholten Scholten
975 Hiral Tipirneni Tipirneni
976 Hiral Tipirneni Tipirneni
977 Holden Hoggatt Hoggatt
978 Homer Markel Markel
979 Hosea Cleveland Cleveland
980 Hung Cao Cao
981 Ian Todd Todd
982 Ike McCorkle McCorkle
983 Ilhan Omar Omar
984 Ilhan Omar Omar
985 Ilhan Omar Omar
986 Irene Armendariz-Jackson Armendariz-Jackson
987 Irene Armendariz-Jackson Armendariz-Jackson
988 Iro Omere Omere
989 Isaac McCorkle McCorkle
990 J. Michael Galbraith Galbraith
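For readers who want to try the regex approach without the full dataset, here is a minimal sketch. The five-row frame is a hypothetical sample built from names in the question, and `expand=False` is an addition (not part of the answer above) so that `str.extract` returns a Series rather than a one-column DataFrame.

```python
import pandas as pd

# Hypothetical mini-frame; the names are copied from the question's sample rows.
df = pd.DataFrame(
    {
        "Candidate_Name": [
            "Heather Mizeur",
            "Henry Robert Martin",
            "Herman West Jr.",
            "Hillary O'Connor Mueri",
            "J. Michael Galbraith",
        ]
    }
)

# Capture the run of non-whitespace characters that sits directly before
# a "Jr."/"Sr." suffix or before the end of the string.
pattern = r"(\S+)\s*(?=Jr\.|Sr\.|$)"

# expand=False returns a Series because the pattern has a single capture group.
df["Last_Name"] = df["Candidate_Name"].str.extract(pattern, expand=False)

print(df)
```

The lookahead only knows about the two suffixes listed in the pattern; other suffixes (for example "III") would need to be added to the alternation.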
A uj5u.com user replied:
You are using:
if df["Candidate_Name"].eq('Jr.').all()
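This amounts to testing whether df['Candidate_Name'] == 'Jr.', that is, whether the entire name equals 'Jr.', which is never the case. In addition, .all() collapses the comparison into a single True/False for the whole column, so every row goes down the same branch. You should vectorize the check with a per-row substring test such as str.contains(). Consider using this: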
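import numpy as np
# regex=False so the "." in 'Jr.' is matched literally rather than as "any character"
df["Last_Name"] = np.where(df['Candidate_Name'].str.contains('Jr.', case=False, regex=False),
                           df["Candidate_Name"].str.split().str[-2],
                           df['Candidate_Name'].str.split().str[-1])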
This is more efficient and returns the expected output for your data.
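To make the difference between the whole-column test and a per-row test concrete, here is a small sketch. The three-name Series is a hypothetical sample taken from the question, and `str.endswith` plus `Series.mask` are swapped in here as one possible alternative to the `np.where` call; they are not what either answer used.

```python
import pandas as pd

# Hypothetical sample; names are copied from the question.
names = pd.Series(["Heather Mizeur", "Herman West Jr.", "J. Michael Galbraith"])

# .eq('Jr.') asks whether the WHOLE string equals 'Jr.', so every row is False.
whole_match = names.eq("Jr.")

# .all() collapses the column to one boolean (False here), so an
# `if ... else` built on it sends every row down the else branch.
single_flag = whole_match.all()

# A per-row mask instead: True only for names that end with the suffix.
has_suffix = names.str.endswith("Jr.")

tokens = names.str.split()
# Last token by default; second-to-last token where the mask is True.
last_name = tokens.str[-1].mask(has_suffix, tokens.str[-2])

print(whole_match.tolist(), single_flag)
print(has_suffix.tolist())
print(last_name.tolist())
```

Because the mask is evaluated row by row, only the "Jr." row is treated differently, which is the behavior the original if-else was meant to produce.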
