How can I extract the country name from a dataframe column by comparing it against a list of strings containing country names? For example:
list = ["pakistan","united kingdom","uk","usa","united states","uae"]
# create a dataframe with one column, job_location, holding each employee's job location
df = pd.DataFrame({
'job_location' : ['birmingham, england, united kingdom','new jersey, united states','gilgit-baltistan, pakistan','uae','united states','pakistan','31-c2, gulberg 3, lahore, pakistan'],
})
df
job_location
0 birmingham, england, united kingdom
1 new jersey, united states
2 gilgit-baltistan, pakistan
3 uae
4 united states
5 pakistan
6 31-c2, gulberg 3, lahore, pakistan
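I need a new column in the dataframe, named country, that contains the country name taken from the job_location column.
A uj5u.com user replied:
Using clist as the name of the list, you can build a regular expression and use str.extract: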
reg = '(%s)' % '|'.join(clist)
df['country'] = df['job_location'].str.extract(reg)
Output:
job_location country
0 birmingham, england, united kingdom united kingdom
1 new jersey, united states united states
2 gilgit-baltistan, pakistan pakistan
3 uae uae
4 united states united states
5 pakistan pakistan
6 31-c2, gulberg 3, lahore, pakistan pakistan
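The pattern above matches any substring, so a short name such as "uk" would also match inside "ukraine". The following is a minimal sketch of one way to tighten it, assuming that whole-word, case-insensitive matching is what you want. The extra rows ("kyiv, ukraine" and "Lahore, Pakistan") are hypothetical and are not part of the original data.

```python
import re
import pandas as pd

clist = ["pakistan", "united kingdom", "uk", "usa", "united states", "uae"]

df = pd.DataFrame({"job_location": [
    "birmingham, england, united kingdom",
    "new jersey, united states",
    "kyiv, ukraine",       # hypothetical row: contains "uk" only as a substring
    "Lahore, Pakistan",    # hypothetical row: mixed case
]})

# Escape each name so regex metacharacters are treated literally, and try
# longer names first so a short name never shadows a longer one.
alternatives = "|".join(re.escape(c) for c in sorted(clist, key=len, reverse=True))

# \b anchors the match to word boundaries, so "uk" no longer matches "ukraine".
pattern = rf"\b({alternatives})\b"

df["country"] = (
    df["job_location"]
    .str.extract(pattern, flags=re.IGNORECASE, expand=False)  # Series, NaN when no match
    .str.lower()
)
print(df)
```

Rows with no recognised country end up as NaN, which makes them easy to find afterwards with `df["country"].isna()`.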
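That said, if job_location is always well formatted with the country as the last element, it may be easier to split on commas and keep the last field.
A uj5u.com user replied:
Without assuming that the country always comes last, here is something that should work: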
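The comma-splitting idea can be sketched as follows, assuming every job_location value really does end with the country. It uses the pandas string accessor, splits only on the last comma, and strips the space that follows it.

```python
import pandas as pd

df = pd.DataFrame({"job_location": [
    "birmingham, england, united kingdom",
    "new jersey, united states",
    "gilgit-baltistan, pakistan",
    "uae",
    "united states",
    "pakistan",
    "31-c2, gulberg 3, lahore, pakistan",
]})

# Split once from the right, keep the last piece, and trim surrounding spaces.
# Values without a comma (e.g. "uae") are returned unchanged.
df["country"] = df["job_location"].str.rsplit(",", n=1).str[-1].str.strip()
print(df)
```

This approach does not check the result against the list of known countries, so a malformed row silently yields whatever text follows the last comma.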
import pandas as pd
country_list = ["pakistan","united kingdom","uk","usa","united states","uae"]
# create a dataframe with one column, job_location, holding each employee's job location
df = pd.DataFrame({
'job_location' : ['birmingham, england, united kingdom','new jersey, united states','gilgit-baltistan, pakistan','uae','united states','pakistan','31-c2, gulberg 3, lahore, pakistan'],
})
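matching_countries = []
for text in df['job_location']:
    # Record the first country found in this row, or None if nothing matches,
    # so the result always has exactly one entry per row.
    for country in country_list:
        if country in text:
            matching_countries.append(country)
            break
    else:
        matching_countries.append(None)
df['country'] = matching_countries
print(df)
Output: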
job_location country
0 birmingham, england, united kingdom united kingdom
1 new jersey, united states united states
2 gilgit-baltistan, pakistan pakistan
3 uae uae
4 united states united states
5 pakistan pakistan
6 31-c2, gulberg 3, lahore, pakistan pakistan
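The new column needs exactly one value per row, which matters as soon as a row contains no listed country or more than one. Here is a minimal sketch of the same substring check written as a helper and applied per row. The function name `first_country` and the two extra rows are hypothetical, chosen only to show those edge cases.

```python
import pandas as pd

country_list = ["pakistan", "united kingdom", "uk", "usa", "united states", "uae"]

def first_country(text, countries=country_list):
    """Return the first name from `countries` that occurs in `text`, else None."""
    # "First" means first in list order, not first by position in the text.
    return next((c for c in countries if c in text), None)

df = pd.DataFrame({"job_location": [
    "gilgit-baltistan, pakistan",
    "remote",                  # hypothetical row: no country at all
    "usa or united kingdom",   # hypothetical row: two listed countries
]})

# One result per row, so the column length always matches the dataframe.
df["country"] = df["job_location"].apply(first_country)
print(df)
```

Because the check is a plain substring test, reordering `country_list` changes which name wins when several are present.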
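A uj5u.com user replied:
First, rename your list, because list shadows Python's built-in type. I did this with a list comprehension:
df['country'] = [x.split(",")[-1].strip() for x in df['job_location']]
Output: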
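| | job_location | country |
|---|---|---|
| 0 | birmingham, england, united kingdom | united kingdom |
| 1 | new jersey, united states | united states |
| 2 | gilgit-baltistan, pakistan | pakistan |
| 3 | uae | uae |
| 4 | united states | united states |
| 5 | pakistan | pakistan |
| 6 | 31-c2, gulberg 3, lahore, pakistan | pakistan |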
