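Hi everyone. I have a large dataset in which countries are represented by ISO codes. However, some countries appear under their official names instead of their ISO codes. I want to find those entries and replace them with their respective ISO codes.
Here is a sample of my df: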
| TERRITORY |
|---|
| IT, GB, USA, France |
| ES, Russia, Germany, PT |
| EG, LY, DZ |
Expected output:
The nations that were not converted are: France, Russia, Germany
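The biggest problem is that several countries share the same cell and are treated as a single value. I tried to make the program print only the substrings longer than two characters, but after several attempts I got nothing.
Can someone help me?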
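For reference, the length-based filter described in the question can be sketched in pandas as follows. This is a minimal sketch, assuming the column is named `TERRITORY` and entries are always separated by `", "`. Note that the filter also flags three-letter codes such as `USA`, so on its own it cannot reproduce the expected output exactly.

```python
import pandas as pd

# Sample data from the question: several comma-separated entries per cell
df = pd.DataFrame({"TERRITORY": ["IT, GB, USA, France",
                                 "ES, Russia, Germany, PT",
                                 "EG, LY, DZ"]})

# Split each cell into a list, then give every entry its own row
entries = df["TERRITORY"].str.split(", ").explode()

# Keep only entries longer than two characters (i.e. not alpha-2 codes)
long_entries = entries[entries.str.len() > 2].drop_duplicates()
print("The nations that were not converted are:", ", ".join(long_entries))
```

Splitting before filtering is the key step: once each entry is its own row, a per-string test like `str.len()` applies to individual country names rather than to the whole cell.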
Answer:
IIUC, you can `split`, `explode`, and check each entry against a set of known codes (here built with pycountry):
```python
import pycountry

codes = {c.alpha_2 for c in pycountry.countries}
# or manually set
# codes = {'IT', 'GB', 'USA', 'FR'...}

s = df['TERRITORY'].str.split(', ').explode().drop_duplicates()
print(f'The nations that were not converted are: {", ".join(s[~s.isin(codes)])}')
```
Output:

```
The nations that were not converted are: USA, France, Russia, Germany
```
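The answer above only lists the unconverted names, while the question also asks to replace them. A minimal sketch of that step, assuming a hand-written mapping: `NAME_TO_ISO` and `to_iso` are hypothetical names, the mapping only covers the sample rows, and mapping the alpha-3 code `USA` to `US` is an assumption made so that every cell ends up in alpha-2 form.

```python
import pandas as pd

# Sample data from the question: several comma-separated entries per cell
df = pd.DataFrame({"TERRITORY": ["IT, GB, USA, France",
                                 "ES, Russia, Germany, PT",
                                 "EG, LY, DZ"]})

# Hypothetical hand-written mapping from names (and the alpha-3 "USA")
# to ISO 3166-1 alpha-2 codes; extend it to cover your own data
NAME_TO_ISO = {"France": "FR", "Russia": "RU", "Germany": "DE", "USA": "US"}

def to_iso(cell: str) -> str:
    """Replace every entry found in NAME_TO_ISO; keep all other entries as-is."""
    return ", ".join(NAME_TO_ISO.get(part, part) for part in cell.split(", "))

df["TERRITORY"] = df["TERRITORY"].apply(to_iso)

# Report any entries that are still longer than two characters
entries = df["TERRITORY"].str.split(", ").explode()
leftover = entries[entries.str.len() > 2].unique()
print(df)
print("Not converted:", ", ".join(leftover) if len(leftover) else "none")
```

For real data the mapping could be built from pycountry (for example from each country's `name` and `alpha_2`), but the official names stored there may not match the spellings used in your dataset, so it is worth checking whatever the last line still reports as not converted.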
