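I have a column in a Pandas DataFrame, and I want to double-check that its values exist as keys in a dictionary. The column currently holds US state abbreviations. I found a dictionary that contains all US state abbreviations and their full names. What I want to do is check whether the abbreviations in the column match keys in the dictionary, without mapping them to their values. That would let me see whether the column contains any codes that don't exist.
Here is the dictionary: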
{
"AL": "Alabama",
"AK": "Alaska",
"AS": "American Samoa",
"AZ": "Arizona",
"AR": "Arkansas",
"CA": "California",
"CO": "Colorado",
"CT": "Connecticut",
"DE": "Delaware",
"DC": "District Of Columbia",
"FM": "Federated States Of Micronesia",
"FL": "Florida",
"GA": "Georgia",
"GU": "Guam",
"HI": "Hawaii",
"ID": "Idaho",
"IL": "Illinois",
"IN": "Indiana",
"IA": "Iowa",
"KS": "Kansas",
"KY": "Kentucky",
"LA": "Louisiana",
"ME": "Maine",
"MH": "Marshall Islands",
"MD": "Maryland",
"MA": "Massachusetts",
"MI": "Michigan",
"MN": "Minnesota",
"MS": "Mississippi",
"MO": "Missouri",
"MT": "Montana",
"NE": "Nebraska",
"NV": "Nevada",
"NH": "New Hampshire",
"NJ": "New Jersey",
"NM": "New Mexico",
"NY": "New York",
"NC": "North Carolina",
"ND": "North Dakota",
"MP": "Northern Mariana Islands",
"OH": "Ohio",
"OK": "Oklahoma",
"OR": "Oregon",
"PW": "Palau",
"PA": "Pennsylvania",
"PR": "Puerto Rico",
"RI": "Rhode Island",
"SC": "South Carolina",
"SD": "South Dakota",
"TN": "Tennessee",
"TX": "Texas",
"UT": "Utah",
"VT": "Vermont",
"VI": "Virgin Islands",
"VA": "Virginia",
"WA": "Washington",
"WV": "West Virginia",
"WI": "Wisconsin",
"WY": "Wyoming"
}
The column contains only state abbreviations (CA, FL, AK, AL, etc.). Cheers! Part of the data is shown below:
,LOSS STATE
0,AL
1,CA
2,CO
3,DC
4,AZ
5,Nonsense
6,CA
7,PA
8,GA
9,VA
10,VA
11,VA
12,VA
13,TN
14,VA
15,CA
16,TX
17,CO
18,MO
19,CA
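I want to keep all rows that have a "valid" state abbreviation, but change the "Nonsense" row to NA, since it does not appear in the dictionary.
Answer from a uj5u.com user:
You can build a list of all the values in the column you want to test with df[col_name].tolist(), which produces a list like this:
col_values_list = ['CA', 'FL', 'AK', 'AL','AU']
Then check whether any of these values is missing from the dictionary's keys: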
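# us_dict is the abbreviation dictionary from the question
for col in col_values_list:
    # membership tests on a dict check its keys
    if col not in us_dict:
        print(f"{col} state abbreviation missing from dictionary key")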
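If you would rather collect the unknown codes than print them, the loop can be wrapped in a small helper. This is only a sketch: the function name find_unknown_codes is made up for illustration, the dictionary is a shortened stand-in for the full one, and the list stands in for df[col_name].tolist().

```python
# Hypothetical subset of the full abbreviation dictionary, for illustration only.
us_dict = {"CA": "California", "FL": "Florida", "AK": "Alaska", "AL": "Alabama"}

# In practice this list would come from df[col_name].tolist().
col_values_list = ["CA", "FL", "AK", "AL", "AU", "AU"]


def find_unknown_codes(values, lookup):
    """Return the values that are not keys of lookup, once each, in first-seen order."""
    unknown = []
    for value in values:
        if value not in lookup and value not in unknown:
            unknown.append(value)
    return unknown


missing = find_unknown_codes(col_values_list, us_dict)
print(missing)
```

Returning a list keeps the result available for later steps, such as filtering or replacing those values.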
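A simpler way is to take a set difference:
set(df[col_name]) - set(us_dict)  # col_name is your actual column name
This gives you the values in the column that are not keys in the dictionary, which is what you need.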
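To go one step further and turn the invalid entries into missing values, as the question asks, one possible approach is Series.isin combined with Series.where. The sketch below assumes a shortened dictionary and a four-row sample frame, both made up for illustration; the column name LOSS STATE is taken from the question's data.

```python
import pandas as pd

# Hypothetical subset of the full abbreviation dictionary, kept short for illustration.
us_dict = {"AL": "Alabama", "CA": "California", "PA": "Pennsylvania"}

# Small sample modeled on the data in the question.
df = pd.DataFrame({"LOSS STATE": ["AL", "CA", "Nonsense", "PA"]})

# True where the value is one of the dictionary's keys.
is_valid = df["LOSS STATE"].isin(list(us_dict))

# Keep valid codes; where() replaces every other entry with a missing value by default.
df["LOSS STATE"] = df["LOSS STATE"].where(is_valid)

print(df)
```

The valid rows are left untouched, and only the entries that fail the lookup become missing, so df["LOSS STATE"].isna() can then be used to find them.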
