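I have a dictionary of valid values that I want to cross-check against the values in a data frame. I want this wrapped in a function that I can use later alongside other code.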
import pandas as pd
d=[['Aland Islands','Cars','[email protected]']]
df=pd.DataFrame(d,columns=['country','industry','Email'])
errors={}
valid_dict={"country": ["Afghanistan", "Aland Islands"],"industry": ["Automotive", "Banking / Finance"]}
valid_dict={k:v for k, v in valid_dict.items() if k in df.columns.values}
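This just ensures that every key kept in valid_dict is a column name in the data frame. It works as expected and needs no changes; I include it only for context.
This is the problematic part of the code. I am trying to create a function, but I am new to writing functions. I want to compare the valid_dict keys and values against the column names and values in the data frame and print a simple statement.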
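To illustrate what that filtering step does, here is a minimal sketch. The names `df_demo` and `rules`, and the extra `region` key, are made up for the example and are not part of the original post.

```python
import pandas as pd

# Hypothetical data frame with the same two columns as in the question
df_demo = pd.DataFrame([["Aland Islands", "Cars"]], columns=["country", "industry"])

rules = {
    "country": ["Afghanistan", "Aland Islands"],
    "industry": ["Automotive", "Banking / Finance"],
    "region": ["EMEA"],  # hypothetical key with no matching column
}

# Keep only the rules whose key is an actual column of the data frame
rules = {k: v for k, v in rules.items() if k in df_demo.columns}

print(list(rules))  # ['country', 'industry']
```

Only the keys are checked here; the values in each list are not compared against the data frame at this point.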
def validate(df, valid_dict):
    {i:k for k, v in valid_dict.items() for i in v}
    for c in valid_dict:
        if df[c] in list(c):
            return
        else:
            for c in valid_dict:
                for i in df.index:
                    errors={ "row": i,
                             "column": c,
                             "message": "This is an invalid entry, fill in " c " accordingly" }
                    return errors,df
print(validate(df, valid_dict))
I know this code is a mess. I have tried various things, but I cannot get the result I want.
The desired output is:
errors={ "row": 0, column": industry, "message": "This is an invalid entry, fill in " industry " accordingly" }
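How can I cross-check the dictionary against the data frame to identify values that are not in the dictionary's lists of allowed items?
For the case where a column has 10 values and 5 of them are errors, I want it to print all 5 errors.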
Answer from a uj5u.com user:
# invert the dictionary: map each valid value to its column name
d={i:k for k, v in valid_dict.items() for i in v}
# map the column through d; where the result is null, write an error message,
# otherwise keep the valid industry name
col='industry'
df['check']=df[col].mask(df[col].map(d).isna(), f"An invalid Value found in {col}")
df
country industry Email check
0 Aland Islands Cars [email protected] An invalid Value found in industry
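The same idea can probably be extended to every column in the dictionary at once. The sketch below is one possible way, assuming each key of the dictionary is a column name; `df_demo`, `rules`, `invalid` and `rows_with_errors` are illustrative names, and it uses `Series.isin` rather than the inverted dictionary.

```python
import pandas as pd

# Hypothetical two-row sample: row 0 has an invalid industry, row 1 is clean
df_demo = pd.DataFrame(
    [["Aland Islands", "Cars"], ["Afghanistan", "Automotive"]],
    columns=["country", "industry"],
)
rules = {
    "country": ["Afghanistan", "Aland Islands"],
    "industry": ["Automotive", "Banking / Finance"],
}

# Only check columns that exist in the data frame
cols = [c for c in rules if c in df_demo.columns]

# Boolean frame: True where a cell is NOT in the allowed list for its column
# (apply passes each column as a Series whose .name is the column label)
invalid = df_demo[cols].apply(lambda s: ~s.isin(rules[s.name]))

# Rows that contain at least one invalid cell
rows_with_errors = df_demo[invalid.any(axis=1)]
print(invalid)
print(rows_with_errors)
```

The boolean frame keeps the row and column position of every invalid cell, so it can be turned into messages later.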
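As a function:
def validate(df, col, valid_dict):
    # df: data frame to check
    # col: column to validate
    # valid_dict: dictionary of valid values per column
    # invert the dictionary: map each valid value to its column name
    d={i:k for k, v in valid_dict.items() for i in v}
    # replace every value that has no match in d with an error message
    s=df[col].mask(df[col].map(d).isna(), f"An invalid Value found in {col}")
    # return only the rows where the mapping failed
    return s[s.map(d).isna()]
validate(df, 'industry', valid_dict)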
0 An invalid Value found in industry
2 An invalid Value found in industry
Name: industry, dtype: object
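For the output format asked for in the question (one record with "row", "column" and "message" per invalid value), something like the following might work. It is only a sketch: `find_invalid_entries`, `sample` and `valid` are hypothetical names, and the sample rows are invented. It checks each column against its own list with `Series.isin`, so a value that is valid only for a different column is still flagged.

```python
import pandas as pd

def find_invalid_entries(df, valid_dict):
    """Return one error record per cell whose value is not in the allowed list."""
    errors = []
    for col, allowed in valid_dict.items():
        if col not in df.columns:
            continue  # skip keys that are not columns of df
        # Boolean array: True where the value is NOT in the allowed list
        bad = (~df[col].isin(allowed)).to_numpy()
        for row in df.index[bad]:
            errors.append({
                "row": row,
                "column": col,
                "message": f"This is an invalid entry, fill in {col} accordingly",
            })
    return errors

# Hypothetical sample data: one bad country and two bad industries
sample = pd.DataFrame(
    [["Aland Islands", "Cars"], ["Narnia", "Automotive"], ["Afghanistan", "Boats"]],
    columns=["country", "industry"],
)
valid = {
    "country": ["Afghanistan", "Aland Islands"],
    "industry": ["Automotive", "Banking / Finance"],
}

errors = find_invalid_entries(sample, valid)
for e in errors:
    print(e)
```

Because the records are collected in a list instead of being returned inside the loop, every invalid value is reported, not just the first one.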
