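I have a dictionary that maps column names to function names. I wrote a function that is supposed to title-case the values in a df column using str.title():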
import pandas as pd
data= [["English","john","smith","ohio","united states","","","manufacturing","National","Residental","","",""]]
df= pd.DataFrame(data,columns=['Communication_Language__c','firstName', 'lastName', 'state', 'country', 'company', 'email', 'industry', 'System_Type__c', 'AccountType', 'customerSegment', 'Existing_Customer__c', 'GDPR_Email_Permission__c'])
Communication_Language__c firstName lastName state country company email industry System_Type__c AccountType customerSegment Existing_Customer__c GDPR_Email_Permission__c
0 English john smith ohio united states manufacturing National Residental
def capitalize(column, df_temp):
    if df_temp[column].notna():
        df_temp[column] = df[column].str.title()
    return df_temp
def required():
    # something
    pass
parsing_map = {
    "firstName": [capitalize, required],
    "lastName": capitalize,
    "state": capitalize,
    "country": [capitalize, required],
    "industry": capitalize,
    "System_Type__c": capitalize,
    "AccountType": capitalize,
    "customerSegment": capitalize,
}
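I wrote the function below to apply str.title(), but is there a way to apply it to the df columns without naming each one?
def capitalize(column, df_temp):
    if df_temp[column].notna():
        df_temp[column] = df[column].str.title()
    return df_temp
What is the best way to use the dictionary's function mapping so that str.title() is applied to everything in the columns mapped to the "capitalize" function?
Desired output: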
data= [["English","John","Smith","Ohio","United States","","","Manufacturing","National","Residental","","",""]]
df= pd.DataFrame(data,columns=['Communication_Language__c','firstName', 'lastName', 'state', 'country', 'company', 'email', 'industry', 'System_Type__c', 'AccountType', 'customerSegment', 'Existing_Customer__c', 'GDPR_Email_Permission__c'])
Communication_Language__c firstName lastName state country company email industry System_Type__c AccountType customerSegment Existing_Customer__c GDPR_Email_Permission__c
0 English John Smith Ohio United States Manufacturing National Residental
uj5u.com user reply:
Normally you would use apply for this, e.g.
cols_to_capitalize = list(parsing_map.keys())
df[cols_to_capitalize] = df[cols_to_capitalize].apply(lambda x: x.str.title())
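For illustration, here is a minimal self-contained sketch of the apply approach on a small frame. The sample data and the missing value in firstName are assumptions, not part of the question. It also suggests that an explicit notna() guard may not be needed, since Series.str.title() passes missing values through unchanged.

```python
import pandas as pd

# Hypothetical sample frame; None stands in for a missing value.
df = pd.DataFrame({
    "firstName": ["john", None],
    "country": ["united states", "new zealand"],
    "email": ["a@b.com", "c@d.com"],
})

# Only the listed columns are title-cased; "email" is left alone.
cols_to_capitalize = ["firstName", "country"]
df[cols_to_capitalize] = df[cols_to_capitalize].apply(lambda s: s.str.title())

print(df)
```

The missing firstName stays missing rather than raising an error, and columns outside the list are not modified.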
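If you want to keep your dictionary of methods, I suggest writing the methods to operate on a column rather than on the data frame. Something like this: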
data= [["English","john","smith","ohio","united states","","","manufacturing","National","Residental","","",""]]
df= pd.DataFrame(data,columns=['Communication_Language__c','firstName', 'lastName', 'state', 'country', 'company', 'email', 'industry', 'System_Type__c', 'AccountType', 'customerSegment', 'Existing_Customer__c', 'GDPR_Email_Permission__c'])
def capitalize(col):
    # TODO handle nan values
    # Maybe use any() instead of all()?
    # This code ignores any column that has even a single NaN value
    if col.notna().all():
        return col.str.title()
    return col
def required(col):
    # TODO do stuff
    return col
parsing_map = {
    "firstName": [capitalize, required],
    "lastName": [capitalize],
    "state": [capitalize],
    "country": [capitalize, required],
    "industry": [capitalize],
    "System_Type__c": [capitalize],
    "AccountType": [capitalize],
    "customerSegment": [capitalize],
}
for col_name, fns in parsing_map.items():
    for fn in fns:
        df[col_name] = fn(df[col_name])
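If the methods need access to other columns, you could also pass the full df to them, but still returning only a single column keeps the design cleaner.
That said, think carefully about whether you really need to reinvent what .apply already does.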
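The variant mentioned above, where each function receives the full DataFrame but still returns a single column, might look roughly like this. This is only a sketch: the fill_from_state rule and the sample data are hypothetical and exist purely to show a function that reads a second column.

```python
import pandas as pd

def capitalize(df, col_name):
    # Returns a new Series; the caller decides where to store it.
    return df[col_name].str.title()

def fill_from_state(df, col_name):
    # Hypothetical rule: where the target column is empty,
    # fall back to the value in the "state" column.
    col = df[col_name]
    return col.where(col != "", df["state"])

parsing_map = {
    "firstName": [capitalize],
    "country": [fill_from_state, capitalize],
}

df = pd.DataFrame({"firstName": ["john"], "state": ["ohio"], "country": [""]})

for col_name, fns in parsing_map.items():
    for fn in fns:
        # Each function sees the whole frame but returns one column.
        df[col_name] = fn(df, col_name)

print(df)
```

Because every function returns exactly one Series, the driving loop stays the same no matter how many other columns a function reads.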
uj5u.com user reply:
Suggestion: create a list of the columns to include, then use apply
import numpy as np
cols = ['firstName', 'lastName', 'state', 'country', 'industry', 'System_Type__c', 'AccountType', 'customerSegment']
# apply returns a new DataFrame, so assign the result back
df = df.apply(lambda col: col.replace(np.nan, "").str.title() if col.name in cols else col)
Edit: yes, but put a string in parsing_map instead of a reference to the function
parsing_map = {
    "firstName": "capitalize",
    "lastName": "capitalize",
    "state": "capitalize",
    "country": "capitalize",
    "industry": "capitalize",
    "System_Type__c": "capitalize",
    "AccountType": "capitalize",
    "customerSegment": "capitalize",
}
df = df.apply(lambda col: col.replace(np.nan, "").str.title() if parsing_map.get(col.name) == "capitalize" else col)
If you use a dict with lists as values (the [] default avoids a TypeError for columns that are not in the map):
df = df.apply(lambda col: col.replace(np.nan, "").str.title() if "capitalize" in parsing_map.get(col.name, []) else col)
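If the map holds strings, one way to support more than one operation is a small lookup table from name to function. The sketch below assumes such a table; FUNCTIONS, strip_spaces, transform and the sample data are all hypothetical names introduced for illustration and are not part of the answer above.

```python
import pandas as pd

def capitalize(col):
    return col.str.title()

def strip_spaces(col):
    # Hypothetical second transform, included only to show chaining.
    return col.str.strip()

# Hypothetical registry: string name -> function operating on a Series.
FUNCTIONS = {"capitalize": capitalize, "strip": strip_spaces}

parsing_map = {
    "firstName": ["strip", "capitalize"],
    "state": ["capitalize"],
}

df = pd.DataFrame({"firstName": ["  john "], "state": ["ohio"], "email": ["x@y.z"]})

def transform(col):
    # Columns missing from parsing_map get an empty list and pass through.
    for name in parsing_map.get(col.name, []):
        col = FUNCTIONS[name](col)
    return col

df = df.apply(transform)
print(df)
```

Keeping names in the map and functions in a separate table means the map can be stored as plain data (for example in a config file) while the dispatch logic stays in one place.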
uj5u.com user reply:
def capitalize(df):
    for col in df.columns:
        df[col] = df[col].str.title()
    return df
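Note that the function above title-cases every column and modifies the frame it is given, and the .str accessor raises an AttributeError on non-string columns such as integers. A hedged variant that takes an explicit column list and works on a copy could look like this; the columns parameter and the sample data are assumptions for illustration.

```python
import pandas as pd

def capitalize(df, columns):
    # Work on a copy so the caller's frame is not modified.
    out = df.copy()
    for col in columns:
        out[col] = out[col].str.title()
    return out

# Hypothetical sample data with one numeric column that must be skipped.
df = pd.DataFrame({"firstName": ["john"], "state": ["ohio"], "age": [42]})
result = capitalize(df, ["firstName", "state"])
print(result)
```

With the question's dictionary, the column list could be passed as list(parsing_map.keys()).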
