Applying str.title() to DataFrame column values using a dictionary of functions

I have a dictionary that maps column names to functions. I wrote a function that should capitalize the values in a df column using str.title().

import pandas as pd
 
data= [["English","john","smith","ohio","united states","","","manufacturing","National","Residental","","",""]]
df= pd.DataFrame(data,columns=['Communication_Language__c','firstName', 'lastName', 'state', 'country', 'company', 'email', 'industry', 'System_Type__c', 'AccountType', 'customerSegment', 'Existing_Customer__c', 'GDPR_Email_Permission__c'])

  Communication_Language__c firstName lastName state        country company email       industry System_Type__c AccountType customerSegment Existing_Customer__c GDPR_Email_Permission__c
0                   English      john    smith  ohio  united states                manufacturing       National  Residental

def capitalize (column,df_temp):
    if df_temp[column].notna():
        df_temp[column]=df[column].str.title()
    return df_temp

def required ():
    # something
    pass

parsing_map={
"firstName":[capitalize,required],
"lastName":capitalize,
"state":capitalize,
"country": [capitalize,required],
"industry":capitalize,
"System_Type__c":capitalize,
"AccountType":capitalize,
"customerSegment":capitalize,
}

I wrote the function below to apply str.title(), but is there a way to apply it to the df columns without naming every one of them?

def capitalize (column,df_temp):
    if df_temp[column].notna():
        df_temp[column]=df[column].str.title()
    return df_temp

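What is the best way to use the dictionary's function mapping so that str.title() is applied to every column that has the "capitalize" function?

Expected output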

data= [["English","John","Smith","Ohio","United States","","","Manufacturing","National","Residental","","",""]]
df= pd.DataFrame(data,columns=['Communication_Language__c','firstName', 'lastName', 'state', 'country', 'company', 'email', 'industry', 'System_Type__c', 'AccountType', 'customerSegment', 'Existing_Customer__c', 'GDPR_Email_Permission__c'])

  Communication_Language__c firstName lastName state        country company email       industry System_Type__c AccountType customerSegment Existing_Customer__c GDPR_Email_Permission__c
0                   English      John    Smith  Ohio  United States                Manufacturing       National  Residental

Answer from a uj5u.com user:

You would normally use apply for this, for example:

cols_to_capitalize = list(parsing_map.keys())
df[cols_to_capitalize] = df[cols_to_capitalize].apply(lambda x: x.str.title())
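
As a side note on the question's own attempt: `if df_temp[column].notna():` raises a ValueError, because the truth value of a whole Series is ambiguous. The guard is also not needed for this purpose, since `.str.title()` passes missing values through unchanged. The following is a minimal sketch of the apply approach on a small frame with one missing value; the sample data is hypothetical and not taken from the question.

```python
import pandas as pd

# Hypothetical sample data; the None in "state" is stored as a missing value.
df = pd.DataFrame({
    "firstName": ["john", "mary ann"],
    "state": ["ohio", None],
    "email": ["john@example.com", "mary@example.com"],
})

# Only these columns are title-cased; every other column is left alone.
cols_to_capitalize = ["firstName", "state"]
df[cols_to_capitalize] = df[cols_to_capitalize].apply(lambda s: s.str.title())

print(df)
```

The missing value in "state" stays missing, and "email" is untouched because it is not in the list.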

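If you want to keep your dictionary of methods, I suggest writing the methods so that they operate on a column rather than on the DataFrame. Something like this: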

data= [["English","john","smith","ohio","united states","","","manufacturing","National","Residental","","",""]]
df= pd.DataFrame(data,columns=['Communication_Language__c','firstName', 'lastName', 'state', 'country', 'company', 'email', 'industry', 'System_Type__c', 'AccountType', 'customerSegment', 'Existing_Customer__c', 'GDPR_Email_Permission__c'])

def capitalize(col):
    # TODO handle nan values
    # Maybe use any() instead of all()?
    # This code ignores any column that has even a single NaN value
    if col.notna().all():
        return col.str.title()
    return col

def required(col):
    # TODO do stuff
    return col

parsing_map={
    "firstName":[capitalize,required],
    "lastName":[capitalize],
    "state":[capitalize],
    "country": [capitalize,required],
    "industry":[capitalize],
    "System_Type__c":[capitalize],
    "AccountType":[capitalize],
    "customerSegment":[capitalize],
}


for col_name, fns in parsing_map.items():
    for fn in fns:
        df[col_name] = fn(df[col_name])
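
The map in the question mixes bare functions and lists of functions as values. One possible way to accept both is sketched below. The helper name `apply_parsing_map` is hypothetical, and `capitalize` is simplified here on the assumption that missing values should just stay missing.

```python
import pandas as pd

def capitalize(col):
    # .str.title() leaves missing values as they are, so no guard is used here.
    return col.str.title()

def required(col):
    # Placeholder, as in the answer above; a real check would go here.
    return col

def apply_parsing_map(df, parsing_map):
    """Return a copy of df with each mapped column passed through its functions."""
    out = df.copy()
    for col_name, fns in parsing_map.items():
        if callable(fns):
            # Allow a bare function as well as a list of functions.
            fns = [fns]
        for fn in fns:
            out[col_name] = fn(out[col_name])
    return out

parsing_map = {
    "firstName": [capitalize, required],
    "lastName": capitalize,
}
df = pd.DataFrame({"firstName": ["john"], "lastName": ["smith"], "email": [""]})
result = apply_parsing_map(df, parsing_map)
print(result)
```

Working on a copy keeps the input frame unchanged, which makes it easier to compare the data before and after the transformation.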

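If the methods need access to other columns, you can also pass the full df to them, but having each one still return only a single column keeps the design cleaner.

That said, you should think carefully about whether you really need to reinvent what .apply already does.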

Answer from a uj5u.com user:

Suggestion: create a list of the columns to include, then use apply

import numpy as np

cols = ['firstName', 'lastName', 'state', 'country', 'industry', 'System_Type__c', 'AccountType', 'customerSegment']
df = df.apply(lambda col: col.replace(np.nan, "").str.title() if col.name in cols else col)

Edit: yes, but put a string in parsing_map instead of a reference to the function

parsing_map = {
    "firstName": "capitalize",
    "lastName": "capitalize",
    "state": "capitalize",
    "country": "capitalize",
    "industry": "capitalize",
    "System_Type__c": "capitalize",
    "AccountType": "capitalize",
    "customerSegment": "capitalize",
}

df = df.apply(lambda col: col.replace(np.nan, "").str.title() if parsing_map.get(col.name) == "capitalize" else col)

If you use a dict with lists as values

df = df.apply(lambda col: col.replace(np.nan, "").str.title() if "capitalize" in parsing_map.get(col.name, []) else col)
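
If the map holds operation names as strings, one way to avoid hard-coding each name in the lambda is to look the names up in a small registry. This is only a sketch: the `OPERATIONS` dict, the "strip" operation, and the `transform` helper are all hypothetical names introduced for illustration.

```python
import pandas as pd

# Hypothetical registry: operation name -> function taking and returning a Series.
OPERATIONS = {
    "capitalize": lambda col: col.str.title(),
    "strip": lambda col: col.str.strip(),
}

parsing_map = {
    "firstName": ["strip", "capitalize"],
    "country": ["capitalize"],
}

def transform(col):
    # Columns missing from parsing_map get an empty list and pass through unchanged.
    for name in parsing_map.get(col.name, []):
        col = OPERATIONS[name](col)
    return col

df = pd.DataFrame({
    "firstName": [" john "],
    "country": ["united states"],
    "email": ["a@b.c"],
})
df = df.apply(transform)
print(df)
```

Using `parsing_map.get(col.name, [])` matters here: without the default, columns absent from the map would return None and the loop would fail.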

Answer from a uj5u.com user:

def capitalize(df):
    for col in df.columns:
        df[col] = df[col].str.title()
    return df


Tags: Python, pandas, DataFrame
