Return all words in a DataFrame column in lowercase

I want to convert all the words in the 'Split Tweets' column to lowercase.

Here is my code:

def word_splitter(df):
    
    df['Split Tweets'] = df['Tweets'].str.split()
    df['Split Tweets'] = df['Split Tweets'].str.lower()

    
    df = df[['Tweets', 'Date', 'Split Tweets']]
    
    return df

word_splitter(twitter_df.copy())

This is the output I get:

    Tweets                                              Date                Split Tweets
0   @BongaDlulane Please send an email to mediades...   2019-11-29 12:50:54 NaN
1   @saucy_mamiie Pls log a call on 0860037566          2019-11-29 12:46:53 NaN
2   @BongaDlulane Query escalated to media desk.        2019-11-29 12:46:10 NaN
3   Before leaving the office this afternoon, head...   2019-11-29 12:33:36 NaN
4   #ESKOMFREESTATE #MEDIASTATEMENT : ESKOM SUSPEN...   2019-11-29 12:17:43 NaN
... ... ... ...
195 Eskom's Visitors Centres’ facilities include i...   2019-11-20 10:29:07 NaN
196 #Eskom connected 400 houses and in the process...   2019-11-20 10:25:20 NaN
197 @ArthurGodbeer Is the power restored as yet?        2019-11-20 10:07:59 NaN
198 @MuthambiPaulina @SABCNewsOnline @IOL @eNCA @e...   2019-11-20 10:07:41 NaN
199 RT @GP_DHS: The @GautengProvince made a commit...   2019-11-20 10:00:09 NaN

This is the expected output:

word_splitter(twitter_df.copy()) 
    Tweets                                              Date                Split Tweets
0   @BongaDlulane Please send an email to mediades...   2019-11-29 12:50:54 [@bongadlulane, please, send, an, email, to, m...
1   @saucy_mamiie Pls log a call on 0860037566          2019-11-29 12:46:53 [@saucy_mamiie, pls, log, a, call, on, 0860037...
2   @BongaDlulane Query escalated to media desk.        2019-11-29 12:46:10 [@bongadlulane, query, escalated, to, media, d...
3   Before leaving the office this afternoon, head...   2019-11-29 12:33:36 [before, leaving, the, office, this, afternoon...
4   #ESKOMFREESTATE #MEDIASTATEMENT : ESKOM SUSPEN...   2019-11-29 12:17:43 [#eskomfreestate, #mediastatement, :, eskom, s...
... ... ... ...
195 Eskom's Visitors Centres’ facilities include i...   2019-11-20 10:29:07 [eskom's, visitors, centres’, facilities, incl...
196 #Eskom connected 400 houses and in the process...   2019-11-20 10:25:20 [#eskom, connected, 400, houses, and, in, the,...
197 @ArthurGodbeer Is the power restored as yet?        2019-11-20 10:07:59 [@arthurgodbeer, is, the, power, restored, as,...
198 @MuthambiPaulina @SABCNewsOnline @IOL @eNCA @e...   2019-11-20 10:07:41 [@muthambipaulina, @sabcnewsonline, @iol, @enc...
199 RT @GP_DHS: The @GautengProvince made a commit...   2019-11-20 10:00:09 [rt, @gp_dhs:, the, @gautengprovince, made, a,...

How can I do this?

Answer from a uj5u.com user:

You need to convert the Tweets strings to lowercase before splitting them. Use this instead:

df['Split Tweets'] = df['Tweets'].str.lower().str.split()
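
For reference, here is a minimal sketch of the whole function with the order swapped. The two sample rows are made up for illustration and are not the asker's actual data:

```python
import pandas as pd

def word_splitter(df):
    # Lowercase the raw tweet strings first, then split on whitespace.
    df['Split Tweets'] = df['Tweets'].str.lower().str.split()
    return df[['Tweets', 'Date', 'Split Tweets']]

# Hypothetical sample data, for illustration only.
twitter_df = pd.DataFrame({
    'Tweets': ['@BongaDlulane Query escalated to media desk.',
               '#Eskom connected 400 houses'],
    'Date': ['2019-11-29 12:46:10', '2019-11-20 10:25:20'],
})

result = word_splitter(twitter_df.copy())
print(result['Split Tweets'].tolist())
```

Each row of 'Split Tweets' should now hold a list of lowercase tokens instead of NaN.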

Answer from a uj5u.com user:

Try this:

df['Split Tweets'] = df['Tweets'].apply(lambda x:x.lower().split())

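Answer from a uj5u.com user:

After str.split(), each value in your df['Split Tweets'] column is a list rather than a single string, so str.lower() cannot be applied to it and returns NaN instead.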
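
A small sketch that reproduces the NaN result on a one-row Series (the sample string is an assumption, chosen only to show the behavior):

```python
import pandas as pd

# Hypothetical one-row example.
tweets = pd.Series(['Hello World'])

split = tweets.str.split()     # each element is now a Python list
lowered = split.str.lower()    # .str methods expect strings, so list elements become NaN

print(type(split.iloc[0]))
print(lowered.isna().all())
```

The second print should report that every value in the lowered Series is missing, which matches the NaN column in the question.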

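You can change the order, as the other answers and comments here suggest, or you can keep the split and use the Series map method with a lambda that applies str.lower to every word in each list: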

df['Split Tweets'] = df['Split Tweets'].map(lambda x: list(map(str.lower, x)))
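
As a rough illustration of this element-wise approach, the sketch below uses a list comprehension, which should be equivalent to list(map(str.lower, x)). The sample tweets are invented, and the code assumes every row already holds a list (a missing value in the column would raise a TypeError here):

```python
import pandas as pd

# Hypothetical sample data, for illustration only.
df = pd.DataFrame({'Tweets': ['RT @GP_DHS: The Province',
                              'Is the power restored?']})

# Split first, so each row holds a list of words.
df['Split Tweets'] = df['Tweets'].str.split()

# Lowercase every word inside each list.
df['Split Tweets'] = df['Split Tweets'].map(
    lambda words: [w.lower() for w in words]
)

print(df['Split Tweets'].tolist())
```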
