DataFrame concatenation and splitting

I have a DataFrame dF with three columns, A, B and C:

dF =

    A                                          B                   C
    navigate to "www.xyz.com"                  to "www.xyz.com"    NA
    enters valid username "JOHN"               enters              "JOHN"
    enters password "1234567"                  enters              "1234567"
    enters  RIGHT destination"YUL"             enters              "YUL"
    clicks Customer Service                    clicks              NA
    clicks Booking Information from Booking    clicks              NA

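I want to find the difference between A and the values in B and C, and put whatever remains into column D. I want my DataFrame to look like this: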

dF =

    A                                          B                   C            D
    navigate to "www.xyz.com"                  to "www.xyz.com"    NA           navigate
    enters valid username "JOHN"               enters              "JOHN"       valid username
    enters valid password "1234567"            enters              "1234567"    valid password
    enters  RIGHT destination"YUL"             enters              "YUL"        RIGHT destination
    clicks Customer Service                    clicks              NA           Customer Service
    clicks Booking Information from Booking    clicks              NA           Booking Information from Booking

I am using:

df['D'] = Final_df[['B', 'C']].agg(' '.join, axis=1).str.split(' ') 

df['D'] = df.apply(lambda x: ''.join(set(x['A'].split(' ')) - set(x['D'])), axis=1)

But the words in column D do not come out in their original order.

Answer from a uj5u.com community member:

import pandas as pd

# Sample data from the question; 'NA' is stored as a plain string here
df = pd.DataFrame({'A': ['navigate to "www.xyz.com"',
  'enters valid username "JOHN"',
  'enters password "1234567"',
  'enters  RIGHT destination"YUL"',
  'clicks Customer Service',
  'clicks Booking Information from Booking'],
 'B': ['to "www.xyz.com"', 'enters', 'enters', 'enters', 'clicks', 'clicks'],
 'C': ['NA', '"JOHN"', '"1234567"', '"YUL"', 'NA', 'NA']})

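If you are sure that all words are separated by spaces (which is not the case in row 4, where destination"YUL" has no space), you can use split, but do not convert 'A' to a set, so that the word order is preserved.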

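# Keep A as an ordered list of words; only B and C become sets for lookup
a = df['A'].str.split()
b = df['B'].str.split().apply(set)
c = df['C'].str.split().apply(set)

# For each row, keep the words of A (in order) that appear in neither B nor C
df['D'] = [' '.join([a2 for a2 in a1 if a2 not in (b1 | c1)]) for a1, b1, c1 in zip(a,b,c)]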
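To make the caveat about row 4 concrete, here is a minimal, self-contained sketch of the same split-based idea run on the sample data. The helper name remainder_by_tokens is hypothetical (it is not part of pandas), and the sketch assumes that 'NA' is stored as a plain string, as in the sample data above.

```python
import pandas as pd

# Sample data from the question; 'NA' is assumed to be a plain string
df = pd.DataFrame({
    'A': ['navigate to "www.xyz.com"',
          'enters valid username "JOHN"',
          'enters password "1234567"',
          'enters  RIGHT destination"YUL"',
          'clicks Customer Service',
          'clicks Booking Information from Booking'],
    'B': ['to "www.xyz.com"', 'enters', 'enters', 'enters', 'clicks', 'clicks'],
    'C': ['NA', '"JOHN"', '"1234567"', '"YUL"', 'NA', 'NA'],
})

def remainder_by_tokens(a_text, b_text, c_text):
    """Hypothetical helper: keep the words of a_text, in their original
    order, that are not whole words of b_text or c_text."""
    exclude = set(b_text.split()) | set(c_text.split())
    return ' '.join(tok for tok in a_text.split() if tok not in exclude)

df['D'] = [remainder_by_tokens(a, b, c)
           for a, b, c in zip(df['A'], df['B'], df['C'])]

print(df['D'].tolist())
```

On this data the fourth row comes out as RIGHT destination"YUL" rather than RIGHT destination, because destination"YUL" is a single whitespace-delimited token and therefore never matches "YUL". Repeated words such as the second Booking in the last row are kept, which a set difference on A would not do.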

Otherwise, you could consider replace:

df['D'] = df.apply(lambda r: r['A'].replace(r['B'], '').replace(r['C'], '').strip(), axis=1)
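The replace approach above relies on B and C always being strings. If the NA entries were real missing values (None or NaN) rather than the string 'NA', str.replace would raise a TypeError. The sketch below is one possible way to guard against that; the helper name remainder_by_replace and its remove_cols parameter are hypothetical, and the shortened data set with None in column C is an assumption made for illustration.

```python
import pandas as pd

# Assumption: missing values in C are stored as None/NaN, not as the string 'NA'
df = pd.DataFrame({
    'A': ['navigate to "www.xyz.com"',
          'enters  RIGHT destination"YUL"',
          'clicks Customer Service'],
    'B': ['to "www.xyz.com"', 'enters', 'clicks'],
    'C': [None, '"YUL"', None],
})

def remainder_by_replace(row, remove_cols=('B', 'C')):
    """Hypothetical helper: delete the text of each remove_cols value
    from row['A'], skipping missing values, then tidy the whitespace."""
    text = row['A']
    for col in remove_cols:
        value = row[col]
        if pd.isna(value):
            continue  # nothing to remove for a missing value
        text = text.replace(value, '')
    # Collapse runs of spaces left behind by the removals
    return ' '.join(text.split())

df['D'] = df.apply(remainder_by_replace, axis=1)

print(df['D'].tolist())
```

Because replace works on substrings, it handles the fourth row of the question correctly even though destination"YUL" has no space. The trade-off is that it also removes matches inside longer words, so it is best suited to data where the values in B and C cannot appear as part of another word in A.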
