I have a data frame dF with three columns, A, B and C:
dF =
A                                        B                 C
navigate to "www.xyz.com"                to "www.xyz.com"  NA
enters valid username "JOHN"             enters            "JOHN"
enters password "1234567"                enters            "1234567"
enters RIGHT destination"YUL"            enters            "YUL"
clicks Customer Service                  clicks            NA
clicks Booking Information from Booking  clicks            NA
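I want to find the difference between A and columns B and C, and put the remaining words of A into a new column D. I want my data frame to look like this: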
dF =
A                                        B                 C          D
navigate to "www.xyz.com"                to "www.xyz.com"  NA         navigate
enters valid username "JOHN"             enters            "JOHN"     valid username
enters valid password "1234567"          enters            "1234567"  valid password
enters RIGHT destination"YUL"            enters            "YUL"      RIGHT destination
clicks Customer Service                  clicks            NA         Customer Service
clicks Booking Information from Booking  clicks            NA         Booking Information from Booking
I'm using:
df['D'] = df[['B', 'C']].agg(' '.join, axis=1).str.split(' ')
df['D'] = df.apply(lambda x: ''.join(set(x['A'].split(' ')) - set(x['D'])), axis=1)
but the words in column D do not come out in their original order.
Answer:
import pandas as pd

df = pd.DataFrame({'A': ['navigate to "www.xyz.com"',
                         'enters valid username "JOHN"',
                         'enters password "1234567"',
                         'enters RIGHT destination"YUL"',
                         'clicks Customer Service',
                         'clicks Booking Information from Booking'],
                   'B': ['to "www.xyz.com"', 'enters', 'enters', 'enters', 'clicks', 'clicks'],
                   'C': ['NA', '"JOHN"', '"1234567"', '"YUL"', 'NA', 'NA']})
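If you are sure that all words are separated by spaces (which is not the case in row 4), you can use split, but don't convert 'A' to a set, so that the word order is preserved: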
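# Split A into an ordered list of words; split B and C into sets for fast lookup
a = df['A'].str.split()
b = df['B'].str.split().apply(set)
c = df['C'].str.split().apply(set)
# Keep A's words in their original order, dropping any that appear in B or C
df['D'] = [' '.join([a2 for a2 in a1 if a2 not in (b1 | c1)]) for a1, b1, c1 in zip(a, b, c)]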
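The key point of the split approach is that it filters A's word list instead of taking a set difference, and a list keeps its order. Below is a minimal plain-Python sketch of the same per-row logic, without pandas; the function name `remaining_words` is hypothetical, chosen only for illustration. It also shows the row-4 limitation: `destination"YUL"` is a single whitespace-separated word, so `"YUL"` is not removed.

```python
def remaining_words(a: str, b: str, c: str) -> str:
    """Return the words of `a` that appear in neither `b` nor `c`, in their original order."""
    # Words to drop: union of the whitespace-separated words of B and C.
    drop = set(b.split()) | set(c.split())
    # Walk the list of A's words (not a set), so the original order survives.
    return " ".join(w for w in a.split() if w not in drop)


rows = [
    ('navigate to "www.xyz.com"', 'to "www.xyz.com"', 'NA'),
    ('enters valid username "JOHN"', 'enters', '"JOHN"'),
    ('enters RIGHT destination"YUL"', 'enters', '"YUL"'),
]
for a, b, c in rows:
    print(remaining_words(a, b, c))
# navigate
# valid username
# RIGHT destination"YUL"   <- "YUL" is glued to the previous word, so split() keeps it
```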
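Otherwise, you could consider using replace:
# Remove the B and C substrings from A, then trim the leftover spaces
df['D'] = df.apply(lambda r: r['A'].replace(r['B'], '').replace(r['C'], '').strip(), axis=1)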
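If the glued quotes in row 4 matter, one option (not taken from the answer above) is to tokenize with a regular expression instead of `split()`, treating each double-quoted string as its own token, and then apply the same order-preserving filter. This is a sketch that assumes quoted values never contain nested or escaped quotes; the pattern `TOKEN` and the function `remaining_tokens` are hypothetical names.

```python
import re

# Assumed token rule: a double-quoted string, or a run of characters that are
# neither whitespace nor double quotes. So destination"YUL" -> destination, "YUL".
TOKEN = re.compile(r'"[^"]*"|[^\s"]+')


def remaining_tokens(a: str, b: str, c: str) -> str:
    """Return the tokens of `a` not found among the tokens of `b` or `c`, in order."""
    drop = set(TOKEN.findall(b)) | set(TOKEN.findall(c))
    return " ".join(t for t in TOKEN.findall(a) if t not in drop)


rows = [
    ('navigate to "www.xyz.com"', 'to "www.xyz.com"', 'NA'),
    ('enters valid username "JOHN"', 'enters', '"JOHN"'),
    ('enters password "1234567"', 'enters', '"1234567"'),
    ('enters RIGHT destination"YUL"', 'enters', '"YUL"'),
    ('clicks Customer Service', 'clicks', 'NA'),
    ('clicks Booking Information from Booking', 'clicks', 'NA'),
]
for a, b, c in rows:
    print(remaining_tokens(a, b, c))
# navigate
# valid username
# password
# RIGHT destination
# Customer Service
# Booking Information from Booking
```

To fill column D, the function can be applied row by row, e.g. `df['D'] = [remaining_tokens(a, b, c) for a, b, c in zip(df['A'], df['B'], df['C'])]`. Unlike `replace`, which removes every occurrence of a substring (a C value of `NA` would also cut the `NA` out of a word such as `NASA`), this drops only whole tokens.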
