I have this dataset:
import pandas as pd
emails1 = ['[email protected]', "[email protected]", "[email protected]"]
emails2 = ['[email protected]', "[email protected]", '[email protected]', "[email protected]"]
emails3 = ["[email protected]", '[email protected]']
terms = ['@gmail.com', 'data', 'ddd@']
df = pd.DataFrame([emails1, emails2, emails3])
df["emails"] = df.apply(lambda x: [x[0], x[1], x[2], x[3]], axis=1)
df = df.iloc[: , 4:]
df
emails
0 [[email protected], [email protected], [email protected], None]
1 [[email protected], [email protected], [email protected], [email protected]]
2 [[email protected], [email protected], None, None]
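For each list, I need to find the first item, searching from the end, that contains one of the strings in the terms array, so my output would be another column: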
emails email wanted
0 [[email protected], [email protected], [email protected], None] [[email protected]]
1 [[email protected], [email protected], [email protected], [email protected]] [[email protected]]
2 [[email protected], [email protected], None, None] [[email protected]]
I tried each term separately and combined the results, but it doesn't work:
df["emails"].apply(lambda x:[i for i in x if '@gmail.com' in i])
Is there a good way to do this?
A uj5u.com user replied:
The exact logic is unclear, but you need a list comprehension:
import re
regex = re.compile('|'.join(map(re.escape, terms)))
# r'@gmail\.com|data|ddd@'
df['wanted'] = [next((x for x in l[::-1] if x and regex.search(x)), None)
                for l in df['emails']]
Output:
emails wanted
0 [[email protected], [email protected], [email protected]... [email protected]
1 [[email protected], [email protected], [email protected]... [email protected]
2 [[email protected], [email protected], None, None] [email protected]
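For comparison, here is a minimal regex-free sketch of the same "last match from the end" idea. It is not the code from the answer above: the helper name last_match and the sample addresses are hypothetical placeholders (the addresses in the post are obscured), and it assumes "match" means a plain substring test against any of the terms. It also skips the None padding explicitly, since a test such as '@gmail.com' in None raises a TypeError.

```python
# Hypothetical placeholder data; the real addresses in the post are obscured.
rows = [
    ["alice@gmail.com", "bob@example.org", "data.team@example.org", None],
    ["carol@example.org", "dave@gmail.com", "erin@example.org", "frank@example.org"],
    ["ddd@example.org", "grace@example.org", None, None],
]
terms = ["@gmail.com", "data", "ddd@"]


def last_match(items, terms):
    """Return the last non-empty item containing any of the terms, else None."""
    for item in reversed(items):
        # Skip None (or empty) padding before doing the substring test.
        if item and any(term in item for term in terms):
            return item
    return None


wanted = [last_match(row, terms) for row in rows]
print(wanted)
```

With a DataFrame like the one in the question, the same helper could presumably be used as df['wanted'] = [last_match(l, terms) for l in df['emails']]. The regex version in the answer does the same job in a single compiled pattern, which may be preferable when the list of terms is long.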
Please credit the source when reposting. Link to this article: https://www.uj5u.com/ruanti/518015.html
