I have a DataFrame df1 as shown below -
|email_id| date |
|[email protected] | ['2022-04-09'] |
|[email protected] | [nan] |
|def@gmail.com | ['2022-09-21','2022-03-09'] |
|[email protected] | [nan, '2022-03-29'] |
|[email protected] | [nan] |
|[email protected] | [nan,'2022-09-01'] |
Another DataFrame, df2 -
|email_id| status |
|[email protected] | 0 |
|def@gmail.com | 0 |
|[email protected] | 0 |
|[email protected] | 3 |
|[email protected] | 2 |
|[email protected] | 1 |
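How can I look up each email_id from df2 in df1 and update the status in df2? If the date column in df1 contains date values for an email_id, its status should be 0; if any nan value is present, the status should be 1. If an email_id in df2 has no match in df1, its status should stay as it is.
Expected output for df2 -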
|email_id| status |
|[email protected] | 1 |
|def@gmail.com | 0 |
|[email protected] | 1 |
|[email protected] | 3 |
|[email protected] | 2 |
|[email protected] | 1 |
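Please help me. Thanks in advance!
A uj5u.com community member replied:
First use DataFrame.explode on the column of lists. Then test for missing values and aggregate with max per email_id (lowercased, so the match is case-insensitive) to build a mapping Series. Finally use Series.map, replacing unmatched values with the original df2['status'] column: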
# one row per list element
df = df1.explode('date')
# 1 if any element for that (lowercased) email_id is missing, else 0
s = df['date'].isna().astype(int).groupby(df['email_id'].str.lower()).max()
print(s)
email_id
[email protected] 1
def@gmail.com 0
[email protected] 1
[email protected] 1
[email protected] 1
Name: date, dtype: int32
# unmatched email_id values map to NaN, so fall back to the existing status
df2['status'] = df2['email_id'].str.lower().map(s).fillna(df2['status']).astype(int)
print(df2)
email_id status
0 [email protected] 1
1 def@gmail.com 0
2 [email protected] 1
3 [email protected] 3
4 [email protected] 2
5 [email protected] 1
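For anyone who wants to run the approach above end to end, here is a minimal self-contained sketch. The addresses under example.com are hypothetical stand-ins, since most of the real ones are obscured in the post, and the helper columns key and missing are names chosen only for this illustration. It also assumes pandas 1.1 or later for the ignore_index argument of explode.

```python
import numpy as np
import pandas as pd

# Hypothetical data shaped like the question's df1 and df2.
df1 = pd.DataFrame({
    "email_id": ["a@example.com", "B@example.com", "def@gmail.com",
                 "b@example.com", "c@example.com", "a@example.com"],
    "date": [["2022-04-09"], [np.nan], ["2022-09-21", "2022-03-09"],
             [np.nan, "2022-03-29"], [np.nan], [np.nan, "2022-09-01"]],
})
df2 = pd.DataFrame({
    "email_id": ["a@example.com", "def@gmail.com", "b@example.com",
                 "x@example.com", "y@example.com", "c@example.com"],
    "status": [0, 0, 0, 3, 2, 1],
})

# One row per list element, with a fresh 0..n-1 index.
exploded = df1.explode("date", ignore_index=True)
# Lowercased key (hypothetical column name) for case-insensitive matching.
exploded["key"] = exploded["email_id"].str.lower()
# 1 where the date is missing, 0 otherwise.
exploded["missing"] = exploded["date"].isna().astype(int)
# Per email: 1 if any date is missing, else 0.
has_nan = exploded.groupby("key")["missing"].max()

# Emails absent from df1 map to NaN and keep their original status.
df2["status"] = (
    df2["email_id"].str.lower().map(has_nan).fillna(df2["status"]).astype(int)
)
print(df2)
```

Putting the key and the flag into named columns before grouping avoids relying on index alignment between the exploded frame and the grouping Series, which is a matter of taste rather than a correction to the answer.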
