Iterating over a MultiIndex DataFrame and editing certain rows

Background:

So I have a MultiIndex df that contains data on how users move through our website. The df is built as follows:

df = df.loc[df['eventType'] == 5]
df = df.set_index(['id_1', 'id_2']).sort_values('timestamp')
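A minimal sketch of the filtering and indexing step above, assuming the raw frame has columns `id_1`, `id_2`, `eventType` and `timestamp` (the toy rows and the name `raw` are hypothetical). Sorting by the session keys before the timestamp keeps each session's rows contiguous and in time order; sorting by `timestamp` alone can interleave rows from different sessions.

```python
import pandas as pd

# Hypothetical raw event log; column names follow the question.
raw = pd.DataFrame({
    "id_1": [2, 1, 1, 2],
    "id_2": ["ijk", "abc", "abc", "ijk"],
    "eventType": [5, 5, 7, 5],
    "timestamp": [4, 1, 2, 3],
    "Step_A": ["Action_A"] * 4,
})

# Keep only the event type of interest.
df = raw.loc[raw["eventType"] == 5]

# Sort by session first, then by time, and only then build the MultiIndex,
# so every session's rows stay together in chronological order.
df = (
    df.sort_values(["id_1", "id_2", "timestamp"])
      .set_index(["id_1", "id_2"])
)
print(df)
```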

Example df:

            eventType  timestamp  Step_A        Step_B        Step_C        Step_D
id  id_2

1   abc     5          t_1        Action_A      NA            NA            NA
    abc     5          t_2        Action_A      Action_B      NA            NA
    abc     5          t_3        Action_A      Action_B      Action_C      NA
    abc     5          t_4        Action_A      Action_B      Action_C      Exit Website

2   ijk     5          t_4        Action_A      NA            NA            NA
    ijk     5          t_5        Action_A      Action_B      NA            NA
    ijk     5          t_6        Action_A      NA            NA            NA

3   ...
    ...

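id=1 is an example of someone who "completed" their path through the site (Actions A to C, then leaving the site at Step_D). In some cases, however, users take a more complicated path or do not exit the site properly, so the "Exit Website" page is never recorded (e.g. id=2). I am trying to fill in these values by sorting on timestamp and making sure that every session ends with "Exit Website" in columns Step_A to Step_D of its last row. If a session (a session being one id) has no "Exit Website", I want to put "Exit Website" in the first column of that row that is NA or None and leave the rest unchanged. If the session already has an "Exit Website", I want to leave it as it is.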
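The first half of that rule, deciding which sessions never reach "Exit Website", can be computed without a loop. A minimal sketch, assuming the session key is a single index level named `id` and the step columns are `Step_A` to `Step_D` (the toy data is hypothetical): flag each row that contains the marker, then collapse the flags to one per session with a grouped `any`.

```python
import numpy as np
import pandas as pd

STEP_COLS = ["Step_A", "Step_B", "Step_C", "Step_D"]
EXIT = "Exit Website"

# Toy data: session 1 exits properly, session 2 does not.
df = pd.DataFrame(
    {
        "id": [1, 1, 2, 2],
        "timestamp": [1, 2, 4, 5],
        "Step_A": ["Action_A", "Action_A", "Action_A", "Action_A"],
        "Step_B": [np.nan, "Action_B", np.nan, "Action_B"],
        "Step_C": [np.nan, np.nan, np.nan, np.nan],
        "Step_D": [np.nan, EXIT, np.nan, np.nan],
    }
).set_index("id")

# True for every row that holds the exit marker in any Step_* column
# (NaN compares as False) ...
row_has_exit = df[STEP_COLS].eq(EXIT).any(axis=1)
# ... then one flag per session (index level "id").
session_has_exit = row_has_exit.groupby(level="id").any()
missing_exit = session_has_exit[~session_has_exit].index.tolist()
print(missing_exit)  # [2]
```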

Example:

Before:

            eventType  timestamp  Step_A        Step_B        Step_C        Step_D
id  id_2

1   abc     5          t_1        Action_A      NA            NA            NA
    abc     5          t_2        Action_A      Action_B      NA            NA
    abc     5          t_3        Action_A      Action_B      Action_C      NA
    abc     5          t_4        Action_A      Action_B      Action_C      Exit Website

2   ijk     5          t_4        Action_A      NA            NA            NA
    ijk     5          t_5        Action_A      Action_B      NA            NA
    ijk     5          t_6        Action_A      NA            NA            NA

3   ...
    ...

After:

            eventType  timestamp  Step_A        Step_B        Step_C        Step_D
id  id_2

1   abc     5          t_1        Action_A
    abc     5          t_2        Action_A      Action_B
    abc     5          t_3        Action_A      Action_B      Action_C
    abc     5          t_4        Action_A      Action_B      Action_C      Exit Website

2   ijk     5          t_5        Action_A      NA            NA            NA
    ijk     5          t_6        Action_A      Action_B      NA            NA
    ijk     5          t_7        Action_A      NA            NA            NA
    ijk     5          t_8        Action_A      Exit Website  NA            NA

3   ...
    ...

Pseudo-code:

for the last row of each `id` in my MultiIndex df:
    if column A, B, C or D has an NA:
        replace the first NA in that row with "Exit Website"
    else:
        pass
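The pseudo-code above can be turned into a small runnable sketch. It assumes the index levels are named `id` and `id_2`, that rows are already in timestamp order within each session, and it uses a hypothetical helper name, `mark_exit`. Following the prose rule, it only touches sessions that contain no "Exit Website" at all, and writes the marker into the first empty Step_* cell of that session's last row.

```python
import numpy as np
import pandas as pd

STEP_COLS = ["Step_A", "Step_B", "Step_C", "Step_D"]
EXIT = "Exit Website"


def mark_exit(df, session_keys=("id", "id_2")):
    """Hypothetical helper: for every session with no EXIT anywhere, write EXIT
    into the first empty Step_* cell of the session's last row."""
    keys = list(session_keys)
    flat = df.reset_index()  # positional rows; index levels become columns
    # Object dtype so writing a string into an all-NaN column does not upcast.
    flat[STEP_COLS] = flat[STEP_COLS].astype(object)

    # Session-level flag, broadcast back to every row of the session.
    row_hit = flat[STEP_COLS].eq(EXIT).any(axis=1)
    has_exit = row_hit.groupby([flat[k] for k in keys]).transform("any")

    # Last row of each session (rows assumed to be in timestamp order already).
    is_last = ~flat.duplicated(subset=keys, keep="last")

    targets = is_last & ~has_exit
    for pos in targets[targets].index:
        empty = flat.loc[pos, STEP_COLS].isna()
        if empty.any():
            first_na = empty[empty].index[0]  # leftmost empty Step_* column
            flat.loc[pos, first_na] = EXIT
    return flat.set_index(keys)


data = pd.DataFrame(
    {
        "id": [1, 1, 2, 2, 2],
        "id_2": ["abc", "abc", "ijk", "ijk", "ijk"],
        "timestamp": ["t_3", "t_4", "t_4", "t_5", "t_6"],
        "Step_A": ["Action_A"] * 5,
        "Step_B": ["Action_B", "Action_B", np.nan, "Action_B", np.nan],
        "Step_C": ["Action_C", "Action_C", np.nan, np.nan, np.nan],
        "Step_D": [np.nan, EXIT, np.nan, np.nan, np.nan],
    }
).set_index(["id", "id_2"])

result = mark_exit(data)
print(result)
```

Session (1, abc) is left alone because it already exits; the last row of (2, ijk) gets "Exit Website" in Step_B, its first empty cell.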

Answer from a uj5u.com user:

df.reset_index(inplace=True)

# Last row of each (id, eventType) group, restricted to groups with more than one row
m1 = ~df.duplicated(['id', 'eventType'], keep='last')
m2 = df.duplicated(['id', 'eventType'], keep=False)
last_group_row = m1 & m2

# True on every row of a group whose column is entirely empty
step_b_empty = df.groupby(['id', 'eventType'])['Step_B'].transform(lambda x: x.isnull().all())
step_c_empty = df.groupby(['id', 'eventType'])['Step_C'].transform(lambda x: x.isnull().all())
step_d_empty = df.groupby(['id', 'eventType'])['Step_D'].transform(lambda x: x.isnull().all())

# Step_A: only when B, C and D are empty for the whole group
update_a = step_b_empty & step_c_empty & step_d_empty
update_a_last_row = last_group_row & update_a
df.loc[update_a_last_row, 'Step_A'] = 'Exit Website'

# Step_B: only when Step_A holds no 'Exit Website' and C, D are empty
update_b = df.groupby(['id', 'eventType'])['Step_A'].transform(lambda x: 'Exit Website' not in x.tolist())
update_b_last_row = last_group_row & update_b & step_c_empty & step_d_empty
df.loc[update_b_last_row, 'Step_B'] = 'Exit Website'

# Step_C: only when neither Step_A nor Step_B holds 'Exit Website' and D is empty
update_c = df.groupby(['id', 'eventType'])[['Step_A', 'Step_B']].transform(lambda x: 'Exit Website' not in x.tolist())
update_c = update_c.all(axis='columns')
update_c_last_row = last_group_row & update_c & step_d_empty
df.loc[update_c_last_row, 'Step_C'] = 'Exit Website'

# Step_D: only when no Step_ column holds 'Exit Website' yet
update_d = df.groupby(['id', 'eventType'])[['Step_A', 'Step_B', 'Step_C', 'Step_D']].transform(lambda x: 'Exit Website' not in x.tolist())
update_d = update_d.all(axis='columns')
update_d_last_row = last_group_row & update_d
df.loc[update_d_last_row, 'Step_D'] = 'Exit Website'

df.set_index(['id', 'eventType'], inplace=True)

Original DataFrame:

              timestamp  Step_A        Step_B        Step_C        Step_D
id eventType
1  5          t_1        Action_A      NaN           NaN           NaN
   5          t_2        Action_A      Action_B      NaN           NaN
   5          t_3        Action_A      Action_B      Action_C      NaN
   5          t_4        Action_A      Action_B      Action_C      Exit Website
2  5          t_5        Action_A      NaN           NaN           NaN
   5          t_6        Action_A      NaN           NaN           NaN
   5          t_7        NaN           NaN           NaN           NaN
3  5          t_8        Action_A      Action_B      NaN           NaN
   5          t_9        Action_A      Action_B      NaN           NaN
   5          t_10       Action_A      NaN           NaN           NaN
4  5          t_11       Action_A      Action_B      Action_C      NaN
   5          t_12       Action_A      Action_B      Action_C      NaN
   5          t_13       Action_A      Action_B      NaN           NaN

Final DataFrame:

              timestamp  Step_A        Step_B        Step_C        Step_D
id eventType
1  5          t_1        Action_A      NaN           NaN           NaN
   5          t_2        Action_A      Action_B      NaN           NaN
   5          t_3        Action_A      Action_B      Action_C      NaN
   5          t_4        Action_A      Action_B      Action_C      Exit Website
2  5          t_5        Action_A      NaN           NaN           NaN
   5          t_6        Action_A      NaN           NaN           NaN
   5          t_7        Exit Website  NaN           NaN           NaN
3  5          t_8        Action_A      Action_B      NaN           NaN
   5          t_9        Action_A      Action_B      NaN           NaN
   5          t_10       Action_A      Exit Website  NaN           NaN
4  5          t_11       Action_A      Action_B      Action_C      NaN
   5          t_12       Action_A      Action_B      Action_C      NaN
   5          t_13       Action_A      Action_B      Exit Website  NaN
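The first answer relies on `DataFrame.duplicated` to find the last row of each group: `keep='last'` flags every occurrence except the last, so negating it selects the final row, while `keep=False` flags every row whose key occurs more than once. A small sketch on hypothetical keys shows the masks, and that `groupby(...).cumcount(ascending=False) == 0` yields the same last-row mask as `m1`. Note that `m1 & m2` excludes single-row groups (id 2 here), so such sessions are never updated by this approach.

```python
import pandas as pd

# Hypothetical keys: groups of size 2, 1 and 3.
df = pd.DataFrame({
    "id": [1, 1, 2, 3, 3, 3],
    "eventType": [5, 5, 5, 5, 5, 5],
})
keys = ["id", "eventType"]

m1 = ~df.duplicated(keys, keep="last")   # last row of every group
m2 = df.duplicated(keys, keep=False)     # rows in groups with more than one row
last_group_row = m1 & m2                 # last row of multi-row groups only

# Equivalent last-row mask via a per-group counter that counts down to 0.
alt_last = df.groupby(keys).cumcount(ascending=False) == 0

print(pd.DataFrame({"m1": m1, "m2": m2, "last_group_row": last_group_row}))
```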

Solution 2:

import pandas as pd

df.reset_index(inplace=True)
dfs = []
step_cols = [col for col in df.columns if col.startswith('Step_')]
print(step_cols)

for group_name, df_group in df.groupby(['id', 'eventType']):
    print('=' * 50, group_name, '=' * 50)
    # Only touch sessions that never reached 'Exit Website'
    if not df_group[step_cols].eq('Exit Website').any().any():
        df_group = df_group.copy()
        print(df_group)
        # Walk the Step_ columns from right to left and stop at the first
        # one that has at least one value in this group
        for col in reversed(step_cols):
            if df_group[col].isnull().all():
                print(f'Col: {col} is all null')
            else:
                print(f'Col: {col} contains values')
                # Write the last row of that column with a single .loc call
                df_group.loc[df_group.index[-1], col] = 'Exit Website'
                break

        print('--')
        print(df_group)

    dfs.append(df_group)

df_final = pd.concat(dfs, ignore_index=True)
print('=' * 50)
df_final.set_index(['id', 'eventType'], inplace=True)
print(df_final)
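Writing the last cell as `df_group[col].iloc[-1] = 'Exit Website'` is chained assignment: the first `[...]` may hand back an intermediate object, so pandas may emit `SettingWithCopyWarning`, and with Copy-on-Write enabled the write does not reach `df_group` at all. A minimal sketch of the single-step `.loc` form, on a hypothetical one-session frame whose row labels are not 0-based (groups produced by `groupby` keep their original labels, which is why the label is taken from `group.index[-1]`):

```python
import numpy as np
import pandas as pd

# Hypothetical single session, labelled the way a groupby group would be.
group = pd.DataFrame(
    {"Step_A": ["Action_A", "Action_A", np.nan], "Step_B": [np.nan, np.nan, np.nan]},
    index=[4, 5, 6],
).astype(object)

# One .loc call addresses row label and column together, so the value is
# written into `group` itself rather than into an intermediate Series.
group.loc[group.index[-1], "Step_A"] = "Exit Website"
print(group)
```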

Please credit the source when reposting. Link to this article: https://www.uj5u.com/shujuku/309191.html
