Merge the columns of different rows into a single row, grouped by a specific column

I have the following DataFrame:

df1 = pd.DataFrame(
    {   
        "day":     ["monday", "monday","Tuesday" ],
        "column0": ["xx",      "xx",     ""],
        "column1": ["yy",      "aa",    "bb"],
        "column2": ["cc",      "cc",    "cc"],
        "column3": ["cc",      "",      "aa"]})


    day    column0  column1 column2 column3
0   monday  xx       yy       cc      cc
1   monday  xx       aa       cc    
2   Tuesday          bb       cc      aa

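I want to group by day, join the columns of the grouped rows into one row, and keep the original row labels in an index column.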

Expected result 1:

df1 = pd.DataFrame(
    {   
        "day":     ["monday", "Tuesday" ],
        "index":   ["0,1",          "2" ],
        "column0": ["xx",             ""],
        "column1": ["yy",           "bb"],
        "column2": ["cc",           "cc"],
        "column3": ["cc",           "aa"],
        "column4": ["xx",             ""],
        "column5": ["aa",             ""],
        "column6": ["cc",             ""]})

    day   index column0 column1 column2 column3 column4 column5 column6
0   monday  0,1   xx       yy     cc      cc      xx      aa    cc
1   Tuesday 2              bb     cc      aa

Finally, I want to remove the duplicate values within each row and put NaN in the empty cells.

Final result:

df1 = pd.DataFrame(
    {   
        "day":     ["monday", "Tuesday" ],
        "index":   ["0,1",          "2" ],
        "column0": ["xx",          "NAN"],
        "column1": ["yy",           "bb"],
        "column2": ["cc",           "cc"],
        "column3": ["NAN",          "aa"],
        "column5": ["aa",          "NAN"]})

    day   index column0 column1 column2  column3    column4
0   monday  0,1   xx      yy          cc    NAN       aa
1   Tuesday 2    NAN      bb          cc    aa        NAN

Any ideas?

Reply from a uj5u.com user:

This is not perfect, but it works.

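    # Concatenate both DataFrames; df2 is not defined in this answer and is
    # presumably the expected-result frame from the question
    df_merged = pd.concat([df1, df2], sort=False, axis=0)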

        day     column0 column1 column2 column3 index   column4 column5 column6
    0   monday  xx      yy      cc      cc              
    1   monday  xx      aa      cc                  
    2   Tuesday         bb      cc      aa              
    0   monday  xx      yy      cc      cc      0,1      xx      aa     cc
    1   Tuesday         bb      cc      aa      2           

    # Drop rows with NaN in "Index" column
    df_merged.dropna(subset=['index'],inplace=True)

        day     column0 column1 column2 column3 index   column4 column5 column6
    0   monday  xx      yy      cc      cc      0,1     xx      aa      cc
    1   Tuesday         bb      cc      aa      2
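
To make the two calls above easier to follow, here is a minimal sketch on toy data. The frames `left` and `right` are hypothetical stand-ins, not the `df1` and `df2` of the answer. It suggests that `pd.concat` aligns columns by name and fills the missing ones with NaN, so `dropna(subset=["index"])` ends up keeping only the rows that already carried an `index` value.

```python
import pandas as pd

# Hypothetical toy frames, not the ones from the question.
left = pd.DataFrame({"day": ["monday", "Tuesday"], "column0": ["xx", "bb"]})
right = pd.DataFrame({"day": ["monday"], "column0": ["xx"], "index": ["0,1"]})

# Columns are aligned by name; "index" is missing in `left`,
# so the rows coming from `left` get NaN in that column.
merged = pd.concat([left, right], sort=False, axis=0)

# Keep only the rows that have a value in "index",
# which here means the rows that came from `right`.
kept = merged.dropna(subset=["index"])
print(kept)
```

In other words, this approach seems to rely on the grouped frame already existing, which is the part the question asks how to build.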

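Reply from a uj5u.com user:

You can use numpy to flatten each grouped sub-DataFrame, store the flattened rows in a list, and then build a DataFrame from that list.

Finally, you can replace "" and None with NaN, drop the columns that contain only NaN, and rename the columns: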

import pandas as pd
import numpy as np

df1 = pd.DataFrame(
    {   
        "day":     ["monday", "monday","Tuesday" ],
        "column0": ["xx",      "xx",     ""],
        "column1": ["yy",      "aa",    "bb"],
        "column2": ["cc",      "cc",    "cc"],
        "column3": ["cc",      "",      "aa"]})

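arr_list = []
for d, sub_df in df1.groupby("day"):
  # flatten every column except "day" into one flat list of values
  arr = list(np.array(sub_df.iloc[:,1:]).flatten())
  # prepend the group key and the original row labels
  arr = [d, list(sub_df.index)] + arr
  arr_list.append(arr)

# shorter rows are padded with missing values when the DataFrame is built
df = pd.DataFrame(arr_list)
df = df.replace('',np.nan).fillna(value=np.nan).dropna(axis=1, how='all')
df.columns = ["day", "index"] + [f"column{i}" for i in range(len(df.columns)-2)]
print(df)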

Output:

       day   index column0 column1 column2 column3 column4 column5 column6
0  Tuesday     [2]     NaN      bb      cc      aa     NaN     NaN     NaN
1   monday  [0, 1]      xx      yy      cc      cc      xx      aa      cc
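
The clean-up step can be looked at in isolation. The following is a small sketch on hypothetical ragged rows (the names `rows`, `toy` and `cleaned` are made up for illustration): empty strings are turned into NaN, and only the columns in which every cell is missing are dropped.

```python
import numpy as np
import pandas as pd

# Hypothetical ragged rows: the short row is padded with a missing value
# when the DataFrame is constructed.
rows = [["a", "", "x"], ["b", ""]]
toy = pd.DataFrame(rows)

# Turn empty strings into NaN, then drop the columns that are entirely missing.
# Column 1 holds only empty strings, so it disappears; column 2 still holds "x".
cleaned = toy.replace("", np.nan).dropna(axis=1, how="all")
print(cleaned)
```

Because `how="all"` is used, a column with at least one real value survives even if its other cells are NaN.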

Edit: if you want to remove the duplicates in each row, do it right after flattening the array:

arr_list = []  # start from an empty list again
for d, sub_df in df1.groupby("day"):
  arr = list(np.array(sub_df.iloc[:,1:]).flatten())
  # removing duplicates for this row:
  arr_unique = []
  for x in arr:
    if x not in arr_unique:
      arr_unique.append(x)
    else: # appending NaN to keep dataframe form
      arr_unique.append(np.nan)
  arr = [d, list(sub_df.index)] + arr_unique
  arr_list.append(arr)

Output:

       day   index column0 column1 column2 column3 column4
0  Tuesday     [2]     NaN      bb      cc      aa     NaN
1   monday  [0, 1]      xx      yy      cc     NaN      aa
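
The output above differs from the final result in the question in two small ways: the groups are sorted (Tuesday first) and the index column holds lists instead of strings such as "0,1". The following is one possible variation, not the answerer's code. The helper name `flatten_by_group` is hypothetical, and the sketch assumes that blank cells should become NaN before duplicates are checked.

```python
import numpy as np
import pandas as pd


def flatten_by_group(df, key):
    """Hypothetical helper: one output row per group, repeats within a row set to NaN."""
    rows = []
    # sort=False keeps the groups in order of first appearance.
    for name, sub_df in df.groupby(key, sort=False):
        seen = set()
        values = []
        for x in sub_df.drop(columns=key).to_numpy().flatten():
            # Blank cells and repeated values become NaN so positions are preserved.
            if x == "" or x in seen:
                values.append(np.nan)
            else:
                seen.add(x)
                values.append(x)
        # Join the original row labels into a string such as "0,1".
        labels = ",".join(str(i) for i in sub_df.index)
        rows.append([name, labels] + values)

    # Drop the columns that are missing in every group, then renumber the rest.
    out = pd.DataFrame(rows).dropna(axis=1, how="all")
    out.columns = [key, "index"] + [f"column{i}" for i in range(out.shape[1] - 2)]
    return out


df1 = pd.DataFrame(
    {
        "day":     ["monday", "monday", "Tuesday"],
        "column0": ["xx",     "xx",     ""],
        "column1": ["yy",     "aa",     "bb"],
        "column2": ["cc",     "cc",     "cc"],
        "column3": ["cc",     "",       "aa"],
    }
)

result = flatten_by_group(df1, "day")
print(result)
```

Using a set for the values already seen avoids scanning a growing list for every cell, which may matter for wide groups; for a frame this small either approach should behave the same.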
