I have the following DataFrame:
df1 = pd.DataFrame(
    {
        "day": ["monday", "monday", "Tuesday"],
        "column0": ["xx", "xx", ""],
        "column1": ["yy", "aa", "bb"],
        "column2": ["cc", "cc", "cc"],
        "column3": ["cc", "", "aa"],
    }
)

       day column0 column1 column2 column3
0   monday      xx      yy      cc      cc
1   monday      xx      aa      cc
2  Tuesday              bb      cc      aa
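I want to group by day and join the columns of each group's rows into a single row, keeping the original row labels in an index column.
Expected result 1: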
df1 = pd.DataFrame(
    {
        "day": ["monday", "Tuesday"],
        "index": ["0,1", "2"],
        "column0": ["xx", ""],
        "column1": ["yy", "bb"],
        "column2": ["cc", "cc"],
        "column3": ["cc", "aa"],
        "column4": ["xx", ""],
        "column5": ["aa", ""],
        "column6": ["cc", ""],
    }
)

       day index column0 column1 column2 column3 column4 column5 column6
0   monday   0,1      xx      yy      cc      cc      xx      aa      cc
1  Tuesday     2              bb      cc      aa
Finally, I want to remove the repeated values within each row and put NAN in the blank cells.
Final result:
df1 = pd.DataFrame(
    {
        "day": ["monday", "Tuesday"],
        "index": ["0,1", "2"],
        "column0": ["xx", "NAN"],
        "column1": ["yy", "bb"],
        "column2": ["cc", "cc"],
        "column3": ["NAN", "aa"],
        "column4": ["aa", "NAN"],
    }
)

       day index column0 column1 column2 column3 column4
0   monday   0,1      xx      yy      cc     NAN      aa
1  Tuesday     2     NAN      bb      cc      aa     NAN
Any ideas?

A uj5u.com user replied:
This isn't perfect, but it works.
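# Concatenate both DataFrames
# (df2 is not defined in this answer; presumably it is the grouped
#  "expected result 1" frame from the question)
df_merged = pd.concat([df1, df2], sort=False, axis=0)
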
       day column0 column1 column2 column3 index column4 column5 column6
0   monday      xx      yy      cc      cc
1   monday      xx      aa      cc
2  Tuesday              bb      cc      aa
0   monday      xx      yy      cc      cc   0,1      xx      aa      cc
1  Tuesday              bb      cc      aa     2

# Drop rows with NaN in "index" column
df_merged.dropna(subset=['index'], inplace=True)

       day column0 column1 column2 column3 index column4 column5 column6
0   monday      xx      yy      cc      cc   0,1      xx      aa      cc
1  Tuesday              bb      cc      aa     2
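
As a small illustration of why the dropna step keeps only the grouped rows: pd.concat aligns frames on column names and fills any column that one frame lacks with NaN, so the ungrouped rows end up with NaN in "index". The sketch below uses two cut-down frames; the names raw and grouped are made up for the example and do not come from the answer. Note that the approach presupposes the grouped frame already exists.

```python
import pandas as pd

# Cut-down stand-ins for the two frames (hypothetical names).
raw = pd.DataFrame(
    {"day": ["monday", "monday", "Tuesday"], "column0": ["xx", "xx", ""]}
)
grouped = pd.DataFrame(
    {"day": ["monday", "Tuesday"], "index": ["0,1", "2"], "column0": ["xx", ""]}
)

# Rows coming from `raw` have no "index" column, so concat fills it with NaN.
merged = pd.concat([raw, grouped], sort=False, axis=0)

# Dropping rows whose "index" is NaN leaves only the rows from `grouped`.
kept = merged.dropna(subset=["index"])
print(kept)
```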
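
A uj5u.com user replied:
You can use numpy to flatten each grouped sub-DataFrame, collect the flattened rows in a list, and build a new DataFrame from that list.
Finally, you can replace "" and None with NaN, drop the columns that are entirely NaN, and rename the columns: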
import pandas as pd
import numpy as np

df1 = pd.DataFrame(
    {
        "day": ["monday", "monday", "Tuesday"],
        "column0": ["xx", "xx", ""],
        "column1": ["yy", "aa", "bb"],
        "column2": ["cc", "cc", "cc"],
        "column3": ["cc", "", "aa"],
    }
)

arr_list = []
for d, sub_df in df1.groupby("day"):
    # flatten every column except "day" into one flat list for this group
    arr = list(np.array(sub_df.iloc[:, 1:]).flatten())
    # prepend the group key and the original row labels
    arr = [d, list(sub_df.index)] + arr
    arr_list.append(arr)

# rows differ in length; pandas pads the shorter ones with None
df = pd.DataFrame(arr_list)
# turn "" and None into NaN, then drop columns that are entirely NaN
df = df.replace('', np.nan).fillna(value=np.nan).dropna(axis=1, how='all')
df.columns = ["day", "index"] + [f"column{i}" for i in range(len(df.columns) - 2)]
print(df)
Output:
       day   index column0 column1 column2 column3 column4 column5 column6
0  Tuesday     [2]     NaN      bb      cc      aa     NaN     NaN     NaN
1   monday  [0, 1]      xx      yy      cc      cc      xx      aa      cc
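
The question asked for the row labels as a comma-separated string such as "0,1" and listed monday first, whereas the output above holds lists and is sorted by group key. A minimal sketch of one way to get closer to that, assuming it is acceptable to pass sort=False to groupby (which keeps groups in order of first appearance) and to join the labels with str.join; the variable names rows and out are illustrative only:

```python
import pandas as pd

df1 = pd.DataFrame(
    {"day": ["monday", "monday", "Tuesday"], "column0": ["xx", "xx", ""]}
)

rows = []
# sort=False keeps the groups in the order they first appear in df1
for day, sub_df in df1.groupby("day", sort=False):
    # join the original row labels into one string, e.g. "0,1"
    labels = ",".join(map(str, sub_df.index))
    rows.append({"day": day, "index": labels})

out = pd.DataFrame(rows)
print(out)
```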
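Edit: if you want to remove the duplicates within each row, do it right after flattening the array (the rest of the code stays the same):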
arr_list = []  # start from an empty list again
for d, sub_df in df1.groupby("day"):
    arr = list(np.array(sub_df.iloc[:, 1:]).flatten())
    # removing duplicates for this row:
    arr_unique = []
    for x in arr:
        if x not in arr_unique:
            arr_unique.append(x)
        else:  # appending NaN to keep dataframe form
            arr_unique.append(np.nan)
    arr = [d, list(sub_df.index)] + arr_unique
    arr_list.append(arr)
Output:
       day   index column0 column1 column2 column3 column4
0  Tuesday     [2]     NaN      bb      cc      aa     NaN
1   monday  [0, 1]      xx      yy      cc     NaN      aa
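
For reference, the steps above can be gathered into one function. This is only a sketch under a few assumptions, not the answerer's exact code: the helper name flatten_groups is made up, empty strings are treated as missing right away, a set tracks values already seen in a row, the row labels are joined into a string, and sort=False keeps monday ahead of Tuesday as in the question.

```python
import numpy as np
import pandas as pd


def flatten_groups(df, key="day"):
    """Hypothetical helper: one row per group, repeated values replaced by NaN."""
    rows = []
    for group_key, sub_df in df.groupby(key, sort=False):
        # all non-key cells of the group, row by row, as one flat list
        values = sub_df.drop(columns=key).to_numpy().ravel().tolist()
        seen = set()
        deduped = []
        for v in values:
            if v == "" or v in seen:
                deduped.append(np.nan)  # blank or repeated value
            else:
                seen.add(v)
                deduped.append(v)
        labels = ",".join(map(str, sub_df.index))
        rows.append([group_key, labels] + deduped)
    # shorter rows are padded with None; drop columns with no value at all
    out = pd.DataFrame(rows).dropna(axis=1, how="all")
    out.columns = [key, "index"] + [f"column{i}" for i in range(out.shape[1] - 2)]
    return out


df1 = pd.DataFrame(
    {
        "day": ["monday", "monday", "Tuesday"],
        "column0": ["xx", "xx", ""],
        "column1": ["yy", "aa", "bb"],
        "column2": ["cc", "cc", "cc"],
        "column3": ["cc", "", "aa"],
    }
)
result = flatten_groups(df1)
print(result)
```

Missing cells come out as real missing values (NaN or None) rather than the literal string "NAN" used in the question; something like result.fillna("NAN") would convert them if the string form is really wanted.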
