Handling a stacked, merged table

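I imported a CSV with pandas read_table. It is essentially one stacked column: each student's name appears on its own line, and that student's values follow on the lines below.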

Student-John
01/01/2021	334
01/02/2021	456
Student-Sally
01/01/2021	76
01/04/2021	789

I'd like to reshape this so that each student gets their own column, with the dates down the left-hand side:

Date	Student-John	Student-Sally
01/01/2021	334	76
01/02/2021	456
01/04/2021		789
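Reading a file with this layout can trip up read_table on its own: without column names, the default parser infers the column count from the first line (Student-John, a single field) and then raises a tokenizing error on the two-field lines that follow. A minimal sketch, assuming the file is tab-separated with no header row; the inline string raw is a hypothetical stand-in for the real file. Passing names fixes the width at two columns, and the student lines get NaN in the second column.

```python
import io

import pandas as pd

# Hypothetical stand-in for the real file: same stacked layout,
# with a tab between each date and its value.
raw = (
    "Student-John\n"
    "01/01/2021\t334\n"
    "01/02/2021\t456\n"
    "Student-Sally\n"
    "01/01/2021\t76\n"
    "01/04/2021\t789\n"
)

# names= fixes the column count at two; lines with only one field
# (the student headers) are padded with NaN in the Value column.
df = pd.read_table(io.StringIO(raw), header=None, names=["Date", "Value"])
print(df)
```

From here the frame has the same shape as the setup data used in the first answer, with the student rows marked by a missing value.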

My approach was to load the CSV into a pandas DataFrame:

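import pandas as pd
df = pd.read_table('C:/Users/****data.csv', skiprows=1, header=None)
df[2] = ""                                # extra column to hold the student name
df.columns = ["Date", "Val", "Student"]   # one name per column (there are three now)

x = "Start"

# Started with this, although the Student check doesn't work:
# == compares literally, so "Student*" is not treated as a wildcard

for ind, row in df.iterrows():
    if df['Date'][ind] == "Student*":
        x = df['Date'][ind]
        df.drop(ind, inplace=True)
    else:
        df.loc[ind, 'Student'] = x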
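The loop idea itself can work if the header test uses a prefix check (str.startswith) and the loop collects new rows instead of editing df while iterating over it. A sketch, assuming the data is already split into a date column and a value column; records, current, long_df and wide are illustrative names, not from the question. The collected (date, student, value) rows are then reshaped with DataFrame.pivot.

```python
import pandas as pd

# Stacked data as in the question; header rows have no value.
df = pd.DataFrame({
    "Date": ["Student-John", "01/01/2021", "01/02/2021",
             "Student-Sally", "01/01/2021", "01/04/2021"],
    "Val": [None, 334, 456, None, 76, 789],
})

records = []      # one (date, student, value) tuple per data row
current = None    # name of the student whose block we are in
for date, val in zip(df["Date"], df["Val"]):
    # startswith does the prefix test that == "Student*" cannot do
    if date.startswith("Student"):
        current = date
    else:
        records.append((date, current, val))

# Long format -> wide format: one column per student, NaN where a date is missing
long_df = pd.DataFrame(records, columns=["Date", "Student", "Val"])
wide = (long_df.pivot(index="Date", columns="Student", values="Val")
               .rename_axis(columns=None)
               .reset_index())
print(wide)
```

Building a list and constructing the DataFrame once avoids dropping rows from the frame that is being iterated.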

Answer:

Use a boolean mask to separate the student rows from the data rows, then pivot to reshape the DataFrame:

# Rename columns
df.columns = ['Date', 'Value']

# Find Student rows
m = df['Date'].str.startswith('Student')

# Create the future column: keep only the student names, then forward-fill them
df['Student'] = df['Date'].mask(~m).ffill()

# Remove Student rows
df = df[~m]

# Reshape your dataframe (newer pandas versions require keyword arguments for pivot)
df = df.pivot(index='Date', columns='Student', values='Value').rename_axis(columns=None).reset_index()

Output:

>>> df
         Date Student-John Student-Sally
0  01/01/2021          334            76
1  01/02/2021          456           NaN
2  01/04/2021          NaN           789
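Because the values come from a text file, they are still strings after the pivot. A variant of the same idea, offered as a sketch rather than the answerer's code, converts them with pd.to_numeric and reshapes with set_index plus unstack instead of pivot; where(m) is simply the mirror image of mask(~m). The name wide is illustrative.

```python
import numpy as np
import pandas as pd

# Same setup data as in this answer: values were read as strings.
df = pd.DataFrame({0: ['Student-John', '01/01/2021', '01/02/2021',
                       'Student-Sally', '01/01/2021', '01/04/2021'],
                   1: [np.nan, '334', '456', np.nan, '76', '789']})
df.columns = ['Date', 'Value']

m = df['Date'].str.startswith('Student')
# where(m) keeps the student names and blanks the dates; ffill carries names down
df['Student'] = df['Date'].where(m).ffill()

wide = (df[~m]
        .assign(Value=lambda d: pd.to_numeric(d['Value']))  # '334' -> 334
        .set_index(['Date', 'Student'])['Value']
        .unstack('Student')        # one column per student, NaN for gaps
        .rename_axis(columns=None)
        .reset_index())
print(wide)
```

unstack gives the same table as pivot here; it is mainly handy when the data already sits in a MultiIndex.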

Setup:

import pandas as pd
import numpy as np

data = {0: ['Student-John', '01/01/2021', '01/02/2021',
            'Student-Sally', '01/01/2021', '01/04/2021'],
        1: [np.nan, '334', '456', np.nan, '76', '789']}
df = pd.DataFrame(data)
print(df)

# Output
               0    1
0   Student-John  NaN
1     01/01/2021  334
2     01/02/2021  456
3  Student-Sally  NaN
4     01/01/2021   76
5     01/04/2021  789

Answer:

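A naive solution looks like this. It uses Y/N values instead of the numbers from the question, and it assumes every student has the same dates in the same order: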

import pandas as pd

details = {
    'Date' : ['Student-John','01/01/2021','01/02/2021','Student-Sally','01/01/2021','01/02/2021'],
    'val' : ['', 'Y', 'N', '', 'N', 'N'],
}

df = pd.DataFrame(details)
print(df)

            Date val
0   Student-John    
1     01/01/2021   Y
2     01/02/2021   N
3  Student-Sally    
4     01/01/2021   N
5     01/02/2021   N

# Build a new DataFrame, df1: collect each student's dates in date_list
# and Y/N values in temp_list, then write them to df1 as a new column
# whenever the next student header (a row with an empty val) is reached

det = {}
df1 = pd.DataFrame(det)
temp_list = []
date_list = []
col_name = 'empty'
for ind in df.index:
    if df['val'][ind] == '':
        # Header row: store the previous student's column, then start a new one
        df1[col_name] = temp_list
        df1['Date'] = date_list
        temp_list = []
        date_list = []
        col_name = df['Date'][ind]
    else:
        temp_list.append(df['val'][ind])
        date_list.append(df['Date'][ind])
        # Last row: store the final student's column
        if ind == len(df) - 1:
            df1[col_name] = temp_list

del df1['empty']
print(df1)

         Date Student-John Student-Sally
0  01/01/2021            Y             N
1  01/02/2021            N             N
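With the question's data, the loop above would misalign the columns: Sally's 01/04/2021 value would land in the 01/02/2021 row, because only the first student's dates are written to the Date column. A sketch of a loop that avoids this, assuming the same empty-value convention for header rows, collects a {student: {date: value}} dictionary and lets the DataFrame constructor align the dates. The names rows, table and wide are illustrative.

```python
import pandas as pd

# The question's data: Sally has 01/04/2021 but not 01/02/2021.
rows = [("Student-John", ""), ("01/01/2021", "334"), ("01/02/2021", "456"),
        ("Student-Sally", ""), ("01/01/2021", "76"), ("01/04/2021", "789")]

table = {}        # {student: {date: value}}
student = None
for date, val in rows:
    if val == "":     # an empty value marks a student header row
        student = date
        table[student] = {}
    else:
        table[student][date] = val

# Outer keys become columns; inner keys are aligned into one index,
# with NaN wherever a student has no value for a date.
wide = pd.DataFrame(table).sort_index().rename_axis("Date").reset_index()
print(wide)
```

Keying each value by its date, instead of by its position, is what makes differing date lists safe.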

Please credit the source when reposting. Link to this article: https://www.uj5u.com/net/438699.html
