How to parse Excel data into a DataFrame with a specific format?

I have an Excel file whose "Elements" sheet looks like this: https://i.stack.imgur.com/pT0PY.png

Item              Category  Value
TRANSPORT         A         1
Bus               A         2
Car               A         3
Automobile        A         4
Bike              A         5
ACCOMODATION      A         6
House             A         7
Apartment         A         8
DELIVERY          B         9
Glovo             B         10
Emag              B         11
Transporter       B         12
ACCOMODATION      B         13
Apartment1        B         14
Apartment2        B         15
ACCOMODATION      C         16
Rental            C         17
Apartment         C         18

I want to separate the items (TRANSPORT, ACCOMODATION, DELIVERY) from the elements (Bus, Car, Automobile, Bike, ...) into a DataFrame like this:

      Element          Item Category  Value
0         Bus     TRANSPORT        A      2
1         Car     TRANSPORT        A      3
2  Automobile     TRANSPORT        A      4
3        Bike     TRANSPORT        A      5
4       House  ACCOMODATION        A      7
5   Apartment  ACCOMODATION        A      8

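I've managed to write code that separates the elements for category A, but when I use it for category B, C, or any other category, the code breaks. It raises IndexError: index 8 is out of bounds for axis 0 with size 7, because the index gets trimmed in the code. I only need to take the values of the Value column for the elements, not for the items, and the length mismatch makes it break. The final DataFrame must contain all the information for all categories in the Excel file, not just one category.

What I've tried so far (works only for category A):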

import pandas as pd
import numpy as np

# sheet_name is a list, so read_excel returns a dict of DataFrames keyed by sheet name
df = pd.read_excel('elements.xlsx',
                   sheet_name=['Elements'], engine='openpyxl')

category_names = df['Elements']['Category'].unique()

df['Elements'] = df['Elements'].groupby('Category')

categ_group = ['TRANSPORT', 'ACCOMODATION', 'DELIVERY']


def create_category_df(category_name='A'):
    # rows of one category; they keep their index labels from the full sheet
    helper_df = df['Elements'].get_group(category_name)

    # get index (labels) of the item rows
    item_index = helper_df[helper_df["Item"].isin(categ_group)].index.to_list()

    # get elements and associated items (np.split treats item_index as positions)
    item_data = np.split(helper_df['Item'].to_numpy(), item_index)

    # drop rows for items (helper_df.index[...] also treats item_index as positions)
    helper_df = helper_df.drop(helper_df.index[item_index])

    helper_df = helper_df.reset_index(drop=True)

    frames = []
    item_list = []
    for chunk in item_data:
        if chunk.size != 0:
            # first entry is the item, the rest are its elements
            frames.append(pd.DataFrame(chunk[1:], columns=['Element']))
            # repeat the item once per element and collect it
            item_list += len(chunk[1:]) * [chunk[0]]

    # pd.concat replaces DataFrame.append, which was removed in pandas 2.0
    resulted_df = pd.concat(frames, ignore_index=True)
    resulted_df['Item'] = item_list
    resulted_df['Category'] = category_name
    resulted_df['Value'] = helper_df['Value'].values
    print(resulted_df.to_string())
    return resulted_df


create_category_df()
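The IndexError most likely comes from mixing index labels with positions. get_group('B') returns rows that keep their labels from the full sheet (8 to 14), so item_index holds labels such as 8 and 12, while np.split and helper_df.index[item_index] treat those numbers as positions in a 7-row frame. Category A only works because its labels 0 to 7 happen to equal its positions. Below is a minimal sketch of the same per-category idea with that fixed, assuming each category starts with an item row: it resets the group's index before locating the items and stacks the per-category frames with pd.concat. The helper split_category and the names rows, sheet and result are hypothetical, and the hard-coded data stands in for reading elements.xlsx.

```python
import numpy as np
import pandas as pd

CATEG_GROUP = ["TRANSPORT", "ACCOMODATION", "DELIVERY"]


def split_category(helper_df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: turn one category's rows (an item row followed by
    its elements) into Element/Item/Category/Value rows.
    Assumes the first row of the category is an item row."""
    # get_group() keeps the labels from the full sheet (8..14 for B);
    # after resetting, labels and positions are the same numbers.
    helper_df = helper_df.reset_index(drop=True)
    is_item = helper_df["Item"].isin(CATEG_GROUP)
    item_index = helper_df[is_item].index.to_list()  # now real positions

    elements, items = [], []
    for chunk in np.split(helper_df["Item"].to_numpy(), item_index):
        if chunk.size > 1:  # chunk[0] is the item, chunk[1:] are its elements
            elements.extend(chunk[1:])
            items.extend([chunk[0]] * (chunk.size - 1))

    return pd.DataFrame({
        "Element": elements,
        "Item": items,
        "Category": helper_df["Category"].iloc[0],
        # Value only for element rows, not for item rows
        "Value": helper_df.loc[~is_item, "Value"].to_numpy(),
    })


# In-memory copy of the "Elements" sheet (stands in for pd.read_excel).
rows = [
    ("TRANSPORT", "A", 1), ("Bus", "A", 2), ("Car", "A", 3),
    ("Automobile", "A", 4), ("Bike", "A", 5), ("ACCOMODATION", "A", 6),
    ("House", "A", 7), ("Apartment", "A", 8), ("DELIVERY", "B", 9),
    ("Glovo", "B", 10), ("Emag", "B", 11), ("Transporter", "B", 12),
    ("ACCOMODATION", "B", 13), ("Apartment1", "B", 14), ("Apartment2", "B", 15),
    ("ACCOMODATION", "C", 16), ("Rental", "C", 17), ("Apartment", "C", 18),
]
sheet = pd.DataFrame(rows, columns=["Item", "Category", "Value"])

# One frame per category, stacked into a single frame numbered from 0.
result = pd.concat(
    [split_category(group) for _, group in sheet.groupby("Category")],
    ignore_index=True,
)
print(result)
```

Applying the split to every group and concatenating covers all categories in one frame, which is what the question asks for, instead of calling the function once per category.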

Answer:

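First rename the column, then replace the values that are not in the list with NaN using Series.where, so that the missing values can be forward filled in the second column added with DataFrame.insert; finally, remove the rows where the values in the two columns are equal using boolean indexing: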

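import pandas as pd

# the "Elements" sheet as a single DataFrame
df = pd.read_excel('elements.xlsx', sheet_name='Elements')

categ_group = ['TRANSPORT', 'ACCOMODATION', 'DELIVERY']

# the original column now holds the element names
df = df.rename(columns={'Item':'Element'})
# keep only item names (everything else becomes NaN), forward fill them into a new second column
df.insert(1, 'Item', df['Element'].where(df['Element'].isin(categ_group)).ffill())
# drop the item rows themselves, where Element equals Item
df = df[df['Element'].ne(df['Item'])]

print(df)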
        Element          Item Category  Value
1           Bus     TRANSPORT        A      2
2           Car     TRANSPORT        A      3
3    Automobile     TRANSPORT        A      4
4          Bike     TRANSPORT        A      5
6         House  ACCOMODATION        A      7
7     Apartment  ACCOMODATION        A      8
9         Glovo      DELIVERY        B     10
10         Emag      DELIVERY        B     11
11  Transporter      DELIVERY        B     12
13   Apartment1  ACCOMODATION        B     14
14   Apartment2  ACCOMODATION        B     15
16       Rental  ACCOMODATION        C     17
17    Apartment  ACCOMODATION        C     18
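The output above keeps the original row labels (1, 2, 3, 4, 6, ...), whereas the desired frame in the question is numbered from 0. A minimal, self-contained sketch of the same steps on an in-memory copy of the sheet, with reset_index(drop=True) added at the end to renumber the rows; the names rows and result and the hard-coded data are assumptions standing in for pd.read_excel:

```python
import pandas as pd

# In-memory copy of the "Elements" sheet from the question.
rows = [
    ("TRANSPORT", "A", 1), ("Bus", "A", 2), ("Car", "A", 3),
    ("Automobile", "A", 4), ("Bike", "A", 5), ("ACCOMODATION", "A", 6),
    ("House", "A", 7), ("Apartment", "A", 8), ("DELIVERY", "B", 9),
    ("Glovo", "B", 10), ("Emag", "B", 11), ("Transporter", "B", 12),
    ("ACCOMODATION", "B", 13), ("Apartment1", "B", 14), ("Apartment2", "B", 15),
    ("ACCOMODATION", "C", 16), ("Rental", "C", 17), ("Apartment", "C", 18),
]
df = pd.DataFrame(rows, columns=["Item", "Category", "Value"])

categ_group = ["TRANSPORT", "ACCOMODATION", "DELIVERY"]

df = df.rename(columns={"Item": "Element"})
# Item names stay, element names become NaN, then each NaN takes the item above it.
df.insert(1, "Item", df["Element"].where(df["Element"].isin(categ_group)).ffill())
# Drop the item rows (Element equals Item) and renumber the rows from 0.
result = df[df["Element"].ne(df["Item"])].reset_index(drop=True)
print(result)
```

Like the answer, this relies on every category starting with an item row, which holds for the sample data. If a category could begin with element rows, forward filling per category, for example df['Element'].where(mask).groupby(df['Category']).ffill(), would leave those rows as NaN instead of giving them the previous category's item.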

