I have an Excel file like this. The data is in the "Elements" sheet, as shown here: https://i.stack.imgur.com/pT0PY.png
Item Category Value
TRANSPORT A 1
Bus A 2
Car A 3
Automobile A 4
Bike A 5
ACCOMODATION A 6
House A 7
Apartment A 8
DELIVERY B 9
Glovo B 10
Emag B 11
Transporter B 12
ACCOMODATION B 13
Apartment1 B 14
Apartment2 B 15
ACCOMODATION C 16
Rental C 17
Apartment C 18
I want to separate the items (TRANSPORT, ACCOMODATION, DELIVERY) from the elements (Bus, Car, Automobile, Bike, ...) into a DataFrame like this:
Element Item Category Value
0 Bus TRANSPORT A 2
1 Car TRANSPORT A 3
2 Automobile TRANSPORT A 4
3 Bike TRANSPORT A 5
4 House ACCOMODATION A 7
5 Apartment ACCOMODATION A 8
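I have managed to write code that separates the elements for category A, but when I use it for category B, C or any other category, the code breaks. It raises:
IndexError: index 8 is out of bounds for axis 0 with size 7
because the index in the code gets trimmed. I only need the Value column for the elements, not for the items, and the code breaks because the lengths do not match. I need the final DataFrame to contain all the information for all categories in the Excel file, not just one category.
What I have tried so far (works only for category A):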
import pandas as pd
import numpy as np

df = pd.read_excel('elements.xlsx',
                   ['Elements'], engine='openpyxl')
category_names = df['Elements']['Category'].unique()
df['Elements'] = df['Elements'].groupby(['Category'])
categ_group = ['TRANSPORT', 'ACCOMODATION', 'DELIVERY']


def create_category_df(category_name='A'):
    helper_df = df['Elements'].get_group(category_name)
    # get index for items
    item_index = helper_df[helper_df["Item"].isin(categ_group)].index.to_list()
    # get elements and associated items
    item_data = np.split(helper_df['Item'].to_numpy(), item_index)
    helper_df = helper_df.drop(helper_df.index[item_index])  # drop rows for items
    helper_df = helper_df.reset_index(drop=True)
    resulted_df = pd.DataFrame(columns=['Element', 'Item', 'Category', 'Value'])
    item_list = []
    for index in range(len(item_data)):
        if item_data[index].size != 0:
            resulted_df = resulted_df.append(pd.DataFrame(item_data[index][1:], columns=['Element']))
            item_list = len(item_data[index][1:]) * [
                item_data[index][0]]  # multiply items by number of times it is present and add it to df
    resulted_df['Category'] = category_name  # 'Hardware EA'
    resulted_df['Item'] = item_list
    resulted_df['Value'] = helper_df['Value'].values
    resulted_df = resulted_df.reset_index(drop=True)
    print(resulted_df.to_string())
    return resulted_df


create_category_df()
Answer:
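The IndexError in the question appears to come from mixing index labels with positions. After grouping, the category B rows keep their original labels (8 to 14), but helper_df.index[item_index] treats those labels as positions in a 7-row frame. The following is a minimal sketch of that diagnosis. It assumes the sheet's contents are rebuilt in memory (raw is a stand-in for the Excel sheet) and uses a plain boolean filter in place of get_group:

```python
import pandas as pd

categ_group = ['TRANSPORT', 'ACCOMODATION', 'DELIVERY']

# Stand-in for the 'Elements' sheet (assumption: same rows as in the question).
raw = pd.DataFrame({
    'Item': ['TRANSPORT', 'Bus', 'Car', 'Automobile', 'Bike',
             'ACCOMODATION', 'House', 'Apartment',
             'DELIVERY', 'Glovo', 'Emag', 'Transporter',
             'ACCOMODATION', 'Apartment1', 'Apartment2',
             'ACCOMODATION', 'Rental', 'Apartment'],
    'Category': list('AAAAAAAA') + list('BBBBBBB') + list('CCC'),
    'Value': range(1, 19),
})

# Category B keeps the original row labels 8..14 (7 rows).
helper_df = raw[raw['Category'] == 'B']
item_labels = helper_df[helper_df['Item'].isin(categ_group)].index.to_list()

# Using labels as positions fails: position 8 does not exist in a 7-row index.
try:
    helper_df.index[item_labels]
    error_message = None
except IndexError as exc:
    error_message = str(exc)

# Resetting the index first makes labels and positions coincide.
helper_df = helper_df.reset_index(drop=True)
item_positions = helper_df[helper_df['Item'].isin(categ_group)].index.to_list()

print(item_labels, error_message, item_positions)
```

Category A works only because its labels happen to start at 0, so labels and positions are identical there.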
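To build the frame for all categories at once, first rename the Item column to Element. Then use Series.where to replace every value that is not in the list with NaN, forward-fill the missing values, and add the result as a new second column with DataFrame.insert. Finally, use boolean indexing to remove the rows where both columns hold the same value: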
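# df is assumed to be the DataFrame of the 'Elements' sheet (not the dict
# that read_excel returns when it is given a list of sheet names)
categ_group = ['TRANSPORT', 'ACCOMODATION', 'DELIVERY']
df = df.rename(columns={'Item': 'Element'})
# keep only the item names, then forward-fill them onto the element rows
df.insert(1, 'Item', df['Element'].where(df['Element'].isin(categ_group)).ffill())
# drop the item rows themselves (Element equals Item there)
df = df[df['Element'].ne(df['Item'])]
print(df)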
Element Item Category Value
1 Bus TRANSPORT A 2
2 Car TRANSPORT A 3
3 Automobile TRANSPORT A 4
4 Bike TRANSPORT A 5
6 House ACCOMODATION A 7
7 Apartment ACCOMODATION A 8
9 Glovo DELIVERY B 10
10 Emag DELIVERY B 11
11 Transporter DELIVERY B 12
13 Apartment1 ACCOMODATION B 14
14 Apartment2 ACCOMODATION B 15
16 Rental ACCOMODATION C 17
17 Apartment ACCOMODATION C 18
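For anyone without the Excel file at hand, the approach above can be sketched as a self-contained script. This is only an illustration under stated assumptions: raw is an in-memory stand-in for the sheet, split_items_and_elements is a hypothetical helper name, and the item rows are dropped with the same isin mask instead of comparing the two columns. The final reset_index gives the 0-based index shown in the question's desired output.

```python
import pandas as pd

CATEG_GROUP = ['TRANSPORT', 'ACCOMODATION', 'DELIVERY']


def split_items_and_elements(frame, item_names):
    """Return one row per element, with its parent item in a new 'Item' column."""
    out = frame.rename(columns={'Item': 'Element'})  # rename returns a new frame
    is_item = out['Element'].isin(item_names)
    # Keep item names only, then carry each one forward over its elements.
    out.insert(1, 'Item', out['Element'].where(is_item).ffill())
    # Drop the item rows and renumber from 0.
    return out[~is_item].reset_index(drop=True)


# Stand-in for the 'Elements' sheet (assumption: same rows as in the question).
raw = pd.DataFrame({
    'Item': ['TRANSPORT', 'Bus', 'Car', 'Automobile', 'Bike',
             'ACCOMODATION', 'House', 'Apartment',
             'DELIVERY', 'Glovo', 'Emag', 'Transporter',
             'ACCOMODATION', 'Apartment1', 'Apartment2',
             'ACCOMODATION', 'Rental', 'Apartment'],
    'Category': list('AAAAAAAA') + list('BBBBBBB') + list('CCC'),
    'Value': range(1, 19),
})

result = split_items_and_elements(raw, CATEG_GROUP)
print(result.to_string())
```

With the real file, raw would presumably come from pd.read_excel('elements.xlsx', sheet_name='Elements', engine='openpyxl'); passing a single sheet name yields a DataFrame, whereas a list of names yields a dict of DataFrames. Note that the forward fill relies on every element row appearing below its item row, as it does in the sheet shown.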
