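I have seen many posts related to my problem, but I couldn't find a suitable solution. I am reading an Excel sheet that contains multiple tables, and each table's header cells have a non-white background color.
Here is a link to the Excel data: https://docs.google.com/spreadsheets/d/1m4v_wbIJCGWBigJx53BRnBpMDHZ9CwKf/edit?usp=sharing&ouid=107579116880049687042&rtpof=true&sd=true
What I have tried so far: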
import pandas as pd
df = pd.read_excel("dell.xlsx")
df =df.dropna()
The code above removes the data I want, because those rows contain NaN.
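`DataFrame.dropna()` defaults to `how='any'`, so it discards a row as soon as a single cell is NaN; in a sheet where some columns are only partly filled, that throws away real rows. A small illustration with a made-up frame (the values are only sample data):

```python
import numpy as np
import pandas as pd

# Made-up sample: row 1 is fully empty, row 2 is a real row with one empty cell.
df = pd.DataFrame({
    "Description": ["PowerEdge R640 Server", np.nan, "Standard Bezel"],
    "Qty": [2.0, np.nan, 1.0],
    "Price": [6390.0, np.nan, np.nan],
})

print(df.dropna())            # default how="any": drops every row with at least one NaN
print(df.dropna(how="all"))   # drops only rows in which every cell is NaN
```

With `how="all"` the partly filled "Standard Bezel" row survives, which is usually what you want when cleaning this kind of sheet.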
df.iloc[1,2:5]=['Description','Qty','Price']
print(df)
nul_rows = list(df[df.isnull().all(axis=1)].index)
list_of_dataframes = []
for i in range(len(nul_rows) - 1):
    list_of_dataframes.append(df.iloc[nul_rows[i]+1:nul_rows[i+1], :])
cleaned_tables = []
for _df in list_of_dataframes:
    cleaned_tables.append(_df.dropna(axis=1, how='all'))
for p in cleaned_tables:
    print(p.dropna())
I can't get the data I want because the columns don't have proper headers; they come out as "Unnamed".
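The attempt above splits the sheet at fully blank rows by slicing between consecutive blank-row positions, which skips the block before the first blank row and the one after the last. A sketch of the same idea without manual index arithmetic: number the blocks with a cumulative sum over the blank-row mask and group by that number. `split_on_blank_rows` is a hypothetical helper and the sample frame is made up.

```python
import numpy as np
import pandas as pd

def split_on_blank_rows(df):
    """Split df into blocks separated by fully blank rows (sketch)."""
    blank = df.isna().all(axis=1)   # True where every cell in the row is NaN
    block_id = blank.cumsum()       # increments at each blank row
    blocks = []
    # Group only the non-blank rows; rows between two blank rows share a block_id.
    for _, block in df[~blank].groupby(block_id[~blank]):
        # Drop columns that are empty inside this block only.
        blocks.append(block.dropna(axis=1, how="all"))
    return blocks

raw = pd.DataFrame([
    ["Description", "Qty", np.nan],
    ["Server", 2, np.nan],
    [np.nan, np.nan, np.nan],
    [np.nan, "Sku Description:", "Qty:"],
    [np.nan, "Motherboard", 1],
])
for b in split_on_blank_rows(raw):
    print(b, end="\n\n")
```

Each block still carries its header as its first row, so it can be promoted to column names afterwards.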
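I want to extract these columns from the Excel file in the link: "Sku Description", "Qty", "Price", "Total".
Hoping to get some responses.
Note: the file contents and layout always differ, so a solution built around one file won't work for the next one, but the header names never change (e.g. Qty, Description, Total).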
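Since the header names are stable while the layout is not, one option is to search for the header row by name instead of by position. This sketch assumes the sheet is loaded with `pd.read_excel(path, header=None)`, so every row (including header rows) stays in the data with positional column labels, and that a table ends at the first row whose selected columns are all empty. `find_table` is a hypothetical helper and the sample frame is made up.

```python
import numpy as np
import pandas as pd

def find_table(raw, headers):
    """Return the first table whose header row contains every name in `headers`.

    Assumption: raw was read with header=None; the table ends at the first row
    where all of the selected columns are empty.
    """
    values = raw.to_numpy(dtype=object)
    for r, row in enumerate(values):
        row_list = list(row)
        if all(h in row_list for h in headers):
            col_pos = [row_list.index(h) for h in headers]   # positional column of each header
            body = raw.iloc[r + 1:, col_pos]
            empty = body.isna().all(axis=1).to_numpy()
            end = int(np.argmax(empty)) if empty.any() else len(body)
            table = body.iloc[:end].copy()
            table.columns = list(headers)
            return table.reset_index(drop=True)
    raise ValueError(f"no row contains all of {headers}")

raw = pd.DataFrame([
    ["Quote", np.nan, np.nan, np.nan],
    ["Description", "Qty", "Price", "Total"],
    ["PowerEdge R640 Server", 2, 6390, 12780],
    ["PowerEdge R640 Server", 8, 4360, 34880],
    [np.nan, np.nan, np.nan, np.nan],
    ["Sku Description:", "Qty:", np.nan, np.nan],
])
print(find_table(raw, ["Description", "Qty", "Price", "Total"]))
```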
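Answer:
Here is one way to create dataframes from that Excel file. See the comments in the code itself.
You can uncomment some of the print() calls to see how the data evolves step by step.
Code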
import pandas as pd
import numpy as np
def get_dataframes(fn):
    """
    Returns 3 dataframes from the 3 tables given inside excel file fn.
    """
    # (1) Read file
    df_start = pd.read_excel(fn)
    df = df_start.copy()  # preserve df_start, just in case we need it later
    # print(df.to_string())  # study layout

    # (2) Clean df
    df = pd.DataFrame(np.vstack([df.columns, df]))  # put the header row back as data; columns become 0, 1, 2 ...
    df = df.drop([0], axis=1)  # drop index column 0, it is not needed
    df = df.dropna(how='all')  # drop rows if all cells are empty
    df = df.reset_index(drop=True)  # reset row index so we can see the correct row no. 0, 1, ...
    df = pd.DataFrame(np.vstack([df.columns, df]))  # reset col index to 0, 1, ... (this also prepends the old column labels as row 0)

    # (3) Tasks
    # We have 3 dataframes to create:
    # 1) df1 is for the first table, the one with "Description".
    # 2) df2 is the one with the first "Sku Description:".
    # 3) df3 is the one with the second "Sku Description:".

    # (4) Get all the indexes (row and col) of the "Description" and "Sku Description:" texts;
    # we will use them to extract data in order to create dataframes df1, df2 and df3.
    # Get the index of "Description"; it gives us a row index and a col index. Knowing
    # the row index, we can extract from the correct row in this particular table.
    i_desc = df.where(df == 'Description').dropna(how='all').dropna(how='all', axis=1)
    # Assume that there is only one "Description"; this is true for the given sample excel files.
    i_desc_row = i_desc.index[0]
    # print(f'row index of "Description": {i_desc_row}')

    # Get the index of "Sku Description:".
    i_sku = df.where(df == 'Sku Description:').dropna(how='all').dropna(how='all', axis=1)
    # There are 2 "Sku Description:", get the row indexes of each.
    i_sku_row = i_sku.index
    # print(f'row indexes of "Sku Description:": {i_sku_row}')
    i_sku_row_1 = i_sku_row[0]
    i_sku_row_2 = i_sku_row[1]
    # print(f'i_sku_row_1: {i_sku_row_1}, i_sku_row_2: {i_sku_row_2}')

    # There are 2 "Sku Description:", get the col indexes of each.
    i_sku_col = i_sku.columns
    # print(f'col indexes of "Sku Description:": {i_sku_col}')
    if len(i_sku_col) == 2:
        i_sku_col_1 = i_sku_col[0]
        i_sku_col_2 = i_sku_col[1]
    else:  # both labels are in the same column
        i_sku_col_1 = i_sku_col[0]
        i_sku_col_2 = i_sku_col_1
    # print(f'i_sku_col_1: {i_sku_col_1}, i_sku_col_2: {i_sku_col_2}')

    # (5) Create df1
    cols = ['Description', 'Qty', 'Price', 'Total']
    df1 = df.iloc[i_desc_row+1 : i_sku_row_1-2, 0:4]  # [start_row:end_row, start_col:end_col]
    df1.columns = cols
    # print(df1)
    df1 = df1.reset_index(drop=True)  # reset the row index so we can see 0, 1, ...
    # print(df1)

    # (6) Create df2
    cols = ['Sku Description:', 'Qty:']
    df2 = df.iloc[i_sku_row_1+1 : i_sku_row_2, i_sku_col_1:i_sku_col_1+2]
    df2.columns = cols
    df2 = df2.reset_index(drop=True)
    # print(df2)

    # (7) Create df3
    cols = ['Sku Description:', 'Qty:']
    df3 = df.iloc[i_sku_row_2+1:, i_sku_col_2:i_sku_col_2+2]
    df3.columns = cols
    df3 = df3.reset_index(drop=True)
    # print(df3)

    return df1, df2, df3
def process_file():
    # fn = 'F:\\Downloads\\dell.xlsx'
    fn = 'F:\\Downloads\\dell2.xlsx'
    desc_df, sku1_df, sku2_df = get_dataframes(fn)
    print(f'file: {fn}')
    print(f'desc_df:\n{desc_df}\n')
    print(f'sku1_df:\n{sku1_df}\n')
    print(f'sku2_df:\n{sku2_df}\n')


# Start
process_file()
Output
file: F:\Downloads\dell2.xlsx
desc_df:
Description Qty Price Total
0 PowerEdge R640 Server 2.0 6390.0 12780
1 PowerEdge R640 Server 8.0 4360.0 34880
sku1_df:
Sku Description: Qty:
0 PowerEdge R640 Server 1.0
1 PowerEdge R640 MLK Motherboard 1.0
2 Intel Xeon Silver 4216 2.1G, 16C/32T, 9.6GT/s,... 2.0
3 Intel Xeon Silver 4216 2.1G, 16C/32T, 9.6GT/s,... 2.0
4 iDRAC Group Manager, Enabled 1.0
5 iDRAC,Factory Generated Password 1.0
6 Additional Processor Selected 1.0
7 2.5 Chassis with up to 10 Hard Drives and 3PCI... 1.0
8 Standard Bezel 1.0
9 Riser Config 2, 3x16 LP 1.0
10 PowerEdge R640 Shipping Material for 4 and 10 ... 1.0
11 PowerEdge R640 Shipping(ICC), for 1300W below, V2 1.0
12 No Quick Sync 1.0
13 Dell EMC Luggage Tag for x10 1.0
14 Performance Optimized 1.0
15 3200MT/s RDIMMs 1.0
16 DIMM Blanks for System with 2 Processors 1.0
17 32GB RDIMM, 3200MT/s, Dual Rank 16Gb BASE x8 4.0
18 iDRAC9,Enterprise 1.0
19 2.4TB 10K RPM SAS 12Gbps 512e 2.5in Hot-plug H... 8.0
20 3.84TB SSD SATA Read Intensive 6Gbps 512 2.5in... 2.0
21 BOSS controller card with 2 M.2 Sticks 480GB... 1.0
22 PERC H750 Adapter, Low Profile 1.0
23 8 Standard Fans for R640 1.0
24 Performance BIOS Settings 1.0
25 Standard 1U Heatsink 2.0
26 No Internal Optical Drive 1.0
27 Dual, Hot-plug, Redundant Power Supply (1+1), ... 1.0
28 Jumper Cord - C13/C14, 2M, 250V, 10A 2.0
29 Power Cord - C13, 1.8M, 250V, 10A 2.0
30 Trusted Platform Module 2.0 V3 1.0
31 PowerEdge R640 No CE Marking, ICC, for 1300W P... 1.0
32 Broadcom 5720 Quad Port 1GbE BASE-T, rNDC 1.0
33 No Operating System 1.0
34 No Systems Documentation, No OpenManage DVD Kit 1.0
35 Basic Deployment Dell Server R Series 1U/2U 1.0
36 PowerEdge-SE02 Handling n Insurance Charges(In... 1.0
37 ReadyRails Sliding Rails With Cable Management... 1.0
38 Unconfigured RAID 1.0
39 UEFI BIOS Boot Mode with GPT Partition 1.0
40 OpenManage Enterprise Advanced 1.0
41 Basic Next Business Day 36 Months 1.0
42 ProSupport and Next Business Day Onsite Servic... 1.0
43 ProSupport and Next Business Day Onsite Servic... 1.0
44 INFO: Thank you for choosing Dell 1.0
45 Mod Specs Info 1.0
sku2_df:
Sku Description: Qty:
0 PowerEdge R640 Server 1.0
1 PowerEdge R640 MLK Motherboard 1.0
2 Intel Xeon Silver 4216 2.1G, 16C/32T, 9.6GT/s,... 1.0
3 iDRAC,Factory Generated Password 1.0
4 iDRAC Group Manager, Enabled 1.0
5 2.5 Chassis with up to 10 Hard Drives and 3PCI... 1.0
6 Standard Bezel 1.0
7 Riser Config 4, 2x16 LP 1.0
8 PowerEdge R640 Shipping(ICC), for 1300W below, V2 1.0
9 PowerEdge R640 Shipping Material for 4 and 10 ... 1.0
10 No Quick Sync 1.0
11 Dell EMC Luggage Tag for x10 1.0
12 Performance Optimized 1.0
13 32GB RDIMM, 3200MT/s, Dual Rank 16Gb BASE x8 2.0
14 Blank for 1CPU Configuration 1.0
15 3200MT/s RDIMMs 1.0
16 Blank for 1CPU Configuration 1.0
17 No Additional Processor 1.0
18 iDRAC9,Enterprise 1.0
19 2.4TB 10K RPM SAS 12Gbps 512e 2.5in Hot-plug H... 3.0
20 BOSS controller card with 2 M.2 Sticks 480GB... 1.0
21 PERC H750 Adapter, Low Profile 1.0
22 Performance BIOS Settings 1.0
23 5 Standard Fans for R640 1.0
24 Standard 1U Heatsink 1.0
25 1.92TB SSD vSAS Read Intensive 12Gbps 512e 2.5... 2.0
26 No Internal Optical Drive 1.0
27 Dual, Hot-plug, Redundant Power Supply (1+1), ... 1.0
28 Power Cord - C13, 1.8M, 250V, 10A 2.0
29 Jumper Cord - C13/C14, 2M, 250V, 10A 2.0
30 Trusted Platform Module 2.0 V3 1.0
31 PowerEdge R640 No CE Marking, ICC, for 1300W P... 1.0
32 Broadcom 5720 Quad Port 1GbE BASE-T, rNDC 1.0
33 No Operating System 1.0
34 No Systems Documentation, No OpenManage DVD Kit 1.0
35 Basic Deployment Dell Server R Series 1U/2U 1.0
36 PowerEdge-SE02 Handling n Insurance Charges 1.0
37 ReadyRails Sliding Rails With Cable Management... 1.0
38 Unconfigured RAID 1.0
39 UEFI BIOS Boot Mode with GPT Partition 1.0
40 OpenManage Enterprise Advanced 1.0
41 Basic Next Business Day 36 Months 1.0
42 ProSupport and Next Business Day Onsite Servic... 1.0
43 ProSupport and Next Business Day Onsite Servic... 1.0
44 INFO: Thank you for choosing Dell 1.0
45 Mod Specs Info 1.0
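The answer reads the row and column positions of the "Sku Description:" labels from `i_sku.index` and `i_sku.columns` separately, and it hard-codes exactly two such tables. Because both are sorted independently, they may pair up incorrectly when the first label sits in a later column than the second. One hedged way to generalise it is to get (row, column) pairs directly with `np.argwhere` and slice one table per match. `sku_tables` is a hypothetical helper, not part of the answer; it assumes each table is `width` columns wide, starts right below its label and ends at the next label (or the end of the sheet), and that `df` has the default 0, 1, 2, ... labels on both axes like the answer's cleaned `df`. The sample frame is made up.

```python
import numpy as np
import pandas as pd

def sku_tables(df, label="Sku Description:", width=2):
    """Slice one table per occurrence of `label` (sketch under the stated assumptions)."""
    hits = np.argwhere(df.to_numpy(dtype=object) == label)  # (row, col) position pairs, ordered by row
    tables = []
    for k, (r, c) in enumerate(hits):
        # Assumption: a table runs until the next label row, or to the end of df.
        end = hits[k + 1][0] if k + 1 < len(hits) else len(df)
        t = df.iloc[r + 1:end, c:c + width].dropna(how="all")
        t.columns = list(df.iloc[r, c:c + width])  # use the header cells as column names
        tables.append(t.reset_index(drop=True))
    return tables

df = pd.DataFrame([
    ["Sku Description:", "Qty:", np.nan],
    ["PowerEdge R640 Server", 1.0, np.nan],
    ["Standard Bezel", 1.0, np.nan],
    [np.nan, np.nan, np.nan],
    [np.nan, "Sku Description:", "Qty:"],
    [np.nan, "PowerEdge R640 Server", 1.0],
])
for t in sku_tables(df):
    print(t, end="\n\n")
```

Here the two labels sit in different columns (0 and 1), and each table is still sliced from its own column, whatever the number of tables in the file.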
Please credit the source when reposting. Link to this article: https://www.uj5u.com/net/383607.html
