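I'm very new to Python and Pandas... but my final output file doesn't drop any of the duplicates in "Customer Number". Any advice on why this is happening would be much appreciated!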

import pandas as pd
import numpy as np  # NumPy provides np.nan, the missing-value marker pandas uses
from openpyxl import load_workbook
from openpyxl.styles import Font
df_1 = pd.read_excel('PRT Tracings 2020.xlsx', sheet_name='Export')  # read the 'Export' sheet of the 2020 workbook
df_2 = pd.read_excel('PRT Tracings 2021.xlsx', sheet_name='Export')  # read the 'Export' sheet of the 2021 workbook
df_3 = pd.read_excel('PRT Tracings YTD 2022.xlsx', sheet_name='Export')  # read the 'Export' sheet of the 2022 year-to-date workbook
df_all = pd.concat([df_1, df_2, df_3], sort=False)  # stack the three sheets; sort=False keeps the original column order
df_all.to_excel('Combined_PRT_Tracings.xlsx', index=False)  # write the raw combined data (overwritten below by the cleaned version)
df_all = df_all.fillna('N/A')  # replace missing values (NaN) with 'N/A'
remove = ['ORDERNUMBER', 'ORDER_TYPE', 'ORDERDATE', 'Major Code Description', 'Product_Number_And_Desc', 'Qty', 'Order_$', 'Order_List_$']  # columns that are not needed in the output
df_all.drop(columns=remove, inplace=True)
df_all.drop_duplicates(subset=['Customer Number'], keep=False)  # intended to drop every row whose Customer Number appears more than once
df_all.to_excel('Combined_PRT_Tracings.xlsx', index=False)  # write the cleaned data, overwriting the file written earlier
wb = load_workbook('Combined_PRT_Tracings.xlsx')  # reopen the saved file with openpyxl for further formatting
ws = wb.active  # the currently active worksheet
wb.save('Combined_PRT_Tracings.xlsx')
Answer:
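By default, df_all.drop_duplicates does not modify df_all; it returns a new DataFrame with the duplicates removed. Your code never uses that return value, so df_all still contains every row when it is written to Excel. Either assign the result back to a variable, or pass inplace=True to modify the DataFrame itself. Returning a copy by default prevents accidental changes to the original data.
Try: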
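A minimal sketch of the difference, assuming a small made-up DataFrame in place of the real PRT sheets (the "Region" column and all values are hypothetical). It also shows a related trap: with inplace=True the method returns None, so the result should not be assigned back.

```python
import pandas as pd

# Hypothetical toy data standing in for the combined PRT sheets
df_all = pd.DataFrame({
    "Customer Number": [101, 102, 101, 103],
    "Region": ["East", "West", "East", "North"],
})

# Calling the method without using its result leaves df_all untouched
df_all.drop_duplicates(subset=["Customer Number"], keep=False)
print(len(df_all))  # 4

# Using the returned DataFrame is what actually applies the change
deduped = df_all.drop_duplicates(subset=["Customer Number"], keep=False)
print(len(deduped))  # 2

# With inplace=True, df_all itself is modified and the method returns None
result = df_all.drop_duplicates(subset=["Customer Number"], keep=False, inplace=True)
print(result is None, len(df_all))  # True 2
```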
df_all = df_all.drop_duplicates(subset='Customer Number', keep=False)
Or, equivalently:
df_all.drop_duplicates(subset='Customer Number', keep=False, inplace=True)
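With keep=False, this drops every row whose Customer Number occurs more than once, including the first occurrence. To keep the first or last row of each group of duplicates instead, set keep='first' (the default) or keep='last'.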
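To see how the keep options differ, here is a small sketch on hypothetical data where customer 101 appears in two years' tracings (the "Year" column and the values are made up for illustration). The last line uses Series.duplicated to count rows that repeat an earlier Customer Number, which is one way to check whether the output still contains duplicates.

```python
import pandas as pd

# Hypothetical rows: customer 101 appears in both the 2020 and 2021 data
df = pd.DataFrame({
    "Customer Number": [101, 102, 101, 103],
    "Year": [2020, 2020, 2021, 2022],
})

results = {}
for keep in ("first", "last", False):
    out = df.drop_duplicates(subset="Customer Number", keep=keep)
    results[keep] = out["Year"].tolist()
    print(keep, results[keep])
# first [2020, 2020, 2022]
# last [2020, 2021, 2022]
# False [2020, 2022]

# Number of rows whose Customer Number already appeared in an earlier row
print(df["Customer Number"].duplicated().sum())  # 1
```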
