I have a list of Excel files that follow the (toy) structure below:

import pandas as pd

data1 = [["","","0:15","during the phase of that time","during","","cat1"],["","","","","the","","cat1"],["","","","","phase","","cat2"],["","","","","of","","cat1"],["","","","","that","","cat1"],["","","","","time","","cat3"]]
data2 = [["1","1","0:19","at what point is enough enough tommy","at","","cat1"],["","","","","what","","cat2"],["","","","","point","","cat2"],["","","","","is","","cat2"],["","","","","enough","","cat1"],["","","","","enough","","cat1"],["","","","","tommy","","cat3"]]
df1 = pd.DataFrame(data1, columns=['Intro', 'Speaker','time','transcript','word','motiv','category'])
df2 = pd.DataFrame(data2, columns=['Intro', 'Speaker','time','transcript','word','motiv','category'])
df_list = [df1, df2]
names = ["file1", "file2"]
# Write each toy dataframe to its own workbook (Filenamefile1.xlsx, Filenamefile2.xlsx)
for df, name in zip(df_list, names):
    filename = 'Filename' + name + '.xlsx'
    print(filename)
    df.to_excel(filename, sheet_name='Sheet1', index=False)
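My goal for this list (more than 200 dataframes) is to compute summary data for specific columns and store that information in a separate, aggregated Excel file. In this toy dataset, an example summary would be the number of times cat1, cat2, and cat3 appear in the "category" column.
My final output, in an Excel file named "results.xlsx", would look like this: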

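My knowledge of pandas is very limited, and so far my code is a Frankenstein stitched together from other projects I have worked on that dealt with lists of dataframes. So far I have: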
import glob
import os

import janitor  # pyjanitor; registers DataFrame.clean_names()
import pandas as pd

# Get directory of excel files (using df_list for toy dataset)
os.chdir("path")
path = os.getcwd()
csv_files = glob.glob(os.path.join(path, "*.xlsx"))
# Set aside list
li = []
# Create aggregate dataframe
frame = pd.DataFrame()
for f in csv_files:
    # Read the Excel file
    df = pd.read_excel(f).clean_names()
    # Set filename variable
    frame['filename'] = os.path.basename(f)
    # Set category 1 counts variable
    frame['category1'] = df.iloc[:,6].value_counts()
    # Append aggregated results to aggregate dataframe
    li.append(frame)
# Concatenate results
frame2 = pd.concat(li)
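However, when I apply this to my real data, the results are nothing like what I expect. Specifically, I get an index containing the actual values I am counting (category 1), only the last Excel file in the list appears in the results, and my category 1 variable is empty:
This is repeated once for every file in my list.
I can't tell what I am doing wrong here. Are there any hints that could point me in the right direction?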
Answer from a uj5u.com user:
Here is one way to do it:
from pathlib import Path
import pandas as pd

# One row per workbook, one column per category value
files = Path('.').glob('*.xlsx')
pd.DataFrame(pd.read_excel(f)['category'].value_counts().rename(f.stem) for f in files)
               cat1  cat2  cat3
Filenamefile1     4     1     1
Filenamefile2     3     3     1
...
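To make the idea easier to try without any files on disk, here is a minimal sketch of the same counting step applied to in-memory frames. The helper name `summarize_categories` and the `frames` dictionary are hypothetical names introduced only for this illustration, and the toy frames keep just the "category" values from the question. It also assumes that a category missing from one file should count as 0 rather than NaN.

```python
import pandas as pd


def summarize_categories(frames, column="category"):
    """Count the values of `column` in each frame (hypothetical helper).

    `frames` maps a label, such as a file stem, to a DataFrame.
    The result has one row per label and one column per category value.
    """
    # Build a fresh Series per frame; its name becomes the row label.
    rows = [df[column].value_counts().rename(label) for label, df in frames.items()]
    summary = pd.DataFrame(rows)
    # Assumption: a category absent from a frame is a count of zero.
    return summary.fillna(0).astype(int).sort_index(axis=1)


# Toy data: only the "category" values from the question's two frames.
toy_frames = {
    "Filenamefile1": pd.DataFrame(
        {"category": ["cat1", "cat1", "cat2", "cat1", "cat1", "cat3"]}
    ),
    "Filenamefile2": pd.DataFrame(
        {"category": ["cat1", "cat2", "cat2", "cat2", "cat1", "cat1", "cat3"]}
    ),
}

summary = summarize_categories(toy_frames)
print(summary)
```

With real workbooks, the dictionary could presumably be built as `{f.stem: pd.read_excel(f) for f in Path('.').glob('*.xlsx')}`, and the result saved with `summary.to_excel("results.xlsx", index_label="filename")`, which needs an Excel engine such as openpyxl installed. Creating a new Series per file also avoids one problem in the question's loop, where the same `frame` object is appended on every iteration, so every entry in `li` ends up reflecting the last file.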
Please credit the source when reposting. Link to this article: https://www.uj5u.com/net/490578.html
