給定一個包含 3 個預測結果的檔案夾,一個例子如下:
date real pred indicator
0 2021/1/31 1.0 0.586583 train
1 2021/2/28 0.6 0.442644 train
2 2021/3/31 -0.5 -0.011877 train
3 2021/4/30 -0.3 -0.011877 train
4 2021/5/31 -0.2 -0.011877 train
5 2021/6/30 -0.4 -0.011877 train
6 2021/7/31 0.3 0.152951 train
7 2021/8/31 0.1 0.393088 train
8 2021/9/30 0.0 0.163108 train
9 2021/10/31 0.7 0.537028 valid
10 2021/11/30 0.4 0.586583 valid
我希望回圈讀取所有excel檔案,MAE分別對每個預測檔案的子集train和子集進行計算valid,最后將計算出的MAE值和excel檔案名連接起來。
我做過的代碼:
import pandas as pd
import numpy as np
from sklearn.metrics import mean_absolute_error
import os
from pathlib import Path
import re
import glob
file_paths = glob.glob('./preds/*.xlsx')
dfs = pd.DataFrame()
for file_path in file_paths:
data = pd.read_excel(file_path)
# print(data)
model_name = Path(file_path).stem
print(model_name)
train = data.loc[data['indicator'] == 'train']
valid = data.loc[data['indicator'] == 'valid']
print(mean_absolute_error(train['real'], train['pred']))
print(mean_absolute_error(valid['real'], valid['pred']))
出去:
pred3
0.06462038177996876
0.14329840342203767
pred2
0.29968300930091313
0.20961438417434672
pred1
0.2807230830192565
0.1747779786586765
要為保存輸出創建資料框:
col_names = ['model', 'train_mae', 'valid_mae']
dfs = pd.DataFrame(columns=col_names)
我的問題是如何將迭代計算的MAE值附加到最終結果中?謝謝。
預期結果:
model mae_train mae_valid
0 pred1 0.280723 0.174778
1 pred2 0.299683 0.209614
2 pred3 0.064620 0.143298
編輯:
model_name = []
train_mae = []
test_mae = []
for file_path in file_paths:
data = pd.read_excel(file_path)
# print(data)
model_name = Path(file_path).stem
print(model_name)
train = data.loc[data['indicator'] == 'train']
valid = data.loc[data['indicator'] == 'valid']
print(mean_absolute_error(train['real'], train['pred']))
print(mean_absolute_error(valid['real'], valid['pred']))
train_mae = mean_absolute_error(train['real'], train['pred'])
valid_mae = mean_absolute_error(valid['real'], valid['pred'])
model_name.append(model_name)
train_mae.append(train_mae)
valid_mae.append(valid_mae)
df = pd.DataFrame(list(zip(model_name , train_mae, valid_mae)),
columns =['model', 'train_mae', 'valid_mae'])
錯誤:
Traceback (most recent call last):
File "<input>", line 23, in <module>
AttributeError: 'str' object has no attribute 'append'
uj5u.com熱心網友回復:
使用串列串列來存盤您的資訊,然后將其傳遞給pd.DataFrame:
mae_data = []
for file_path in file_paths:
data = pd.read_excel(file_path)
# print(data)
model_name = Path(file_path).stem
print(model_name)
train = data.loc[data['indicator'] == 'train']
valid = data.loc[data['indicator'] == 'valid']
print(mean_absolute_error(train['real'], train['pred']))
print(mean_absolute_error(valid['real'], valid['pred']))
## append [model_name, mae_train, mae_valid] to mae_list
mae_data.append([model_name, mean_absolute_error(train['real'], train['pred']),mean_absolute_error(valid['real'], valid['pred'])])
col_names = ['model', 'train_mae', 'valid_mae']
dfs = pd.DataFrame(data=mae_data, columns=col_names)
轉載請註明出處,本文鏈接:https://www.uj5u.com/houduan/393326.html
上一篇:Int64Index物件沒有屬性get_values
下一篇:如何簡化資料框的for回圈?
