I'm new to Keras, and I want to load my training data from an Excel file. My data has shape (1000, 5, 5): 1000 batches of data saved in 1000 worksheets, and each sheet has 5 columns and 5 rows:
| A | B | C | D | E |
|---|---|---|---|---|
| ... | ... | ... | ... | label |
| ... | ... | ... | ... | label |
| ... | ... | ... | ... | label |
| ... | ... | ... | ... | label |
| ... | ... | ... | ... | label |
I want columns A, B and C to be the training features and column E to be the label.
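To make the target shapes concrete, here is a minimal NumPy sketch. It assumes the 1000 sheets have already been stacked into one array of shape (1000, 5, 5), filled with random numbers as a stand-in for the real data. Features and labels are then just slices along the last (column) axis:

```python
import numpy as np

# Hypothetical stand-in for the workbook: 1000 sheets x 5 rows x 5 columns (A..E).
data = np.random.rand(1000, 5, 5)

# Columns A, B, C are indices 0, 1, 2; column E is index 4 (column D is dropped).
X = data[:, :, :3]   # features, one (5, 3) block per sheet
Y = data[:, :, 4]    # labels, one value per row of each sheet

print(X.shape, Y.shape)  # (1000, 5, 3) (1000, 5)
```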
import pandas as pd
import tensorflow as tf
import multiprocessing
df = pd.read_excel('File.xlsx', sheet_name=None)
data_list = list(df.values())
def input_parser(x):
    Y = x.pop('E')
    features = ['A','B','C']
    X = x[features]
    return X, Y
dataset = tf.data.Dataset.from_tensor_slices(data_list)
dataset = dataset.map(lambda x: tuple(tf.py_function(func=input_parser,
                                                     inp=[x],
                                                     Tout=[tf.float32,tf.int64])),
                      num_parallel_calls=multiprocessing.cpu_count())
Then I got this error:
ValueError: Can't convert non-rectangular Python sequence to Tensor.
Why do I get this error, and how can I fit this data to my model?
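Answer:
The error is most likely raised by tf.data.Dataset.from_tensor_slices itself rather than by input_parser: it converts data_list (a plain Python list of DataFrames) into a tensor eagerly, before map ever runs, and that conversion fails. So try omitting your map function entirely and pass your data directly to tf.data.Dataset.from_tensor_slices as NumPy arrays: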
import pandas as pd
import tensorflow as tf
import numpy as np
spread_sheet1 = {'A': [1, 2, 1, 2, 9], 'B': [3, 4, 6, 1, 4], 'C': [3, 4, 3, 1, 4], 'D': [1, 2, 6, 1, 4], 'E': [0, 1, 1, 0, 1]}
df1 = pd.DataFrame(data=spread_sheet1)
spread_sheet2 = {'A': [1, 2, 1, 2, 4], 'B': [3, 5, 2, 1, 4], 'C': [9, 4, 1, 1, 4], 'D': [1, 5, 6, 1, 7], 'E': [1, 1, 1, 0, 1]}
df2 = pd.DataFrame(data=spread_sheet2)
features = ['A','B','C']
Y = np.stack([df1['E'].to_numpy(), df2['E'].to_numpy()])
Y = tf.convert_to_tensor(Y, dtype=tf.int32)
X = np.stack([df1[features].to_numpy(), df2[features].to_numpy()])
X = tf.convert_to_tensor(X, dtype=tf.float32)
dataset = tf.data.Dataset.from_tensor_slices((X, Y))
print('Shape of X --> ', X.shape)
for x, y in dataset:
    print(x, y)
Shape of X -->  (2, 5, 3)
tf.Tensor(
[[1. 3. 3.]
 [2. 4. 4.]
 [1. 6. 3.]
 [2. 1. 1.]
 [9. 4. 4.]], shape=(5, 3), dtype=float32) tf.Tensor([0 1 1 0 1], shape=(5,), dtype=int32)
tf.Tensor(
[[1. 3. 9.]
 [2. 5. 4.]
 [1. 2. 1.]
 [2. 1. 1.]
 [4. 4. 4.]], shape=(5, 3), dtype=float32) tf.Tensor([1 1 1 0 1], shape=(5,), dtype=int32)
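A rough illustration of the mechanism behind that error message (not a reproduction of the exact data_list from the question): from_tensor_slices converts its whole input up front, so a nested Python sequence whose inner lengths differ is rejected with a ValueError before any map function is involved. The toy lists below are made up for the demonstration:

```python
import tensorflow as tf

# Rectangular nested list: every inner list has length 3, so it converts
# cleanly and the dataset yields 2 elements of shape (3,).
ok = tf.data.Dataset.from_tensor_slices([[1, 2, 3], [4, 5, 6]])
print(ok.element_spec)

# Non-rectangular nested list: inner lengths 3 and 2. The conversion happens
# eagerly inside from_tensor_slices, so it fails before any .map() is attached.
try:
    tf.data.Dataset.from_tensor_slices([[1, 2, 3], [4, 5]])
    failed = False
except ValueError as err:
    failed = True
    print("ValueError:", err)
```

Stacking each sheet with np.stack, as above, sidesteps this because it produces a single rectangular array (and fails loudly in NumPy if the sheets do not share a shape).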
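Reading from an Excel file file.xlsx with multiple sheets can be done like this (this assumes every sheet has the same number of rows, since np.stack needs arrays of equal shape):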
import pandas as pd
import numpy as np
import tensorflow as tf
# sheet_name=None loads every sheet at once into a dict {sheet name: DataFrame}
sheets = pd.read_excel('file.xlsx', sheet_name=None)
columns = ['A','B','C']
features = []
labels = []
for df in sheets.values():
    features.append(df[columns].to_numpy())  # (rows, 3) feature block per sheet
    labels.append(df['E'].to_numpy())        # (rows,) label vector per sheet
# Stack the per-sheet arrays into (num_sheets, rows, 3) and (num_sheets, rows)
Y = tf.convert_to_tensor(np.stack(labels), dtype=tf.int32)
X = tf.convert_to_tensor(np.stack(features), dtype=tf.float32)
dataset = tf.data.Dataset.from_tensor_slices((X, Y))
print('Shape of X --> ', X.shape)
for x, y in dataset:
    print(x, y)
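To answer the "how do I fit this to my model" part, a minimal sketch follows. Several things here are assumptions, not taken from the question: random arrays stand in for the X and Y built above, the labels are taken to be 0/1 values (as in the toy example), and the small per-row Dense model, batch size and epoch count are illustrative choices only. The key step is batching the dataset so Keras receives inputs of shape (batch, 5, 3):

```python
import numpy as np
import tensorflow as tf

# Hypothetical stand-in for the X / Y built above: 1000 sheets, 5 rows each,
# 3 feature columns (A, B, C) and one assumed 0/1 label per row (column E).
X = np.random.rand(1000, 5, 3).astype("float32")
Y = np.random.randint(0, 2, size=(1000, 5)).astype("float32")

# Shuffle and batch the per-sheet elements: each batch is (batch, 5, 3) / (batch, 5).
dataset = tf.data.Dataset.from_tensor_slices((X, Y)).shuffle(1000).batch(32)

# Assumed architecture: Dense layers act on the last axis, i.e. on each row
# independently, giving one sigmoid probability per row, reshaped to (batch, 5).
model = tf.keras.Sequential([
    tf.keras.Input(shape=(5, 3)),
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
    tf.keras.layers.Reshape((5,)),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
history = model.fit(dataset, epochs=2, verbose=0)

# Predictions for 4 sheets: one probability per row of each sheet.
preds = model.predict(X[:4], verbose=0)
print(preds.shape)
```

If the real labels are multi-class rather than 0/1, the last Dense layer and the loss would need to change accordingly.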
