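I first came across TensorFlow half a month ago and wanted to write my own speech recognition program. After training on the thchs30 corpus, I can't figure out how to run recognition on a recording I made myself, so I'm asking here for help. I hope someone can clear up the question that has been bothering me for half a month.
Below is the code that tests the trained model, but I don't know how to feed in custom audio for recognition: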
import os
import difflib
import tensorflow as tf
import numpy as np
from utils import decode_ctc, GetEditDistance
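# 0. Prepare the vocabularies needed for decoding. The arguments must match those used in training;
#    alternatively, save the vocabularies locally and load them directly.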
from utils import get_data, data_hparams
data_args = data_hparams()
train_data = get_data(data_args)
# 1. Acoustic model -----------------------------------
from model_speech.cnn_ctc import Am, am_hparams
am_args = am_hparams()
am_args.vocab_size = len(train_data.am_vocab)
am = Am(am_args)
print('loading acoustic model...')
am.ctc_model.load_weights('logs_am/model.h5')
# 2. Language model -------------------------------------------
from model_language.transformer import Lm, lm_hparams
lm_args = lm_hparams()
lm_args.input_vocab_size = len(train_data.pny_vocab)
lm_args.label_vocab_size = len(train_data.han_vocab)
lm_args.dropout_rate = 0.
print('loading language model...')
lm = Lm(lm_args)
sess = tf.Session(graph=lm.graph)
with lm.graph.as_default():
    saver = tf.train.Saver()
with sess.as_default():
    latest = tf.train.latest_checkpoint('logs_lm')
    saver.restore(sess, latest)
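# 3. Prepare the data for testing. It does not have to be the training data; choose the split via data_args.data_type.
# This should be 'test'. The demo used 'train' instead because its model is small:
# on 'test' the results look poor and words never seen in training appear.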
data_args.data_type = 'test'
data_args.shuffle = False
data_args.batch_size = 1
test_data = get_data(data_args)
# 4. Run the test -------------------------------------------
am_batch = test_data.get_am_batch()
word_num = 0
word_error_num = 0
for i in range(10):
    print('\n the ', i, 'th example.')
    # Run the trained acoustic model on the next batch
    inputs, _ = next(am_batch)
    x = inputs['the_inputs']
    y = test_data.pny_lst[i]
    result = am.model.predict(x, steps=1)
    # Convert the numeric output into text (pinyin)
    _, text = decode_ctc(result, train_data.am_vocab)
    text = ' '.join(text)
    print('recognized pinyin:', text)
    print('reference pinyin:', ' '.join(y))
    with sess.as_default():
        # Map the pinyin to vocabulary indices and run the language model
        text = text.strip('\n').split(' ')
        x = np.array([train_data.pny_vocab.index(pny) for pny in text])
        x = x.reshape(1, -1)
        preds = sess.run(lm.preds, {lm.x: x})
        label = test_data.han_lst[i]
        got = ''.join(train_data.han_vocab[idx] for idx in preds[0])
        print('reference characters:', label)
        print('recognized characters:', got)
        # Cap the errors per sentence at the label length
        word_error_num += min(len(label), GetEditDistance(label, got))
        word_num += len(label)
print('word error rate:', word_error_num / word_num)
sess.close()
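The error-rate bookkeeping at the end of step 4 can be looked at in isolation. What follows is only a sketch under assumptions: `GetEditDistance` comes from `utils` and is not shown in this post, so `edit_distance` below is a plain Levenshtein stand-in that may not match it exactly, and `error_rate` is a hypothetical helper name. The point it illustrates is that each sentence contributes at most `len(label)` errors, so the final rate cannot exceed 1 even when the recognized text is much longer than the reference.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences.

    Hypothetical stand-in for utils.GetEditDistance, which is not shown here.
    """
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # delete r
                            curr[j - 1] + 1,      # insert h
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def error_rate(pairs):
    """Accumulate capped per-sentence errors over (label, got) pairs."""
    errors = 0
    total = 0
    for label, got in pairs:
        # Same capping as in the test loop: at most len(label) errors per sentence
        errors += min(len(label), edit_distance(label, got))
        total += len(label)
    return errors / total if total else 0.0


if __name__ == '__main__':
    # One substitution in the first pair; the second pair is capped at 2 errors
    print(error_rate([('abcd', 'abxd'), ('ab', 'xyzw')]))
```

Since `label` and `got` are strings of Chinese characters in the script above, the rate is counted per character rather than per word.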
uj5u.com helpful user replied:
Oh dear, why is nobody here?
uj5u.com helpful user replied:
Bump.
uj5u.com helpful user replied:
Experts, please, please help!
