Speech recognition with Kaldi and the CVTE open-source model
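Table of contents
- Speech recognition with Kaldi and the CVTE open-source model
- Download the model
- Usage
- Testing on your own dataset
- Preparing the files
- 0. Audio files
- 1. wav.scp
- wav.scp format
- 2. utt2spk
- utt2spk format
- 3. spk2utt
- spk2utt format
- Test: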
Download the model
CVTE has open-sourced a Chinese model for Kaldi.
Model download URL: http://kaldi-asr.org/models/0002_cvte_chain_model.tar.gz
Extract the archive into kaldi/egs/.
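The download-and-extract step could be scripted roughly as follows. This is only a sketch: `install_cvte_model` and `KALDI_ROOT` are placeholder names introduced here, `wget` is assumed to be installed, and the archive is assumed to unpack into a `cvte/` directory (which the `egs/cvte/s5` paths used later suggest).

```shell
# Sketch: fetch the CVTE archive if it is not present yet, then unpack it
# under <kaldi_root>/egs. The function name and arguments are illustrative.
install_cvte_model() {
  kaldi_root="$1"                                 # path to your Kaldi checkout
  archive="${2:-0002_cvte_chain_model.tar.gz}"    # local path of the archive
  url="http://kaldi-asr.org/models/0002_cvte_chain_model.tar.gz"

  # Download only when the archive is missing.
  [ -f "$archive" ] || wget -O "$archive" "$url" || return 1

  mkdir -p "$kaldi_root/egs" || return 1
  tar -xzf "$archive" -C "$kaldi_root/egs"
}

# Usage (KALDI_ROOT is a placeholder for your checkout):
#   install_cvte_model "$KALDI_ROOT"
```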
Usage
From the Kaldi root directory, copy steps and utils from egs/wsj/s5 into egs/cvte/s5,
and copy egs/hkust/s5/local/score.sh into egs/cvte/s5/local:
cp -r egs/wsj/s5/steps egs/cvte/s5/steps
cp -r egs/wsj/s5/utils egs/cvte/s5/utils
cp egs/hkust/s5/local/score.sh egs/cvte/s5/local
Comment out the exit 1 inside the if statement in utils/lang/check_phones_compatible.sh:
# check if the files exist or not
if [ ! -f $table_first ]; then
  if [ ! -f $table_second ]; then
    echo "$0: Error! Both of the two phones-symbol tables are absent."
    echo "Please check your command"
    #exit 1;
  else
    # The phones-symbol-table1 is absent. The model directory maybe created by old script.
    # For back compatibility, this script exits silently with status 0.
    exit 0;
  fi
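If you prefer not to edit the script by hand, the same change could be made with sed. The sketch below assumes that the `exit 1;` line directly follows the "Please check your command" line, as in the excerpt; `comment_out_exit` is a helper name made up for this example, and it keeps a `.bak` copy of the original file.

```shell
# Sketch: comment out the "exit 1;" that follows the "Please check your command"
# message. A backup of the untouched script is kept next to it as <script>.bak.
comment_out_exit() {
  script="$1"
  cp "$script" "$script.bak" || return 1
  # On the matching line, step to the next line (n) and prefix "exit 1;" with "#".
  sed '/Please check your command/{
n
s/^\([[:space:]]*\)exit 1;/\1#exit 1;/
}' "$script.bak" > "$script"
}

# Usage, from egs/cvte/s5:
#   comment_out_exit utils/lang/check_phones_compatible.sh
```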
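Then simply run ./run.sh.
Testing on your own dataset
Preparing the files
0. Audio files
The audio must be speech recordings in WAV format: 16-bit depth, 16000 Hz sampling rate, mono.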
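One way to check that a file meets these requirements is to read the relevant fields straight from its header. The sketch below assumes a canonical 44-byte PCM WAV header and a little-endian machine; `wav_format` and `is_16k_mono_16bit` are helper names made up for this example. Files that fail the check could be converted with an external tool such as SoX, shown in a comment and assumed to be installed.

```shell
# Print "<channels> <sample_rate> <bits_per_sample>" for a WAV file.
# Assumes the canonical layout: channels at byte 22, sample rate at byte 24,
# bits per sample at byte 34 (plain PCM files without extra header chunks).
wav_format() {
  channels=$(od -An -j22 -N2 -tu2 "$1" | tr -d '[:space:]')
  rate=$(od -An -j24 -N4 -tu4 "$1" | tr -d '[:space:]')
  bits=$(od -An -j34 -N2 -tu2 "$1" | tr -d '[:space:]')
  echo "$channels $rate $bits"
}

# Succeed only for mono, 16000 Hz, 16-bit files.
is_16k_mono_16bit() {
  [ "$(wav_format "$1")" = "1 16000 16" ]
}

# Possible conversion with SoX (external tool, not part of Kaldi):
#   sox input.wav -r 16000 -c 1 -b 16 output.wav
```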
1. wav.scp

wav.scp format
<audio-ID> <audio-path>
For example:
AUDIO_20211129_170900_0000 ./audio/2021_11_29_17.09.00_0000.wav
AUDIO_20211129_170901_0000 ./audio/2021_11_29_17.09.01_0000.wav
AUDIO_20211129_170902_0000 ./audio/2021_11_29_17.09.02_0000.wav
AUDIO_20211129_170903_0000 ./audio/2021_11_29_17.09.03_0000.wav
AUDIO_20211129_170904_0000 ./audio/2021_11_29_17.09.04_0000.wav
AUDIO_20211129_170905_0000 ./audio/2021_11_29_17.09.05_0000.wav
AUDIO_20211129_170906_0000 ./audio/2021_11_29_17.09.06_0000.wav
AUDIO_20211129_170907_0000 ./audio/2021_11_29_17.09.07_0000.wav
AUDIO_20211129_170908_0000 ./audio/2021_11_29_17.09.08_0000.wav
AUDIO_20211129_170909_0000 ./audio/2021_11_29_17.09.09_0000.wav
AUDIO_20211129_170910_0000 ./audio/2021_11_29_17.09.10_0000.wav
AUDIO_20211129_170911_0000 ./audio/2021_11_29_17.09.11_0000.wav
AUDIO_20211129_170912_0000 ./audio/2021_11_29_17.09.12_0000.wav
AUDIO_20211129_170913_0000 ./audio/2021_11_29_17.09.13_0000.wav
AUDIO_20211129_170914_0000 ./audio/2021_11_29_17.09.14_0000.wav
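A file like this does not have to be written by hand. The sketch below builds it from a directory of recordings, assuming the file-naming scheme seen in the example (`2021_11_29_17.09.00_0000.wav` becomes `AUDIO_20211129_170900_0000`); `wav_to_id` and `make_wav_scp` are helper names made up here. The output is sorted in the C locale because Kaldi expects its data files sorted that way.

```shell
# Turn a file name such as 2021_11_29_17.09.00_0000.wav into the ID
# AUDIO_20211129_170900_0000 (naming scheme inferred from the example).
wav_to_id() {
  base=$(basename "$1" .wav)
  printf 'AUDIO_%s\n' "$base" |
    sed 's/_\([0-9]\{4\}\)_\([0-9]\{2\}\)_\([0-9]\{2\}\)_\([0-9]\{2\}\)\.\([0-9]\{2\}\)\.\([0-9]\{2\}\)_/_\1\2\3_\4\5\6_/'
}

# Print one "<audio-ID> <audio-path>" line per .wav file in the directory.
make_wav_scp() {
  audio_dir="$1"
  for f in "$audio_dir"/*.wav; do
    [ -f "$f" ] || continue          # skip the unexpanded glob if there are no files
    echo "$(wav_to_id "$f") $f"
  done | LC_ALL=C sort
}

# Usage:
#   make_wav_scp ./audio > wav.scp
```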
2. utt2spk
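This file maps each audio ID to a speaker ID.
Ideally the audio ID contains the speaker ID.
This example has no speaker information, so the audio ID doubles as the speaker ID; that is, every recording is treated as a separate speaker.
utt2spk format
<audio-ID-1> <speaker-1>
<audio-ID-2> <speaker-2>
For example: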
AUDIO_20211129_170900_0000 AUDIO_20211129_170900_0000
AUDIO_20211129_170901_0000 AUDIO_20211129_170901_0000
AUDIO_20211129_170902_0000 AUDIO_20211129_170902_0000
AUDIO_20211129_170903_0000 AUDIO_20211129_170903_0000
AUDIO_20211129_170904_0000 AUDIO_20211129_170904_0000
AUDIO_20211129_170905_0000 AUDIO_20211129_170905_0000
AUDIO_20211129_170906_0000 AUDIO_20211129_170906_0000
AUDIO_20211129_170907_0000 AUDIO_20211129_170907_0000
AUDIO_20211129_170908_0000 AUDIO_20211129_170908_0000
AUDIO_20211129_170909_0000 AUDIO_20211129_170909_0000
AUDIO_20211129_170910_0000 AUDIO_20211129_170910_0000
AUDIO_20211129_170911_0000 AUDIO_20211129_170911_0000
AUDIO_20211129_170912_0000 AUDIO_20211129_170912_0000
AUDIO_20211129_170913_0000 AUDIO_20211129_170913_0000
AUDIO_20211129_170914_0000 AUDIO_20211129_170914_0000
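When every recording is its own speaker, utt2spk can be derived directly from wav.scp by repeating the first column. A minimal sketch, with `make_utt2spk` as a made-up helper name:

```shell
# Print "<audio-ID> <audio-ID>" for every entry in a wav.scp file, i.e. use
# the audio ID as the speaker ID. Output is sorted in the C locale.
make_utt2spk() {
  awk '{ print $1, $1 }' "$1" | LC_ALL=C sort
}

# Usage:
#   make_utt2spk wav.scp > utt2spk
```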
3. spk2utt
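spk2utt format
<speaker-1> <audio-ID> <audio-ID> <audio-ID>
<speaker-2> <audio-ID> <audio-ID> <audio-ID>
There is one line per speaker, with the fields separated by spaces.
For example: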
AUDIO_20211129_170900_0000 AUDIO_20211129_170900_0000
AUDIO_20211129_170901_0000 AUDIO_20211129_170901_0000
AUDIO_20211129_170902_0000 AUDIO_20211129_170902_0000
AUDIO_20211129_170903_0000 AUDIO_20211129_170903_0000
AUDIO_20211129_170904_0000 AUDIO_20211129_170904_0000
AUDIO_20211129_170905_0000 AUDIO_20211129_170905_0000
AUDIO_20211129_170906_0000 AUDIO_20211129_170906_0000
AUDIO_20211129_170907_0000 AUDIO_20211129_170907_0000
AUDIO_20211129_170908_0000 AUDIO_20211129_170908_0000
AUDIO_20211129_170909_0000 AUDIO_20211129_170909_0000
AUDIO_20211129_170910_0000 AUDIO_20211129_170910_0000
AUDIO_20211129_170911_0000 AUDIO_20211129_170911_0000
AUDIO_20211129_170912_0000 AUDIO_20211129_170912_0000
AUDIO_20211129_170913_0000 AUDIO_20211129_170913_0000
AUDIO_20211129_170914_0000 AUDIO_20211129_170914_0000
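spk2utt is simply the inverse of utt2spk, so it can be generated from it. Kaldi's utils directory ships utt2spk_to_spk2utt.pl for this purpose; the awk version below is a standalone sketch of the same idea, with `make_spk2utt` as a made-up helper name.

```shell
# Group audio IDs by speaker: read "<audio-ID> <speaker>" lines and print
# "<speaker> <audio-ID> <audio-ID> ..." with one line per speaker.
make_spk2utt() {
  awk '{ utts[$2] = utts[$2] " " $1 }
       END { for (spk in utts) print spk utts[spk] }' "$1" | LC_ALL=C sort
}

# Usage:
#   make_spk2utt utt2spk > spk2utt
# or, with the script that comes with Kaldi:
#   utils/utt2spk_to_spk2utt.pl utt2spk > spk2utt
```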
Test:
Replace the files of the same name under data/fbank/test/, then run ./run.sh.
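The replacement step might look like the sketch below. `install_test_data` and the `./mydata` directory are hypothetical names; the function checks that wav.scp and utt2spk list the same audio IDs, keeps a one-time `.orig` copy of each file it overwrites, and then copies the three files into place.

```shell
# Sketch: copy wav.scp, utt2spk and spk2utt from <src> into <dst>,
# after a basic consistency check. Names and layout are illustrative.
install_test_data() {
  src="$1"; dst="$2"
  for name in wav.scp utt2spk spk2utt; do
    [ -f "$src/$name" ] || { echo "missing $src/$name" >&2; return 1; }
  done

  # Every audio ID in wav.scp must also appear in utt2spk, and vice versa.
  wav_ids=$(awk '{ print $1 }' "$src/wav.scp" | LC_ALL=C sort)
  utt_ids=$(awk '{ print $1 }' "$src/utt2spk" | LC_ALL=C sort)
  [ "$wav_ids" = "$utt_ids" ] || { echo "wav.scp and utt2spk disagree" >&2; return 1; }

  mkdir -p "$dst" || return 1
  for name in wav.scp utt2spk spk2utt; do
    # Keep the file shipped with the model the first time it is replaced.
    if [ -f "$dst/$name" ] && [ ! -f "$dst/$name.orig" ]; then
      cp "$dst/$name" "$dst/$name.orig" || return 1
    fi
    cp "$src/$name" "$dst/$name" || return 1
  done
}

# Usage, from egs/cvte/s5:
#   install_test_data ./mydata data/fbank/test && ./run.sh
```

Kaldi's utils/validate_data_dir.sh (for instance with --no-feats --no-text) can additionally be run on the directory to catch sorting or formatting problems before decoding.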

The resulting recognition accuracy is fairly high.
