A Folk Goddess Sings Pop: Training Your Own Voice Model with the AI Library so-vits (Ye Bei / Python3.10)

2023-05-13 07:50:13 Other

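Pop diva Stefanie Sun (孫燕姿) has a wonderful voice, but right now the whole internet is full of clones of it, and after hearing so many of them a certain fatigue sets in. I searched around online and could not find a voice model of any folk singer. People are like that: what we cannot have keeps nagging at us. So this time we will build our own training set and train our own voice model, and let a folk goddess sing pop songs. It does not get more fun than that.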

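Building the training set

A training set is the collection of data used to train a neural network model. It usually consists of a large number of inputs paired with their corresponding outputs. The model is trained by learning the relationship between inputs and outputs, and during training its parameters are adjusted to minimize the error.

Put simply, if we want to train a voice model of the folk singer Ye Bei (葉蓓), we need to feed her songs in as input, and those songs are the training set. The training set gives the model material to learn from, so that it can learn to produce the right output for a given input. By iterating over the training set again and again, the neural network keeps optimizing itself and improves its predictions on the input data.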
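To make "adjusting parameters to minimize the error" concrete, here is a toy sketch, not anything so-vits actually runs. It fits a single parameter w so that w * x matches the outputs in a three-item training set. The function name `train` and all values are made up for illustration.

```python
# Toy illustration only: fit y = w * x to a few (input, output) pairs by
# repeatedly nudging w in the direction that lowers the mean squared error.

def train(pairs, learning_rate=0.01, epochs=200):
    w = 0.0  # the single model parameter, starting from an arbitrary value
    for _ in range(epochs):
        # Gradient of the mean squared error with respect to w
        grad = sum(2 * (w * x - y) * x for x, y in pairs) / len(pairs)
        w -= learning_rate * grad  # adjust the parameter to reduce the error
    return w


# A tiny "training set": inputs paired with the outputs we want predicted
training_set = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
print(round(train(training_set), 3))
```

Every pass over `training_set` corresponds to one epoch. A real voice model has millions of parameters instead of one, but the loop has the same shape.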

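Indeed, under the hood so-vits is a neural network architecture, and training a voice model is essentially a prediction problem. For the basics of neural network architecture, see the article "The Underlying Principles of AI Machine Learning, Dissected: Artificial Neurons You Are Sure to Understand, Turning AI 'Jargon' into Plain Language"; I will not repeat them here.

When choosing samples for the training set, prefer songs that carry the singer's characteristic timbre. Why is the whole internet full of Stefanie Sun? Simply because her voice is so recognizable that a model can more easily learn the right output from the input data.

In addition, quality matters more than quantity. Clean samples in which the singer's characteristic features are prominent train better than low-quality ones. Covers of other artists' songs, or songs sung with an unconventional technique, do carry some of the singer's timbre, but for model training they actually work against you. Keep this in mind.

Here I chose six songs from Ye Bei's early album 《幸福深處》.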

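Generally speaking, the more training data there is, the better the model performs, but in practice you have to weigh this against your actual circumstances.

Deep learning usually needs a lot of data to train a high-performing model. In computer vision, for example, training a convolutional neural network takes a large number of images. In some other tasks, such as speech recognition and natural language processing, a relatively small amount of data can also yield a capable model.

As a rule, the training set should contain enough varied samples to cover the range of inputs the model will see. For classification tasks, it also needs enough positive and negative samples to ensure classification performance.

Beyond quantity, the quality of the training set also matters a great deal. Make sure it is free of bias and noise, and apply preprocessing such as data cleaning and data augmentation to improve its quality and diversity.

In short, how much training data you need depends on the specific problem: its complexity, the diversity of the data, the complexity of the model, and the efficiency of the training algorithm. In practice, you have to experiment and validate to find the training set size that suits the problem.

Given my own computer's configuration and the cost in training time, the training set here is relatively small. Adjust it to your own means.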
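Since the size of the training set is a trade-off, it helps to know how much audio you actually have. The sketch below totals the playing time of the WAV files in a folder using only the standard library. It assumes plain PCM WAV files, which is what the `wave` module reads; the folder name `clips` and the helper name `total_duration_seconds` are placeholders, not part of so-vits.

```python
import wave
from pathlib import Path


def total_duration_seconds(folder):
    """Sum the playing time of every .wav file directly inside `folder`."""
    total = 0.0
    for path in sorted(Path(folder).glob("*.wav")):
        with wave.open(str(path), "rb") as wav:
            # frames / frames-per-second = seconds
            total += wav.getnframes() / wav.getframerate()
    return total


if __name__ == "__main__":
    # "clips" is a placeholder; point it at your own folder of vocal tracks
    seconds = total_duration_seconds("clips")
    print(f"{seconds / 60:.1f} minutes of audio")
```

If the folder does not exist or holds no WAV files, the function simply returns 0.0.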

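Cleaning the training set

Once the training set is ready, the data needs to be "cleaned": remove the accompaniment, pauses and mixing effects from the songs so that only an a cappella version remains.

For separating the accompaniment from the vocals, I recommend the spleeter library: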

pip3 install spleeter --user

Then run the following command to separate each song in the training set:

spleeter separate -o d:/output/ -p spleeter:2stems d:/資料.mp3

Here -o is the output directory, -p selects the separation model, and the last argument is the file to separate.
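With several songs to process, the command above can be run in a loop. A minimal sketch, assuming spleeter is installed and on the PATH; it uses only the -o and -p flags shown above. The helper names and the two folder paths are placeholders.

```python
import subprocess
from pathlib import Path


def build_spleeter_command(audio_file, output_dir, model="spleeter:2stems"):
    """Return the same CLI call as above, as an argument list."""
    return ["spleeter", "separate", "-o", str(output_dir), "-p", model, str(audio_file)]


def separate_all(input_dir, output_dir):
    """Run spleeter once per .mp3 file found in `input_dir`."""
    for audio_file in sorted(Path(input_dir).glob("*.mp3")):
        # check=True stops the loop if spleeter reports an error
        subprocess.run(build_spleeter_command(audio_file, output_dir), check=True)


if __name__ == "__main__":
    # Both paths are placeholders for your own folders
    separate_all("d:/songs", "d:/output")
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces or non-ASCII characters.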

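The first run is slow because spleeter downloads its pretrained model, about 1.73 GB in size. When it finishes, the separated track files are written to the output directory:

Directory of D:\歌曲制作\清唱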
  
2023/05/11  15:38    <DIR>          .  
2023/05/11  13:45    <DIR>          ..  
2023/05/11  13:40        39,651,884 1_1_01. wxs.wav  
2023/05/11  15:34        46,103,084 1_1_02. qad_(Vocals)_(Vocals).wav  
2023/05/11  15:35        43,802,924 1_1_03. hs_(Vocals)_(Vocals).wav  
2023/05/11  15:36        39,054,764 1_1_04. hope_(Vocals)_(Vocals).wav  
2023/05/11  15:36        32,849,324 1_1_05. kamen_(Vocals)_(Vocals).wav  
2023/05/11  15:37        50,741,804 1_1_06. ctrl_(Vocals)_(Vocals).wav  
               6 File(s)    252,203,784 bytes
               2 Dir(s) 449,446,780,928 bytes free

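For more on spleeter, see the article "Free Vocal and Background Music Separation in Practice with the AI Library Spleeter (Python3.10)"; I will not go into it here.

The separated samples need a second pass, because the separated audio still carries some faint background sound and mixing residue. For this I recommend the noisereduce library:

pip3 install noisereduce soundfile

Then run the noise reduction: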

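import noisereduce as nr
import soundfile as sf

# Read the audio file; soundfile returns shape (frames,) or (frames, channels)
data, rate = sf.read("audio_file.wav")

# Take a stretch that contains only background noise as the noise sample;
# adjust the range to a passage without vocals, and downmix stereo to mono
noisy_part = data[10000:15000]
if noisy_part.ndim > 1:
    noisy_part = noisy_part.mean(axis=1)

# Apply noise reduction (noisereduce 2.x API: y / sr / y_noise).
# Multi-channel audio is expected as (channels, frames), hence the transpose.
reduced_noise = nr.reduce_noise(y=data.T, sr=rate, y_noise=noisy_part, stationary=True)

# Write the result to a file, transposed back to (frames, channels)
sf.write("audio_file_denoised.wav", reduced_noise.T, rate)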

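First the song file is read with soundfile, then a noise sample is taken and used by the noise reduction algorithm, and finally the result is written to a new file.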
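The noise sample in the script above is a fixed index range, which only works if that range happens to contain no singing. One way to choose it less arbitrarily is to look for the lowest-energy stretch of the track. The sketch below is a plain-Python illustration that assumes mono samples in an ordinary sequence; `quietest_window_start` is a hypothetical helper, not part of noisereduce or soundfile.

```python
def quietest_window_start(samples, window):
    """Return the start index of the `window`-sample stretch with the lowest energy."""
    if window <= 0 or window > len(samples):
        raise ValueError("window must be between 1 and len(samples)")
    # Energy of the first window, then slide one sample at a time
    energy = sum(s * s for s in samples[:window])
    best_energy, best_start = energy, 0
    for start in range(1, len(samples) - window + 1):
        # Add the sample entering the window, drop the one leaving it
        energy += samples[start + window - 1] ** 2 - samples[start - 1] ** 2
        if energy < best_energy:
            best_energy, best_start = energy, start
    return best_start


# Loud, quiet, loud: the quiet stretch starts at index 4
demo = [0.9, -0.8, 0.7, -0.9, 0.01, -0.02, 0.01, 0.0, 0.8, -0.7]
print(quietest_window_start(demo, 4))
```

The returned index can then replace the hard-coded start of the noise slice, with the window length as its size.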

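With that, data cleaning is basically done.

Slicing the training set

During deep learning, the computer loads training data into the graphics card's memory. If the training data is too large, that memory overflows, the familiar "out of VRAM" problem.

Splitting the dataset into many parts and loading only one part at a time for training reduces memory use, and it also allows parallel processing, which improves training efficiency.

Here we can use the github.com/openvpi/audio-slicer library: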

git clone https://github.com/openvpi/audio-slicer.git

Then write the code:

import librosa  # Optional. Use any library you like to read audio files.  
import soundfile  # Optional. Use any library you like to write audio files.  
  
from slicer2 import Slicer  
  
audio, sr = librosa.load('example.wav', sr=None, mono=False)  # Load an audio file with librosa.  
slicer = Slicer(  
    sr=sr,  
    threshold=-40,  
    min_length=5000,  
    min_interval=300,  
    hop_size=10,  
    max_sil_kept=500  
)  
chunks = slicer.slice(audio)  
for i, chunk in enumerate(chunks):  
    if len(chunk.shape) > 1:  
        chunk = chunk.T  # Swap axes if the audio is stereo.  
    soundfile.write(f'clips/example_{i}.wav', chunk, sr)  # Save sliced audio files with soundfile.

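Run over each denoised a cappella track, this script cuts the audio into small clips that are convenient for training. If your machine is low on resources, consider lowering min_length and min_interval: cuts are made at silences at least min_interval long and each clip is at least min_length long, so smaller values give shorter pieces, "chopped fine into mince", as the saying goes.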
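To make the roles of threshold and min_interval concrete, here is a simplified silence detector in plain Python. It is an illustration of the idea only, not audio-slicer's actual algorithm: it measures the RMS level of each hop and reports stretches that stay below the threshold for at least min_interval_ms. The function name `find_silences` and the demo signal are made up for this sketch.

```python
import math


def find_silences(samples, sr, threshold_db=-40.0, min_interval_ms=300, hop_ms=10):
    """Return (start, end) sample ranges whose RMS level stays below threshold_db
    for at least min_interval_ms. Simplified; not audio-slicer's actual code."""
    hop = max(1, int(sr * hop_ms / 1000))           # samples per analysis frame
    min_frames = max(1, min_interval_ms // hop_ms)  # frames a silence must last
    silences, run_start = [], None
    n_frames = math.ceil(len(samples) / hop)
    for i in range(n_frames + 1):
        frame = samples[i * hop:(i + 1) * hop]
        if frame:
            rms = math.sqrt(sum(s * s for s in frame) / len(frame))
            quiet = 20 * math.log10(max(rms, 1e-10)) < threshold_db
        else:
            quiet = False  # sentinel frame closes a run that reaches the end
        if quiet and run_start is None:
            run_start = i
        elif not quiet and run_start is not None:
            if i - run_start >= min_frames:
                silences.append((run_start * hop, min(i * hop, len(samples))))
            run_start = None
    return silences


# 1 s of signal, 0.5 s of silence, 1 s of signal at a 1000 Hz sample rate
audio = [0.5] * 1000 + [0.0] * 500 + [0.5] * 1000
print(find_silences(audio, sr=1000))                       # the 0.5 s gap is found
print(find_silences(audio, sr=1000, min_interval_ms=600))  # gap too short: no cut point
```

In this demo the 0.5 s gap counts as a cut point at the default min_interval_ms of 300 but not at 600, which is why a larger minimum interval leaves fewer places to cut.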

In the end, the six songs were cut into 140 small clips:

Directory of D:\歌曲制作\slicer
  
2023/05/11  15:45    <DIR>          .  
2023/05/11  13:45    <DIR>          ..  
2023/05/11  15:45           873,224 1_1_01. wxs_0.wav  
2023/05/11  15:45           934,964 1_1_01. wxs_1.wav  
2023/05/11  15:45         1,039,040 1_1_01. wxs_10.wav  
2023/05/11  15:45         1,391,840 1_1_01. wxs_11.wav  
2023/05/11  15:45         2,272,076 1_1_01. wxs_12.wav  
2023/05/11  15:45         2,637,224 1_1_01. wxs_13.wav  
2023/05/11  15:45         1,476,512 1_1_01. wxs_14.wav  
2023/05/11  15:45         1,044,332 1_1_01. wxs_15.wav  
2023/05/11  15:45         1,809,908 1_1_01. wxs_16.wav  
2023/05/11  15:45           887,336 1_1_01. wxs_17.wav  
2023/05/11  15:45           952,604 1_1_01. wxs_18.wav  
2023/05/11  15:45           989,648 1_1_01. wxs_19.wav  
2023/05/11  15:45           957,896 1_1_01. wxs_2.wav  
2023/05/11  15:45           231,128 1_1_01. wxs_20.wav  
2023/05/11  15:45         1,337,156 1_1_01. wxs_3.wav  
2023/05/11  15:45         1,308,932 1_1_01. wxs_4.wav  
2023/05/11  15:45         1,035,512 1_1_01. wxs_5.wav  
2023/05/11  15:45         2,388,500 1_1_01. wxs_6.wav  
2023/05/11  15:45         2,952,980 1_1_01. wxs_7.wav  
2023/05/11  15:45           929,672 1_1_01. wxs_8.wav  
2023/05/11  15:45           878,516 1_1_01. wxs_9.wav  
2023/05/11  15:45           963,188 1_1_02. qad_(Vocals)_(Vocals)_0.wav  
2023/05/11  15:45           901,448 1_1_02. qad_(Vocals)_(Vocals)_1.wav  
2023/05/11  15:45         1,411,244 1_1_02. qad_(Vocals)_(Vocals)_10.wav  
2023/05/11  15:45         2,070,980 1_1_02. qad_(Vocals)_(Vocals)_11.wav  
2023/05/11  15:45         2,898,296 1_1_02. qad_(Vocals)_(Vocals)_12.wav  
2023/05/11  15:45           885,572 1_1_02. qad_(Vocals)_(Vocals)_13.wav  
2023/05/11  15:45           841,472 1_1_02. qad_(Vocals)_(Vocals)_14.wav  
2023/05/11  15:45           876,752 1_1_02. qad_(Vocals)_(Vocals)_15.wav  
2023/05/11  15:45         1,091,960 1_1_02. qad_(Vocals)_(Vocals)_16.wav  
2023/05/11  15:45         1,188,980 1_1_02. qad_(Vocals)_(Vocals)_17.wav  
2023/05/11  15:45         1,446,524 1_1_02. qad_(Vocals)_(Vocals)_18.wav  
2023/05/11  15:45           924,380 1_1_02. qad_(Vocals)_(Vocals)_19.wav  
2023/05/11  15:45           255,824 1_1_02. qad_(Vocals)_(Vocals)_2.wav  
2023/05/11  15:45         1,718,180 1_1_02. qad_(Vocals)_(Vocals)_20.wav  
2023/05/11  15:45         2,070,980 1_1_02. qad_(Vocals)_(Vocals)_21.wav  
2023/05/11  15:45         2,827,736 1_1_02. qad_(Vocals)_(Vocals)_22.wav  
2023/05/11  15:45           862,640 1_1_02. qad_(Vocals)_(Vocals)_23.wav  
2023/05/11  15:45         1,628,216 1_1_02. qad_(Vocals)_(Vocals)_24.wav  
2023/05/11  15:45         1,626,452 1_1_02. qad_(Vocals)_(Vocals)_25.wav  
2023/05/11  15:45         1,499,444 1_1_02. qad_(Vocals)_(Vocals)_26.wav  
2023/05/11  15:45         1,303,640 1_1_02. qad_(Vocals)_(Vocals)_27.wav  
2023/05/11  15:45           998,468 1_1_02. qad_(Vocals)_(Vocals)_28.wav  
2023/05/11  15:45           781,496 1_1_02. qad_(Vocals)_(Vocals)_3.wav  
2023/05/11  15:45         1,368,908 1_1_02. qad_(Vocals)_(Vocals)_4.wav  
2023/05/11  15:45           892,628 1_1_02. qad_(Vocals)_(Vocals)_5.wav  
2023/05/11  15:45         1,386,548 1_1_02. qad_(Vocals)_(Vocals)_6.wav  
2023/05/11  15:45           883,808 1_1_02. qad_(Vocals)_(Vocals)_7.wav  
2023/05/11  15:45           952,604 1_1_02. qad_(Vocals)_(Vocals)_8.wav  
2023/05/11  15:45         1,303,640 1_1_02. qad_(Vocals)_(Vocals)_9.wav  
2023/05/11  15:45         1,354,796 1_1_03. hs_(Vocals)_(Vocals)_0.wav  
2023/05/11  15:45         1,344,212 1_1_03. hs_(Vocals)_(Vocals)_1.wav  
2023/05/11  15:45         1,305,404 1_1_03. hs_(Vocals)_(Vocals)_10.wav  
2023/05/11  15:45         1,291,292 1_1_03. hs_(Vocals)_(Vocals)_11.wav  
2023/05/11  15:45         1,338,920 1_1_03. hs_(Vocals)_(Vocals)_12.wav  
2023/05/11  15:45         1,093,724 1_1_03. hs_(Vocals)_(Vocals)_13.wav  
2023/05/11  15:45         1,375,964 1_1_03. hs_(Vocals)_(Vocals)_14.wav  
2023/05/11  15:45         1,409,480 1_1_03. hs_(Vocals)_(Vocals)_15.wav  
2023/05/11  15:45         1,481,804 1_1_03. hs_(Vocals)_(Vocals)_16.wav  
2023/05/11  15:45         2,247,380 1_1_03. hs_(Vocals)_(Vocals)_17.wav  
2023/05/11  15:45         1,312,460 1_1_03. hs_(Vocals)_(Vocals)_18.wav  
2023/05/11  15:45         1,428,884 1_1_03. hs_(Vocals)_(Vocals)_19.wav  
2023/05/11  15:45         1,051,388 1_1_03. hs_(Vocals)_(Vocals)_2.wav  
2023/05/11  15:45         1,377,728 1_1_03. hs_(Vocals)_(Vocals)_20.wav  
2023/05/11  15:45         1,485,332 1_1_03. hs_(Vocals)_(Vocals)_21.wav  
2023/05/11  15:45           897,920 1_1_03. hs_(Vocals)_(Vocals)_22.wav  
2023/05/11  15:45         1,591,172 1_1_03. hs_(Vocals)_(Vocals)_23.wav  
2023/05/11  15:45           920,852 1_1_03. hs_(Vocals)_(Vocals)_24.wav  
2023/05/11  15:45         1,046,096 1_1_03. hs_(Vocals)_(Vocals)_25.wav  
2023/05/11  15:45           730,340 1_1_03. hs_(Vocals)_(Vocals)_26.wav  
2023/05/11  15:45         1,383,020 1_1_03. hs_(Vocals)_(Vocals)_3.wav  
2023/05/11  15:45         1,188,980 1_1_03. hs_(Vocals)_(Vocals)_4.wav  
2023/05/11  15:45         1,003,760 1_1_03. hs_(Vocals)_(Vocals)_5.wav  
2023/05/11  15:45         1,243,664 1_1_03. hs_(Vocals)_(Vocals)_6.wav  
2023/05/11  15:45           845,000 1_1_03. hs_(Vocals)_(Vocals)_7.wav  
2023/05/11  15:45           892,628 1_1_03. hs_(Vocals)_(Vocals)_8.wav  
2023/05/11  15:45           539,828 1_1_03. hs_(Vocals)_(Vocals)_9.wav  
2023/05/11  15:45           725,048 1_1_04. hope_(Vocals)_(Vocals)_0.wav  
2023/05/11  15:45         1,023,164 1_1_04. hope_(Vocals)_(Vocals)_1.wav  
2023/05/11  15:45           202,904 1_1_04. hope_(Vocals)_(Vocals)_10.wav  
2023/05/11  15:45           659,780 1_1_04. hope_(Vocals)_(Vocals)_11.wav  
2023/05/11  15:45         1,017,872 1_1_04. hope_(Vocals)_(Vocals)_12.wav  
2023/05/11  15:45         1,495,916 1_1_04. hope_(Vocals)_(Vocals)_13.wav  
2023/05/11  15:45         1,665,260 1_1_04. hope_(Vocals)_(Vocals)_14.wav  
2023/05/11  15:45           675,656 1_1_04. hope_(Vocals)_(Vocals)_15.wav  
2023/05/11  15:45         1,187,216 1_1_04. hope_(Vocals)_(Vocals)_16.wav  
2023/05/11  15:45         1,201,328 1_1_04. hope_(Vocals)_(Vocals)_17.wav  
2023/05/11  15:45         1,368,908 1_1_04. hope_(Vocals)_(Vocals)_18.wav  
2023/05/11  15:45         1,462,400 1_1_04. hope_(Vocals)_(Vocals)_19.wav  
2023/05/11  15:45           963,188 1_1_04. hope_(Vocals)_(Vocals)_2.wav  
2023/05/11  15:45         1,121,948 1_1_04. hope_(Vocals)_(Vocals)_20.wav  
2023/05/11  15:45           165,860 1_1_04. hope_(Vocals)_(Vocals)_21.wav  
2023/05/11  15:45         1,116,656 1_1_04. hope_(Vocals)_(Vocals)_3.wav  
2023/05/11  15:45           622,736 1_1_04. hope_(Vocals)_(Vocals)_4.wav  
2023/05/11  15:45         1,349,504 1_1_04. hope_(Vocals)_(Vocals)_5.wav  
2023/05/11  15:45           984,356 1_1_04. hope_(Vocals)_(Vocals)_6.wav  
2023/05/11  15:45         2,104,496 1_1_04. hope_(Vocals)_(Vocals)_7.wav  
2023/05/11  15:45         1,762,280 1_1_04. hope_(Vocals)_(Vocals)_8.wav  
2023/05/11  15:45         1,116,656 1_1_04. hope_(Vocals)_(Vocals)_9.wav  
2023/05/11  15:45         1,114,892 1_1_05. kamen_(Vocals)_(Vocals)_0.wav  
2023/05/11  15:45           874,988 1_1_05. kamen_(Vocals)_(Vocals)_1.wav  
2023/05/11  15:45         1,400,660 1_1_05. kamen_(Vocals)_(Vocals)_10.wav  
2023/05/11  15:45           943,784 1_1_05. kamen_(Vocals)_(Vocals)_11.wav  
2023/05/11  15:45         1,351,268 1_1_05. kamen_(Vocals)_(Vocals)_12.wav  
2023/05/11  15:45         1,476,512 1_1_05. kamen_(Vocals)_(Vocals)_13.wav  
2023/05/11  15:45           933,200 1_1_05. kamen_(Vocals)_(Vocals)_14.wav  
2023/05/11  15:45         1,388,312 1_1_05. kamen_(Vocals)_(Vocals)_15.wav  
2023/05/11  15:45         1,012,580 1_1_05. kamen_(Vocals)_(Vocals)_16.wav  
2023/05/11  15:45         1,365,380 1_1_05. kamen_(Vocals)_(Vocals)_17.wav  
2023/05/11  15:45         1,614,104 1_1_05. kamen_(Vocals)_(Vocals)_18.wav  
2023/05/11  15:45         1,582,352 1_1_05. kamen_(Vocals)_(Vocals)_19.wav  
2023/05/11  15:45           949,076 1_1_05. kamen_(Vocals)_(Vocals)_2.wav  
2023/05/11  15:45         1,402,424 1_1_05. kamen_(Vocals)_(Vocals)_20.wav  
2023/05/11  15:45         1,268,360 1_1_05. kamen_(Vocals)_(Vocals)_21.wav  
2023/05/11  15:45         1,016,108 1_1_05. kamen_(Vocals)_(Vocals)_22.wav  
2023/05/11  15:45         1,065,500 1_1_05. kamen_(Vocals)_(Vocals)_3.wav  
2023/05/11  15:45           874,988 1_1_05. kamen_(Vocals)_(Vocals)_4.wav  
2023/05/11  15:45           954,368 1_1_05. kamen_(Vocals)_(Vocals)_5.wav  
2023/05/11  15:45         1,049,624 1_1_05. kamen_(Vocals)_(Vocals)_6.wav  
2023/05/11  15:45           878,516 1_1_05. kamen_(Vocals)_(Vocals)_7.wav  
2023/05/11  15:45         1,019,636 1_1_05. kamen_(Vocals)_(Vocals)_8.wav  
2023/05/11  15:45         1,383,020 1_1_05. kamen_(Vocals)_(Vocals)_9.wav  
2023/05/11  15:45         1,005,524 1_1_06. ctrl_(Vocals)_(Vocals)_0.wav  
2023/05/11  15:45         1,090,196 1_1_06. ctrl_(Vocals)_(Vocals)_1.wav  
2023/05/11  15:45            84,716 1_1_06. ctrl_(Vocals)_(Vocals)_10.wav  
2023/05/11  15:45           857,348 1_1_06. ctrl_(Vocals)_(Vocals)_11.wav  
2023/05/11  15:45           991,412 1_1_06. ctrl_(Vocals)_(Vocals)_12.wav  
2023/05/11  15:45         1,121,948 1_1_06. ctrl_(Vocals)_(Vocals)_13.wav  
2023/05/11  15:45           931,436 1_1_06. ctrl_(Vocals)_(Vocals)_14.wav  
2023/05/11  15:45         3,129,380 1_1_06. ctrl_(Vocals)_(Vocals)_15.wav  
2023/05/11  15:45         6,202,268 1_1_06. ctrl_(Vocals)_(Vocals)_16.wav  
2023/05/11  15:45         1,457,108 1_1_06. ctrl_(Vocals)_(Vocals)_17.wav  
2023/05/11  15:45         1,046,096 1_1_06. ctrl_(Vocals)_(Vocals)_2.wav  
2023/05/11  15:45           956,132 1_1_06. ctrl_(Vocals)_(Vocals)_3.wav  
2023/05/11  15:45         1,286,000 1_1_06. ctrl_(Vocals)_(Vocals)_4.wav  
2023/05/11  15:45           804,428 1_1_06. ctrl_(Vocals)_(Vocals)_5.wav  
2023/05/11  15:45         1,337,156 1_1_06. ctrl_(Vocals)_(Vocals)_6.wav  
2023/05/11  15:45         1,372,436 1_1_06. ctrl_(Vocals)_(Vocals)_7.wav  
2023/05/11  15:45         2,954,744 1_1_06. ctrl_(Vocals)_(Vocals)_8.wav  
2023/05/11  15:45         6,112,304 1_1_06. ctrl_(Vocals)_(Vocals)_9.wav  
             140 File(s)    183,026,452 bytes

With that, slicing is complete.

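Start training

Everything is ready except the training itself. First set up the so-vits-svc environment; see the article "AI Diva Singing Online: Applying the AI Stefanie Sun Model in Practice, Recreating 《遙遠的歌》, Originally Sung by 晴子 (Python3.10)". For reasons of space I will not repeat it here.

Then put the sliced dataset in the dataset_raw/yebei folder under the project root. If the yebei folder does not exist, create it.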
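Copying the clips by hand works, but the step can also be scripted. A minimal sketch using only the standard library; `stage_dataset` is a hypothetical helper, and the folder layout dataset_raw/yebei is the one described above.

```python
import shutil
from pathlib import Path


def stage_dataset(clips_dir, project_root, speaker="yebei"):
    """Copy every .wav clip into <project_root>/dataset_raw/<speaker>/."""
    target = Path(project_root) / "dataset_raw" / speaker
    target.mkdir(parents=True, exist_ok=True)  # create the folder if missing
    copied = 0
    for clip in sorted(Path(clips_dir).glob("*.wav")):
        shutil.copy2(clip, target / clip.name)
        copied += 1
    return copied


# Example call (paths are placeholders):
# stage_dataset("clips", "D:/work/so-vits-svc")
```

The function returns the number of clips copied, which is a quick check that nothing was missed.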

Then create the training configuration file:

{  
    "train": {  
        "log_interval": 200,  
        "eval_interval": 800,  
        "seed": 1234,  
        "epochs": 10000,  
        "learning_rate": 0.0001,  
        "betas": [  
            0.8,  
            0.99  
        ],  
        "eps": 1e-09,  
        "batch_size": 6,  
        "fp16_run": false,  
        "lr_decay": 0.999875,  
        "segment_size": 10240,  
        "init_lr_ratio": 1,  
        "warmup_epochs": 0,  
        "c_mel": 45,  
        "c_kl": 1.0,  
        "use_sr": true,  
        "max_speclen": 512,  
        "port": "8001",  
        "keep_ckpts": 10,  
        "all_in_mem": false  
    },  
    "data": {  
        "training_files": "filelists/train.txt",  
        "validation_files": "filelists/val.txt",  
        "max_wav_value": 32768.0,  
        "sampling_rate": 44100,  
        "filter_length": 2048,  
        "hop_length": 512,  
        "win_length": 2048,  
        "n_mel_channels": 80,  
        "mel_fmin": 0.0,  
        "mel_fmax": 22050  
    },  
    "model": {  
        "inter_channels": 192,  
        "hidden_channels": 192,  
        "filter_channels": 768,  
        "n_heads": 2,  
        "n_layers": 6,  
        "kernel_size": 3,  
        "p_dropout": 0.1,  
        "resblock": "1",  
        "resblock_kernel_sizes": [  
            3,  
            7,  
            11  
        ],  
        "resblock_dilation_sizes": [  
            [  
                1,  
                3,  
                5  
            ],  
            [  
                1,  
                3,  
                5  
            ],  
            [  
                1,  
                3,  
                5  
            ]  
        ],  
        "upsample_rates": [  
            8,  
            8,  
            2,  
            2,  
            2  
        ],  
        "upsample_initial_channel": 512,  
        "upsample_kernel_sizes": [  
            16,  
            16,  
            4,  
            4,  
            4  
        ],  
        "n_layers_q": 3,  
        "use_spectral_norm": false,  
        "gin_channels": 768,  
        "ssl_dim": 768,  
        "n_speakers": 1  
    },  
    "spk": {  
        "yebei": 0  
    }  
}

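Here epochs is the number of epochs to train for, where one epoch is one complete pass over the entire training set. Each epoch consists of many training steps, and each step draws one mini-batch from the training set, trains on it and updates the model's parameters.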
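The relationship between epochs, steps and batch size is simple arithmetic. The sketch below assumes the last partial batch is kept; `steps_per_epoch` is a name made up for this illustration.

```python
import math


def steps_per_epoch(num_training_clips, batch_size):
    """Number of parameter updates in one full pass over the training set."""
    return math.ceil(num_training_clips / batch_size)


# 140 clips and batch_size 6, as in this article. Treat the result as an
# upper bound: some clips may be held out for validation during preprocessing.
print(steps_per_epoch(140, 6))
```

Halving batch_size roughly doubles the number of steps per epoch, so step counts from runs with different batch sizes are not directly comparable.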

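The parameter to adjust is batch_size. If you do not have enough VRAM, lower it, otherwise you will run out of VRAM here too. If the following error appears during training: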

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 8.00 GiB total capacity; 6.86 GiB already allocated; 0 bytes free; 7.25 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

then VRAM has run out.
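When that happens, lower batch_size in the configuration file and start again. Editing the JSON by hand is fine; the sketch below does the same thing with the standard library. `set_batch_size` is a hypothetical helper, and the value 4 in the example call is only an illustration, not a recommendation.

```python
import json
from pathlib import Path


def set_batch_size(config_path, batch_size):
    """Rewrite train.batch_size in the JSON config and return the old value."""
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8"))
    old = config["train"]["batch_size"]
    config["train"]["batch_size"] = batch_size
    # Write the whole config back, leaving every other key untouched
    path.write_text(json.dumps(config, indent=4, ensure_ascii=False), encoding="utf-8")
    return old


# Example call:
# set_batch_size("configs/config.json", 4)
```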

Finally, run the command to start training:

python3 train.py -c configs/config.json -m 44k

The terminal prints the training progress:

D:\work\so-vits-svc\workenv\lib\site-packages\torch\optim\lr_scheduler.py:139: UserWarning: Detected call of `lr_scheduler.step()` before `optimizer.step()`. In PyTorch 1.1.0 and later, you should call them in the opposite order: `optimizer.step()` before `lr_scheduler.step()`.  Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate  
  warnings.warn("Detected call of `lr_scheduler.step()` before `optimizer.step()`. "  
D:\work\so-vits-svc\workenv\lib\site-packages\torch\functional.py:641: UserWarning: stft with return_complex=False is deprecated. In a future pytorch release, stft will return complex tensors for all inputs, and return_complex=False will raise an error.  
Note: you can still call torch.view_as_real on the complex output to recover the old return format. (Triggered internally at C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\SpectralOps.cpp:867.)  
  return _VF.stft(input, n_fft, hop_length, win_length, window,  # type: ignore[attr-defined]  
INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration.  
D:\work\so-vits-svc\workenv\lib\site-packages\torch\autograd\__init__.py:200: UserWarning: Grad strides do not match bucket view strides. This may indicate grad was not created according to the gradient layout contract, or that the param's strides changed since DDP was constructed.  This is not an error, but may impair performance.  
grad.sizes() = [32, 1, 4], strides() = [4, 1, 1]  
bucket_view.sizes() = [32, 1, 4], strides() = [4, 4, 1] (Triggered internally at C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\torch\csrc\distributed\c10d\reducer.cpp:337.)  
  Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass  
INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration.  
INFO:44k:====> Epoch: 274, cost 39.02 s  
INFO:44k:====> Epoch: 275, cost 17.47 s  
INFO:44k:====> Epoch: 276, cost 17.74 s  
INFO:44k:====> Epoch: 277, cost 17.43 s  
INFO:44k:====> Epoch: 278, cost 17.59 s  
INFO:44k:====> Epoch: 279, cost 17.82 s  
INFO:44k:====> Epoch: 280, cost 17.64 s  
INFO:44k:====> Epoch: 281, cost 17.63 s  
INFO:44k:Train Epoch: 282 [65%]  
INFO:44k:Losses: [1.8697402477264404, 3.029414415359497, 11.415563583374023, 23.37869644165039, 0.2702481746673584], step: 6600, lr: 9.637943809624507e-05, reference_loss: 39.963661193847656

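As training runs, the log reports how long each epoch took and, every log_interval steps, the current losses and related information. Trained models are saved in the project's logs/44k directory with the .pth extension.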
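To follow the loss over a long run, the numbers can be pulled out of the log with a regular expression. A minimal sketch, assuming log lines shaped like the "Losses:" line above; `parse_loss_line` is a hypothetical helper, not part of so-vits-svc.

```python
import re

# Matches the tail of a "Losses:" line: step, learning rate, reference_loss
LOSS_LINE = re.compile(
    r"step: (?P<step>\d+), lr: (?P<lr>[\d.eE+-]+), reference_loss: (?P<loss>[\d.eE+-]+)"
)


def parse_loss_line(line):
    """Return (step, lr, reference_loss) from a 'Losses:' log line, or None."""
    match = LOSS_LINE.search(line)
    if match is None:
        return None
    return int(match["step"]), float(match["lr"]), float(match["loss"])


line = ("INFO:44k:Losses: [1.8697402477264404, 3.029414415359497, "
        "11.415563583374023, 23.37869644165039, 0.2702481746673584], "
        "step: 6600, lr: 9.637943809624507e-05, reference_loss: 39.963661193847656")
print(parse_loss_line(line))
```

Lines that carry no loss information, such as the per-epoch timing lines, return None and can be skipped.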

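Conclusion

As a rule of thumb, once the training loss has fallen below 50 (the reference_loss in the log above is about 39.96) and the loss has stabilized on both the training set and the validation set, the model can be considered converged, and a converged model is ready to use. For how to use the trained model, see the article "AI Diva Singing Online: Applying the AI Stefanie Sun Model in Practice, Recreating 《遙遠的歌》, Originally Sung by 晴子 (Python3.10)".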
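"Stabilized" can be checked roughly by comparing recent loss values with earlier ones. The sketch below is one possible heuristic, not a rule from so-vits-svc: the window of 5 and the 2% tolerance are arbitrary values chosen for illustration, and `has_plateaued` is a made-up name.

```python
def has_plateaued(losses, window=5, tolerance=0.02):
    """True when the mean of the last `window` losses differs from the mean of
    the previous `window` by less than `tolerance` (relative change)."""
    if len(losses) < 2 * window:
        return False  # not enough history to judge
    recent = sum(losses[-window:]) / window
    previous = sum(losses[-2 * window:-window]) / window
    return abs(recent - previous) / max(abs(previous), 1e-12) < tolerance


print(has_plateaued([100, 90, 80, 70, 60, 50, 45, 40, 35, 30]))  # still falling
print(has_plateaued([40.2, 39.9, 40.1, 40.0, 39.8, 40.1, 39.9, 40.0, 40.2, 39.9]))
```

Listening to the model's output remains the real test; a flat loss curve only tells you that more of the same training is unlikely to change much.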

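Finally, here is the voice model of folk goddess Ye Bei, trained for a total of 6,400 steps, to share with everyone:

pan.baidu.com/s/1m3VGc7RktaO5snHw6RPLjQ?pwd=pqkb
Extraction code: pqkb

Please credit the source when reposting. Link to this article: https://www.uj5u.com/qita/552332.html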