Great Voice, Great Looks: Adding Animated Visuals to an AI Voice Model with PaddleGAN (Python 3.10)

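With So-vits we can train all kinds of voice models ourselves and then recreate any song we want to hear, which amounts to song-request freedom. Still, something always feels missing, and that something is the picture: we hear the voice but never see the singer. This time we will have AI Trump's singing voice and his imposing figure appear together, using PaddleGAN to build a "great voice, great looks" version of "the Know-It-All" (懂王, a popular Chinese nickname for Trump).

PaddlePaddle is Baidu's open-source deep learning framework. Its scope is broad, covering 40 models in total across three domains: text, image, and video.

Wav2lip, a submodule of PaddleGAN's visual-effects models, is a repackaged and optimized version of the open-source Wav2lip library. It synchronizes a person's mouth shape with the input vocals; put plainly, it makes the lips in a still image move so that the person appears to be singing.

Beyond that, Wav2lip can also take a video directly and replace the lip movements, producing a video that matches the target audio. That means we can use AI to build our own custom talking-head presenter.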

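Configuring CUDA and cuDNN locally

Getting the PaddlePaddle framework running locally is no small task. Fortunately it is backed by Baidu, a heavyweight in deep learning in China, and the documentation is extensive, so as long as you follow the steps in order you should not run into major problems.

First, set up a Python 3.10 development environment locally. See the author's guide to installing and configuring Python 3.10 on different architectures (Intel x86 / Apple M1 silicon) and platforms (Win10/Win11/Mac/Ubuntu).

Next, set up CUDA and cuDNN locally. CUDA is the computing platform; cuDNN is a GPU acceleration library for deep learning built on top of CUDA, and it is what lets deep learning computations run on the GPU. Think of cuDNN as the tool that does the work on the CUDA platform. The two versions must match.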

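First open the NVIDIA Control Panel to check which CUDA version the installed NVIDIA driver supports:

On the author's machine, the GPU is an RTX 4060 and the current driver supports CUDA up to 12.1. In other words, any CUDA version at or below 12.1 is supported.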
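
The "at or below the driver's maximum" rule is easy to get wrong if versions are compared as plain strings. Here is a minimal sketch of the check; the helper names are made up for illustration, and 12.1 is simply the driver limit quoted above.

```python
def parse_version(text):
    """Turn a dotted version such as '11.6' into a tuple of ints: (11, 6)."""
    return tuple(int(part) for part in text.split("."))


def toolkit_supported(toolkit, driver_max):
    """True when the toolkit version does not exceed what the driver supports."""
    return parse_version(toolkit) <= parse_version(driver_max)


# 12.1 is the driver limit from the text; the candidates are only examples.
for candidate in ("11.6", "12.1", "12.2"):
    print(candidate, toolkit_supported(candidate, "12.1"))
```

Comparing integer tuples rather than strings matters because, as strings, "9.2" would sort after "12.1".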

Next, consult the official PaddlePaddle documentation to see which framework builds support Python 3.10:

https://www.paddlepaddle.org.cn/documentation/docs/zh/install/Tables.html#ciwhls-release

According to the documentation, the highest build supported for Python 3.10 is win-cuda11.6-cudnn8.4-mkl-vs2017-avx, that is, CUDA 11.6 with cuDNN 8.4. Anything newer is not supported.
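
To make it concrete how the CUDA and cuDNN versions are read off that build name, here is a small sketch that splits the tag into its parts. The regular expression is an assumption based only on the single tag quoted above, not an official naming specification.

```python
import re

# Assumed layout, inferred from the one tag in the text:
# <os>-cuda<version>-cudnn<version>-<other build options...>
TAG_PATTERN = re.compile(
    r"^(?P<os>[a-z]+)-cuda(?P<cuda>[\d.]+)-cudnn(?P<cudnn>[\d.]+)-(?P<rest>.+)$"
)


def parse_build_tag(tag):
    """Split a build tag into OS, CUDA version, cuDNN version and extras."""
    match = TAG_PATTERN.match(tag)
    if match is None:
        raise ValueError(f"unrecognised build tag: {tag!r}")
    info = match.groupdict()
    info["extras"] = info.pop("rest").split("-")
    return info


print(parse_build_tag("win-cuda11.6-cudnn8.4-mkl-vs2017-avx"))
```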

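So this machine needs CUDA 11.6 and cuDNN 8.4.

Make sure the versions match exactly, otherwise the program will fail to start later on.

With the version numbers known, all that remains is to download the installers from NVIDIA's website.

CUDA 11.6 download page: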

https://developer.nvidia.com/cuda-toolkit-archive

cuDNN 8.4 download page:

https://developer.nvidia.com/rdp/cudnn-archive

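Install CUDA 11.6 first. Once that finishes, unzip the cuDNN 8.4 archive and copy the extracted files into the CUDA 11.6 installation directory. The CUDA installation path is: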

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.6
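
If you would rather script the copy step than drag folders around, something like the following sketch could work. It assumes the extracted cuDNN archive contains bin, include and lib subfolders that mirror the CUDA directory layout; the function name and the example paths are illustrative, and writing into Program Files normally needs an elevated prompt.

```python
import shutil
from pathlib import Path


def merge_cudnn(cudnn_dir, cuda_dir, subdirs=("bin", "include", "lib")):
    """Copy each cuDNN subfolder into the CUDA folder of the same name."""
    copied = []
    for name in subdirs:
        source = Path(cudnn_dir) / name
        if not source.is_dir():
            continue  # skip folders this archive does not contain
        # dirs_exist_ok=True merges into an existing folder instead of failing
        shutil.copytree(source, Path(cuda_dir) / name, dirs_exist_ok=True)
        copied.append(name)
    return copied


# Example call (hypothetical extraction folder, run as administrator):
# merge_cudnn(r"C:\Downloads\cudnn-8.4",
#             r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.6")
```

The function returns the list of subfolders it actually copied, which makes it easy to spot an archive with an unexpected layout.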

Then add the bin directory to the system PATH environment variable:

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.6\bin
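
A quick way to confirm the PATH change took effect is to ask Python, from a freshly opened terminal, whether the directory is present and whether nvcc (the CUDA compiler that lives in that bin folder) can be found. This is only a sketch; dir_on_path is a helper written for this example.

```python
import os
import shutil


def dir_on_path(directory, path_value=None, sep=os.pathsep):
    """Check whether a directory appears as an entry of a PATH-style string."""
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    wanted = os.path.normcase(os.path.normpath(directory))
    entries = [
        os.path.normcase(os.path.normpath(entry))
        for entry in path_value.split(sep)
        if entry
    ]
    return wanted in entries


cuda_bin = r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.6\bin"
print("CUDA bin on PATH:", dir_on_path(cuda_bin))
print("nvcc found at:", shutil.which("nvcc"))
```

Normalizing both sides before comparing avoids false negatives caused by a trailing separator or, on Windows, by differences in letter case.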

Next, open a terminal and change to the demo folder:

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.6\extras\demo_suite

Run bandwidthTest.exe, which returns:

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.6\extras\demo_suite>bandwidthTest.exe  
[CUDA Bandwidth Test] - Starting...  
Running on...  
  
 Device 0: NVIDIA GeForce RTX 4060 Laptop GPU  
 Quick Mode  
  
 Host to Device Bandwidth, 1 Device(s)  
 PINNED Memory Transfers  
   Transfer Size (Bytes)        Bandwidth(MB/s)  
   33554432                     12477.8  
  
 Device to Host Bandwidth, 1 Device(s)  
 PINNED Memory Transfers  
   Transfer Size (Bytes)        Bandwidth(MB/s)  
   33554432                     12337.3  
  
 Device to Device Bandwidth, 1 Device(s)  
 PINNED Memory Transfers  
   Transfer Size (Bytes)        Bandwidth(MB/s)  
   33554432                     179907.9  
  
Result = PASS  
  
NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.

This means the installation succeeded. You can then query the GPU device with deviceQuery.exe:

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.6\extras\demo_suite>deviceQuery.exe  
deviceQuery.exe Starting...  
  
 CUDA Device Query (Runtime API) version (CUDART static linking)  
  
Detected 1 CUDA Capable device(s)  
  
Device 0: "NVIDIA GeForce RTX 4060 Laptop GPU"  
  CUDA Driver Version / Runtime Version          12.1 / 11.6  
  CUDA Capability Major/Minor version number:    8.9  
  Total amount of global memory:                 8188 MBytes (8585216000 bytes)  
MapSMtoCores for SM 8.9 is undefined.  Default to use 128 Cores/SM  
MapSMtoCores for SM 8.9 is undefined.  Default to use 128 Cores/SM  
  (24) Multiprocessors, (128) CUDA Cores/MP:     3072 CUDA Cores  
  GPU Max Clock rate:                            2370 MHz (2.37 GHz)  
  Memory Clock rate:                             8001 Mhz  
  Memory Bus Width:                              128-bit  
  L2 Cache Size:                                 33554432 bytes  
  Maximum Texture Dimension Size (x,y,z)         1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)  
  Maximum Layered 1D Texture Size, (num) layers  1D=(32768), 2048 layers  
  Maximum Layered 2D Texture Size, (num) layers  2D=(32768, 32768), 2048 layers  
  Total amount of constant memory:               zu bytes  
  Total amount of shared memory per block:       zu bytes  
  Total number of registers available per block: 65536  
  Warp size:                                     32  
  Maximum number of threads per multiprocessor:  1536  
  Maximum number of threads per block:           1024  
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)  
  Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)  
  Maximum memory pitch:                          zu bytes  
  Texture alignment:                             zu bytes  
  Concurrent copy and kernel execution:          Yes with 1 copy engine(s)  
  Run time limit on kernels:                     Yes  
  Integrated GPU sharing Host Memory:            No  
  Support host page-locked memory mapping:       Yes  
  Alignment requirement for Surfaces:            Yes  
  Device has ECC support:                        Disabled  
  CUDA Device Driver Mode (TCC or WDDM):         WDDM (Windows Display Driver Model)  
  Device supports Unified Addressing (UVA):      Yes  
  Device supports Compute Preemption:            Yes  
  Supports Cooperative Kernel Launch:            Yes  
  Supports MultiDevice Co-op Kernel Launch:      No  
  Device PCI Domain ID / Bus ID / location ID:   0 / 1 / 0  
  Compute Mode:  
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >  
  
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 12.1, CUDA Runtime Version = 11.6, NumDevs = 1, Device0 = NVIDIA GeForce RTX 4060 Laptop GPU  
Result = PASS

At this point CUDA and cuDNN are configured.

Configuring the PaddlePaddle framework

With CUDA in place, let's install the PaddlePaddle framework:

python -m pip install paddlepaddle-gpu==2.4.2.post116 -f https://www.paddlepaddle.org.cn/whl/windows/mkl/avx/stable.html

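This installs the GPU build of paddlepaddle, version 2.4.2.post116. 2.4 was the latest release at the time of writing, and the 116 suffix stands for the CUDA version (11.6). Be careful to get the version exactly right.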
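
The mapping from the post suffix to a CUDA version can be spelled out in a few lines. This sketch assumes the suffix is the CUDA major and minor version written without the dot, with the last digit being the minor version; that reading fits post116 and CUDA 11.6 but is an assumption rather than a documented rule.

```python
import re


def cuda_from_wheel_version(version):
    """Return the CUDA version encoded in a '.postNNN' suffix, or None."""
    match = re.search(r"\.post(\d+)$", version)
    if match is None:
        return None
    digits = match.group(1)
    # Assumption: the last digit is the minor version, the rest the major one.
    return f"{digits[:-1]}.{digits[-1]}"


print(cuda_from_wheel_version("2.4.2.post116"))  # 11.6
```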

Then clone the PaddleGAN project:

git clone https://gitee.com/PaddlePaddle/PaddleGAN

From inside the cloned PaddleGAN directory, build and install the project locally:

pip install -v -e .

Then install the remaining dependencies:

pip install -r requirements.txt

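There are a few pitfalls worth pointing out:

First, PaddleGAN still depends on an older numpy and does not support the latest 1.24 release. If your numpy version is 1.24, uninstall it first: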

pip uninstall numpy

Then install version 1.21:

pip install numpy==1.21
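
Before uninstalling anything, it may be worth checking which numpy is actually installed. The sketch below reads the installed version through the standard library and flags it when it is 1.24 or newer; treating every release from 1.24 onward as too new is an assumption extrapolated from the note above, and numpy_too_new is a name chosen for this example.

```python
from importlib.metadata import PackageNotFoundError, version


def numpy_too_new(installed, first_bad=(1, 24)):
    """True when the major.minor version is at or above the first bad release."""
    major_minor = tuple(int(part) for part in installed.split(".")[:2])
    return major_minor >= first_bad


try:
    installed = version("numpy")
    print(installed, "needs downgrade:", numpy_too_new(installed))
except PackageNotFoundError:
    print("numpy is not installed")
```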

Next, verify in a Python shell that PaddlePaddle is installed correctly:

import paddle  
paddle.utils.run_check()

If you get this error:

PreconditionNotMetError: The third-party dynamic library (cudnn64_7.dll) that Paddle depends on is not configured correctly. (error code is 126)  
      Suggestions:  
      1. Check if the third-party dynamic library (e.g. CUDA, CUDNN) is installed correctly and its version is matched with paddlepaddle you installed.  
      2. Configure third-party dynamic library environment variables as follows:  
      - Linux: set LD_LIBRARY_PATH by `export LD_LIBRARY_PATH=...`  
      - Windows: set PATH by `set PATH=XXX; (at ..\paddle\phi\backends\dynload\dynamic_loader.cc:305)  
      [operator < fill_constant > error]

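then you need to download the cudnn64_7.dll dynamic library and copy it into the bin directory of CUDA 11.6. The download location for the library is given at the end of this article.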
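
To confirm the file really landed in the right place before re-running the check, a tiny helper like this one can list which expected DLLs are still missing. The function name is illustrative; the path is the CUDA 11.6 bin directory used throughout this article.

```python
from pathlib import Path


def missing_dlls(bin_dir, names):
    """Return the names from `names` that are not files inside `bin_dir`."""
    folder = Path(bin_dir)
    return [name for name in names if not (folder / name).is_file()]


cuda_bin = r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.6\bin"
print("missing:", missing_dlls(cuda_bin, ["cudnn64_7.dll"]))
```

An empty list means every listed library was found.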

Run the verification again, and it returns:

Python 3.10.11 (tags/v3.10.11:7d4cc5a, Apr  5 2023, 00:38:17) [MSC v.1929 64 bit (AMD64)] on win32  
Type "help", "copyright", "credits" or "license" for more information.  
>>> import paddle  
>>> paddle.utils.run_check()  
Running verify PaddlePaddle program ...  
W0517 20:15:34.881800 31592 gpu_resources.cc:61] Please NOTE: device: 0, GPU Compute Capability: 8.9, Driver API Version: 12.1, Runtime API Version: 11.6  
W0517 20:15:34.889958 31592 gpu_resources.cc:91] device: 0, cuDNN Version: 8.4.  
PaddlePaddle works well on 1 GPU.  
PaddlePaddle works well on 1 GPUs.  
PaddlePaddle is installed successfully! Let's start deep learning with PaddlePaddle now.

That means everything is in place and the installation succeeded.
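
Besides run_check(), it can be handy to print a short summary of what Paddle thinks it is running on, for example when comparing two machines. The sketch below wraps a few Paddle calls in a helper of our own (paddle_report is not part of Paddle) and returns None when Paddle cannot be imported.

```python
def paddle_report():
    """Summarise the Paddle install, or return None if Paddle is unavailable."""
    try:
        import paddle
    except ImportError:
        return None
    return {
        "version": paddle.__version__,
        "built_with_cuda": paddle.is_compiled_with_cuda(),
        "device": paddle.device.get_device(),
    }


print(paddle_report())
```

On a working GPU setup the device entry would be expected to name a GPU rather than the CPU.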

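Local inference

Now let's put animated visuals to Trump's song. First, generate a still image of Trump with Stable-Diffusion:

For more on Stable-Diffusion, see the author's tutorial on building the Stable-Diffusion-Webui AI painting library on all platforms (native/Docker) with Python 3.10 and PyTorch 1.13.0; for reasons of space it is not repeated here.

Next, go to the project's tools directory: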

\PaddleGAN\applications\tools>

Put Trump's still image and the song file into the tools directory.

Then run the command for local inference:

python .\wav2lip.py --face .\Trump.jpg --audio test.wav --outfile pp_put.mp4 --face_enhancement

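Here --face is the target image, --audio is the song whose lip movements should be matched, and --outfile is the output video.

--face_enhancement turns on face enhancement; if the flag is omitted, enhancement is off by default.

Note that using this flag requires the model files to be downloaded separately.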
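
If you plan to render several songs or faces, it may be more convenient to assemble the command in Python. The sketch below only uses the four flags described above; the two function names and the tools_dir parameter are our own, and run_wav2lip is defined but not called here.

```python
import subprocess
import sys


def build_wav2lip_cmd(face, audio, outfile, enhance=False):
    """Assemble the wav2lip.py command line as a list of arguments."""
    cmd = [
        sys.executable, "wav2lip.py",
        "--face", face,
        "--audio", audio,
        "--outfile", outfile,
    ]
    if enhance:
        cmd.append("--face_enhancement")
    return cmd


def run_wav2lip(tools_dir, face, audio, outfile, enhance=False):
    """Run the script from the tools directory; raises if it exits non-zero."""
    cmd = build_wav2lip_cmd(face, audio, outfile, enhance)
    subprocess.run(cmd, cwd=tools_dir, check=True)


print(" ".join(build_wav2lip_cmd("Trump.jpg", "test.wav", "pp_put.mp4", enhance=True)))
```

Passing the arguments as a list rather than one string avoids quoting problems when file names contain spaces.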

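The key to Wav2Lip's breakthrough in accurately syncing lips to speech is its lip-sync discriminator, which forces the generator to keep producing accurate, realistic lip motion. In addition, it accounts for temporal correlation by feeding the discriminator several consecutive frames rather than a single frame, and it uses a visual quality loss (not just a contrastive loss), which improves visual quality.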
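
To give a feel for what a lip-sync discriminator scores, here is a toy sketch in the spirit of the cosine-similarity formulation in the Wav2Lip paper: a video embedding and an audio embedding are compared, and the loss is the negative log of their similarity. The real model produces these embeddings with neural networks over a window of consecutive frames; the hand-written vectors below are illustrative only and are assumed to be non-negative.

```python
import math


def sync_probability(video_emb, audio_emb, eps=1e-8):
    """Cosine similarity between a video embedding and an audio embedding."""
    dot = sum(v * a for v, a in zip(video_emb, audio_emb))
    norm = math.sqrt(sum(v * v for v in video_emb)) * math.sqrt(
        sum(a * a for a in audio_emb)
    )
    return dot / max(norm, eps)


def sync_loss(pairs, eps=1e-8):
    """Mean negative log of the sync probability over (video, audio) pairs."""
    total = 0.0
    for video_emb, audio_emb in pairs:
        # Clamp into [eps, 1] so the logarithm stays finite.
        p = min(max(sync_probability(video_emb, audio_emb, eps), eps), 1.0)
        total += -math.log(p)
    return total / len(pairs)


in_sync = [([1.0, 0.0], [2.0, 0.0])]      # embeddings point the same way
out_of_sync = [([1.0, 0.0], [0.0, 1.0])]  # embeddings are orthogonal
print(sync_loss(in_sync), sync_loss(out_of_sync))
```

A low loss means the mouth shapes and the audio agree, so a generator trained against this score is pushed toward lip motion that matches the sound.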

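The result:

Conclusion

Sometimes the progress of AI really does make you feel as if a lifetime has passed: what you hear may not be real, and what you see may not be true either. The finished video can be found by searching for 劉悅的技術博客 (Liu Yue's Tech Blog) on YouTube or Bilibili. All installers and dynamic libraries mentioned in this article are available at: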

https://pan.baidu.com/s/1-6NA2uAOSRlT4O0FGEKUGA?pwd=oo0d   
Extraction code: oo0d

