[Speech Signal Processing] Learning the librosa Audio Library

librosa

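- Loading audio
- Resampling
- Getting the duration
- Writing audio
- Zero-crossing rate
- Waveform plot
- Short-time Fourier transform
- Inverse short-time Fourier transform
- Amplitude to dB
- Power to dB
- Spectrogram
- Mel filter bank
- Mel spectrogram
- Extracting MFCCs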

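Loading audio

Example:

data, sr = librosa.load(path, sr=22050, mono=True, offset=0.0, duration=None)

Parameters:

mono: bool, whether to convert the signal to mono
offset: float, start reading after this time (in seconds)
duration: float, load only this much audio (in seconds)

Returns:

data: the audio time series (an array of amplitudes); len(data) is its number of samples
sr: the sampling rate of the returned data. By default the audio is resampled to 22050 Hz; to keep the sampling rate the file was recorded at, pass sr=None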
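
To make the offset and duration parameters concrete, here is a minimal plain-Python sketch of how times in seconds map to sample indices. The helper slice_by_time is hypothetical (it is not part of librosa), and the rounding rule is an assumption for illustration.

```python
def slice_by_time(samples, sr, offset=0.0, duration=None):
    """Return the part of `samples` that starts at `offset` seconds
    and lasts `duration` seconds (or runs to the end if duration is None)."""
    start = int(round(offset * sr))          # seconds -> sample index
    if duration is None:
        return samples[start:]
    stop = start + int(round(duration * sr))
    return samples[start:stop]


samples = list(range(10))                    # 10 samples at a toy rate of 4 Hz
clip = slice_by_time(samples, sr=4, offset=0.5, duration=1.0)
print(clip)  # [2, 3, 4, 5]
```

With sr=4, an offset of 0.5 s skips two samples and a duration of 1.0 s keeps four.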

Resampling

orig_sr = librosa.get_samplerate(path)  # read the file's sampling rate
y_hat = librosa.resample(y, orig_sr=orig_sr, target_sr=target_sr, fix=True, scale=False)

Resamples a time series from orig_sr to target_sr.

Parameters:

y: audio time series, mono or stereo
orig_sr: original sampling rate of y
target_sr: target sampling rate
fix: bool, adjust the length of the resampled signal so that its size is exactly $\frac{\mathrm{len}(y)}{orig\_sr} \cdot target\_sr = t \cdot target\_sr$
scale: bool, scale the resampled signal so that y and y_hat have approximately equal total energy

Returns:

y_hat: the resampled audio array
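
The length rule for fix=True can be checked with a few lines of arithmetic. This is a sketch, not librosa's code: resampled_length is a hypothetical helper, and rounding up to a whole sample is an assumption about how a fractional length is handled.

```python
import math


def resampled_length(n_samples, orig_sr, target_sr):
    """Number of samples after resampling: len(y) / orig_sr * target_sr,
    rounded up to a whole sample (rounding rule assumed)."""
    return math.ceil(n_samples * target_sr / orig_sr)


n = resampled_length(n_samples=44100, orig_sr=44100, target_sr=16000)
print(n)  # 16000
```

One second of audio at 44100 Hz becomes 16000 samples at 16000 Hz, so the duration t is preserved.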

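Getting the duration

t = librosa.get_duration(y=None, sr=22050, S=None, n_fft=2048, hop_length=512, center=True, filename=None)

Computes the duration of a time series, in seconds.

Parameters:

y: audio time series
sr: audio sampling rate
S: STFT matrix, or any STFT-derived matrix (e.g. a chromagram or Mel spectrogram). A duration computed from a spectrogram input is only accurate up to the frame resolution; if high precision is required, use the audio time series directly
n_fft: FFT window size for S
hop_length: number of audio samples between columns of S
center: bool
- If True, S[:, t] is centered at y[t * hop_length]
- If False, S[:, t] begins at y[t * hop_length]
filename: if provided, all other parameters are ignored and the duration is computed directly from the audio file (librosa 0.10 renamed this parameter to path)

Returns:

t: duration, in seconds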
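
The difference between a duration taken from the samples and one taken from a spectrogram can be sketched in plain Python. Both helpers below are hypothetical, and the frame-to-sample formula is my reading of the parameter descriptions above rather than a copy of librosa's implementation.

```python
def duration_from_samples(n_samples, sr):
    """Exact duration in seconds of a signal with n_samples samples."""
    return n_samples / sr


def duration_from_frames(n_frames, sr, n_fft=2048, hop_length=512, center=True):
    """Duration implied by a spectrogram with n_frames columns."""
    n_samples = n_fft + hop_length * (n_frames - 1)
    if center:
        # Centered frames were padded by n_fft // 2 on each side.
        n_samples -= 2 * (n_fft // 2)
    return n_samples / sr


sr = 22050
exact = duration_from_samples(22050, sr)          # 1.0 second of audio
n_frames = 1 + 22050 // 512                       # 44 centered frames
approx = duration_from_frames(n_frames, sr)
print(exact, approx)
```

The second value comes out slightly below 1.0 because 22050 is not a multiple of hop_length; that gap is the frame-resolution error mentioned for S.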

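Writing audio

librosa.output.write_wav(path, y, sr, norm=False)

Writes a time series to a .wav file. Note that the librosa.output module was removed in librosa 0.8.0; on newer versions, use soundfile.write(path, y, sr) instead.

Parameters:

path: path where the output wav file is saved
y: audio time series
sr: sampling rate of y
norm: bool, whether to enable amplitude normalization, which scales the data to the range [-1, +1]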
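
For readers without a working write_wav, the same idea can be sketched with only Python's standard wave module. This is a swapped-in technique, not librosa's writer: write_wav_16bit is a hypothetical helper that handles mono input only and always stores 16-bit PCM.

```python
import math
import os
import struct
import tempfile
import wave


def write_wav_16bit(path, samples, sr, norm=False):
    """Write mono float samples in [-1, 1] as a 16-bit PCM .wav file."""
    samples = list(samples)
    if norm:
        # Peak-normalize so the loudest sample has magnitude 1.0.
        peak = max((abs(s) for s in samples), default=0.0)
        if peak > 0:
            samples = [s / peak for s in samples]
    # Clip to [-1, 1], then scale to the int16 range.
    ints = [int(round(max(-1.0, min(1.0, s)) * 32767)) for s in samples]
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)                   # 2 bytes = 16 bits per sample
        wf.setframerate(sr)
        wf.writeframes(struct.pack("<%dh" % len(ints), *ints))


sr = 8000
tone = [0.5 * math.sin(2 * math.pi * 440 * n / sr) for n in range(sr)]  # 1 s at 440 Hz
out_path = os.path.join(tempfile.mkdtemp(), "tone.wav")
write_wav_16bit(out_path, tone, sr)

with wave.open(out_path, "rb") as wf:
    info = (wf.getnchannels(), wf.getframerate(), wf.getnframes())
print(info)  # (1, 8000, 8000)
```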

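Zero-crossing rate

import librosa

# Newer librosa releases replace example_audio_file() with librosa.ex(...), e.g. librosa.ex('trumpet')
y, sr = librosa.load(librosa.util.example_audio_file())
print(librosa.feature.zero_crossing_rate(y))
# array([[ 0.134,  0.139, ...,  0.387,  0.322]])

Computes the zero-crossing rate of an audio time series.

Parameters:

y: audio time series
frame_length: frame length
hop_length: hop size (number of samples between successive frames)
center: bool; if True, frames are centered by padding the edges of y

Returns:

zcr: zcr[0, i] is the zero-crossing rate in frame i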
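
What the per-frame value means can be shown with a small plain-Python sketch. It is an illustration under simplifying assumptions (no edge padding, so it corresponds to center=False, and zero is treated as non-negative); the helper names are hypothetical.

```python
def zero_crossing_rate(frame):
    """Fraction of samples in `frame` whose sign differs from the previous sample."""
    crossings = sum(
        1 for prev, cur in zip(frame, frame[1:]) if (prev < 0) != (cur < 0)
    )
    return crossings / len(frame)


def framewise_zcr(y, frame_length, hop_length):
    """Zero-crossing rate of each full frame of y (no padding)."""
    starts = range(0, len(y) - frame_length + 1, hop_length)
    return [zero_crossing_rate(y[s:s + frame_length]) for s in starts]


y = [1, -1, 1, -1, 1, 1, 1, 1]
rates = framewise_zcr(y, frame_length=4, hop_length=4)
print(rates)  # [0.75, 0.0]
```

The first frame alternates in sign on every step (3 crossings over 4 samples), while the second never crosses zero.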

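Waveform plot

librosa.display.waveplot(y, sr=22050, x_axis='time', offset=0.0, ax=None)

Plots the amplitude envelope of a waveform. Note that waveplot was removed in librosa 0.10; newer versions provide librosa.display.waveshow instead.

Parameters:

y: audio time series
sr: sampling rate of y
x_axis: str {'time', 'off', 'none'} or None; if 'time', the x-axis is given time tick marks
offset: horizontal offset (in seconds) at which the waveform plot starts

# Example
import librosa
import librosa.display
import matplotlib.pyplot as plt

y, sr = librosa.load(librosa.util.example_audio_file(), duration=10)
librosa.display.waveplot(y, sr=sr)
plt.show()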

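Short-time Fourier transform

librosa.stft(y, n_fft=2048, hop_length=None, win_length=None, window='hann', center=True, pad_mode='reflect')

The short-time Fourier transform (STFT) returns a complex-valued matrix D. For each element D[f, t]:

Magnitude: np.abs(D[f, t]) is the amplitude of frequency bin f at frame t
Phase: np.angle(D[f, t]) is the phase of frequency bin f at frame t

Parameters:

y: audio time series
n_fft: FFT window size; with the default win_length = n_fft, each frame covers hop_length new samples plus the overlap with the previous frame (n_fft = hop_length + overlap)
hop_length: hop size; if unspecified, defaults to win_length // 4
win_length: each frame of audio is windowed by window() with length win_length and then zero-padded to match n_fft. Defaults to win_length = n_fft
window: string, tuple, number, function, or array of shape (n_fft,)
- a window specification (string, tuple, or number)
- a window function, e.g. scipy.signal.windows.hann (written scipy.signal.hanning in older SciPy)
- a vector or array of length n_fft
center: bool
- If True, the signal y is padded so that frame D[:, t] is centered at y[t * hop_length]
- If False, D[:, t] begins at y[t * hop_length]
dtype: complex numeric type of D; defaults to 64-bit complex
pad_mode: if center=True, the padding mode to use at the edges of the signal. By default, the STFT uses reflection padding

Returns:

The STFT matrix, with shape $(1 + \frac{n\_fft}{2},\ t)$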
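
The output shape follows from the parameters alone, which the following plain-Python sketch works through. stft_shape is a hypothetical helper, it assumes win_length = n_fft and an even n_fft, and the frame-count formulas are my own derivation from the center behaviour described above.

```python
def stft_shape(n_samples, n_fft=2048, hop_length=None, center=True):
    """Expected (frequency bins, frames) of an STFT, assuming win_length == n_fft."""
    if hop_length is None:
        hop_length = n_fft // 4              # default hop: a quarter of the window
    n_bins = 1 + n_fft // 2                  # non-negative frequencies only
    if center:
        # Padding n_fft // 2 on both sides lets a frame be centered on every hop.
        n_frames = 1 + n_samples // hop_length
    else:
        # Without padding, only frames that fit entirely inside the signal count.
        n_frames = 1 + (n_samples - n_fft) // hop_length
    return n_bins, n_frames


print(stft_shape(22050))                 # (1025, 44)
print(stft_shape(22050, center=False))   # (1025, 40)
```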

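Inverse short-time Fourier transform

librosa.istft(stft_matrix, hop_length=None, win_length=None, window='hann', center=True, length=None)

The inverse short-time Fourier transform (ISTFT) converts a complex-valued spectrogram D(f, t) back into a time series y. The window function, hop size, and other parameters should match those used for the stft.

Parameters:

stft_matrix: the matrix produced by the STFT
hop_length: hop size; defaults to $\frac{win\_length}{4}$
win_length: window length; defaults to n_fft
window: string, tuple, number, function, or array of shape (n_fft,)
- a window specification (string, tuple, or number)
- a window function, e.g. scipy.signal.windows.hann (written scipy.signal.hanning in older SciPy)
- a vector or array of length n_fft
center: bool
- If True, D is assumed to have centered frames
- If False, D is assumed to have left-aligned frames
length: if provided, the output y is zero-padded or clipped to exactly this many samples

Returns:

y: the time-domain signal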
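
One way to see why the window and hop size must match between stft and istft: overlap-add reconstruction divides by the sum of the squared, shifted windows, and that sum depends on both. The sketch below is a plain-Python illustration with hypothetical helpers, not librosa's implementation. It checks that a Hann window with hop_length = n_fft / 4 gives a constant sum away from the edges.

```python
import math


def hann(n_fft):
    """Periodic Hann window of length n_fft."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / n_fft) for n in range(n_fft)]


def window_sumsquare(window, hop_length, n_frames):
    """Sum of squared windows, each shifted by hop_length, over the overlap-add length."""
    n_fft = len(window)
    total = [0.0] * (n_fft + hop_length * (n_frames - 1))
    for t in range(n_frames):
        start = t * hop_length
        for n, w in enumerate(window):
            total[start + n] += w * w
    return total


wss = window_sumsquare(hann(16), hop_length=4, n_frames=8)
steady = wss[16:-16]  # skip the edges, where fewer than four frames overlap
print(min(steady), max(steady))  # both approximately 1.5
```

Because the sum is flat, dividing the overlap-added frames by it undoes the windowing exactly; a different window or hop on the inverse side would leave a residual amplitude modulation.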

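Amplitude to dB

librosa.amplitude_to_db(S, ref=1.0)

Converts an amplitude spectrogram to a dB-scaled spectrogram, i.e. takes the logarithm of S.
The inverse of this function is librosa.db_to_amplitude(S).

Parameters:

S: input amplitude
ref: reference value; the amplitude abs(S) is scaled relative to ref: $20 \cdot \log_{10}\left(\frac{S}{ref}\right)$

Returns:

S in units of dB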
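
The formula is easy to verify on single values. This is a scalar sketch only: amp_to_db is a hypothetical helper, and the floor amin (used to avoid log(0)) is an assumption added for illustration.

```python
import math


def amp_to_db(amplitude, ref=1.0, amin=1e-5):
    """20 * log10(|amplitude| / ref), with a small floor `amin` to avoid log(0)."""
    return 20.0 * math.log10(max(amin, abs(amplitude)) / ref)


for a in (1.0, 10.0, 0.5):
    print(a, round(amp_to_db(a), 2))
```

An amplitude equal to ref maps to 0 dB, a tenfold increase adds 20 dB, and halving the amplitude lowers the level by about 6 dB.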

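Power to dB

librosa.power_to_db(S, ref=1.0)

Converts a power spectrogram (amplitude squared) to decibel (dB) units.
The inverse of this function is librosa.db_to_power(S).

Parameters:

S: input power
ref: reference value; the power abs(S) is scaled relative to ref: $10 \cdot \log_{10}\left(\frac{S}{ref}\right)$

Returns:

S in units of dB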
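
The factor is 10 here rather than 20 because power is amplitude squared, so both conversions give the same dB value for the same signal. A scalar sketch, where pow_to_db is a hypothetical helper and the floor amin is an assumption:

```python
import math


def pow_to_db(power, ref=1.0, amin=1e-10):
    """10 * log10(power / ref), with a small floor `amin` to avoid log(0)."""
    return 10.0 * math.log10(max(amin, power) / ref)


amplitude = 0.5
db_from_power = pow_to_db(amplitude ** 2)             # 10 * log10(0.25)
db_from_amplitude = 20.0 * math.log10(amplitude)      # 20 * log10(0.5)
print(round(db_from_power, 4), round(db_from_amplitude, 4))  # the two values agree
```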

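Spectrogram

librosa.display.specshow(data, x_axis=None, y_axis=None, sr=22050, hop_length=512)

Parameters:

data: the matrix to display
sr: sampling rate
hop_length: hop size
x_axis, y_axis: axis types for the x and y axes
Frequency types
- 'linear', 'fft', 'hz': the frequency range is determined by the FFT window and the sampling rate
- 'log': the spectrum is displayed on a log scale
- 'mel': frequencies are determined by the mel scale
Time types
- 'time': markers are shown as milliseconds, seconds, minutes, or hours; values are plotted in units of seconds
- 's': markers are shown as seconds
- 'ms': markers are shown as milliseconds
All frequency types are plotted in units of Hz

Example: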

import librosa.display
import numpy as np
import matplotlib.pyplot as plt

y, sr = librosa.load(librosa.util.example_audio_file())
plt.figure()

D = librosa.amplitude_to_db(np.abs(librosa.stft(y)), ref=np.max)  # convert the amplitude spectrogram to a dB-scaled spectrogram
plt.subplot(2, 1, 1)
librosa.display.specshow(D, y_axis='linear')
plt.colorbar(format='%+2.0f dB')
plt.title('Linear-frequency power spectrogram')

plt.subplot(2, 1, 2)
librosa.display.specshow(D, y_axis='log')
plt.colorbar(format='%+2.0f dB')
plt.title('Log-frequency power spectrogram')
plt.show()

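Mel filter bank

librosa.filters.mel(sr, n_fft, n_mels=128, fmin=0.0, fmax=None, htk=False, norm=1)

Creates a filter bank matrix that combines FFT bins into Mel-frequency bins.

Parameters:

sr: sampling rate of the input signal
n_fft: number of FFT components
n_mels: number of Mel bands to generate
fmin: lowest frequency (in Hz)
fmax: highest frequency (in Hz); if None, fmax = sr / 2.0 is used
norm: {None, 1, np.inf} [scalar]
- If 1, the triangular mel weights are divided by the width of the mel band (area normalization)
- Otherwise, all triangles keep a peak value of 1.0

Returns: the Mel transform matrix

import librosa
import librosa.display
import matplotlib.pyplot as plt

# Keyword arguments work on both older and newer librosa versions
melfb = librosa.filters.mel(sr=22050, n_fft=2048)
# array([[ 0.   ,  0.016, ...,  0.   ,  0.   ],
#        [ 0.   ,  0.   , ...,  0.   ,  0.   ],
#        ...,
#        [ 0.   ,  0.   , ...,  0.   ,  0.   ],
#        [ 0.   ,  0.   , ...,  0.   ,  0.   ]])
plt.figure()
librosa.display.specshow(melfb, x_axis='linear')
plt.ylabel('Mel filter')
plt.title('Mel filter bank')
plt.colorbar()
plt.tight_layout()
plt.show()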
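
To give a feel for where the triangular filters sit, the sketch below places band edges at equal steps on the mel scale between fmin and fmax. It uses the HTK-style formula mel = 2595 * log10(1 + f / 700), which corresponds to htk=True; librosa's default htk=False uses a different (Slaney-style) scale, so the numbers will not match the matrix above. All helper names are hypothetical.

```python
import math


def hz_to_mel_htk(freq_hz):
    """HTK-style Hz -> mel conversion."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz_htk(mel):
    """Inverse of hz_to_mel_htk."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_band_edges(n_mels, fmin, fmax):
    """n_mels + 2 frequencies in Hz, equally spaced on the mel scale.
    Filter i rises from edge i to edge i + 1 and falls back to zero at edge i + 2."""
    lo, hi = hz_to_mel_htk(fmin), hz_to_mel_htk(fmax)
    step = (hi - lo) / (n_mels + 1)
    return [mel_to_hz_htk(lo + i * step) for i in range(n_mels + 2)]


edges = mel_band_edges(n_mels=4, fmin=0.0, fmax=8000.0)
print([round(f) for f in edges])
```

The edges are dense at low frequencies and spread out toward fmax, which is why mel filters get wider as frequency increases.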

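Mel spectrogram

librosa.feature.melspectrogram(y=audio, sr=40000, n_fft=1480, hop_length=150, n_mels=256)

Given a time series audio and its sampling rate sr, the function first computes the magnitude spectrogram S and then maps it onto the mel scale via mel_f.dot(S ** power).
By default, power=2, so it operates on the power spectrum.

Parameters:

n_mels: number of Mel filters
sr: sampling rate
n_fft: FFT window size
power: exponent applied to the magnitude spectrogram, e.g. 1 for energy, 2 for power
hop_length: hop size
win_length: window length; defaults to win_length = n_fft
fmax: highest frequency

Example: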

import librosa.display
import numpy as np
import matplotlib.pyplot as plt

y, sr = librosa.load(librosa.util.example_audio_file())

# Method 1: compute the Mel spectrogram from the time series
print(librosa.feature.melspectrogram(y=y, sr=sr))
# array([[  2.891e-07,   2.548e-03, ...,   8.116e-09,   5.633e-09],
#        [  1.986e-07,   1.162e-02, ...,   9.332e-08,   6.716e-09],
#        ...,
#        [  3.668e-09,   2.029e-08, ...,   3.208e-09,   2.864e-09],
#        [  2.561e-10,   2.096e-09, ...,   7.543e-10,   6.101e-10]])

# Method 2: compute the Mel spectrogram from an STFT spectrogram
D = np.abs(librosa.stft(y)) ** 2             # STFT power spectrogram
S = librosa.feature.melspectrogram(S=D)      # Mel spectrogram from the STFT spectrogram

plt.figure(figsize=(10, 4))
librosa.display.specshow(librosa.power_to_db(S, ref=np.max),
                         y_axis='mel', fmax=8000, x_axis='time')
plt.colorbar(format='%+2.0f dB')
plt.title('Mel spectrogram')
plt.tight_layout()
plt.show()
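
The projection mel_f.dot(S ** power) described in this section is an ordinary matrix product, which can be sketched without NumPy on toy data. The helper mel_project and the numbers below are made up for illustration; a real filter bank would have shape (n_mels, 1 + n_fft / 2).

```python
def mel_project(mel_f, S, power=2.0):
    """Compute mel_f.dot(S ** power) for nested lists.

    mel_f: n_mels x n_bins filter weights
    S:     n_bins x n_frames magnitude spectrogram
    """
    n_frames = len(S[0])
    S_pow = [[abs(v) ** power for v in row] for row in S]
    return [
        [sum(w * S_pow[k][t] for k, w in enumerate(filt)) for t in range(n_frames)]
        for filt in mel_f
    ]


# Toy data: 2 mel bands over 3 frequency bins, 2 frames.
mel_f = [[1.0, 0.5, 0.0],
         [0.0, 0.5, 1.0]]
S = [[1.0, 2.0],
     [2.0, 0.0],
     [3.0, 1.0]]
mel_S = mel_project(mel_f, S)
print(mel_S)  # [[3.0, 4.0], [11.0, 1.0]]
```

Each output row is one mel band: a weighted sum of the squared magnitudes of the frequency bins that the band's filter covers.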

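Extracting MFCCs

MFCC features are widely used in automatic speech recognition and speaker recognition. For details on MFCC features, see the blog post at http://blog.csdn.net/zzc15806/article/details/79246716. In librosa, extracting MFCC features takes a single function:

librosa.feature.mfcc(y=None, sr=22050, S=None, n_mfcc=20, dct_type=2, norm='ortho', **kwargs)

Parameters:

y: audio data
sr: sampling rate
S: np.ndarray, log-power Mel spectrogram
n_mfcc: int > 0, number of MFCCs to return
dct_type: None, or {1, 2, 3}; the discrete cosine transform (DCT) type. By default, DCT type 2 is used
norm: None or 'ortho'
- If dct_type is 2 or 3, setting norm='ortho' uses an orthonormal DCT basis
- Normalization is not supported for dct_type=1

Returns:

M: the MFCC sequence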

import librosa

y, sr = librosa.load('./train_nb.wav', sr=16000)
# Extract MFCC features
mfccs = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=40)

print(mfccs.shape)        # (40, 65)
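
The dct_type=2 and norm='ortho' settings refer to the transform applied to each column of the log-power Mel spectrogram. Below is a minimal plain-Python sketch of that step for a single frame. The helper names are hypothetical, and it illustrates the orthonormal DCT-II definition rather than reproducing librosa's code.

```python
import math


def dct2_ortho(x):
    """Orthonormal type-2 DCT of a 1-D sequence."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(v * math.cos(math.pi * (i + 0.5) * k / n) for i, v in enumerate(x))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out


def mfcc_from_log_mel(log_mel_frame, n_mfcc):
    """Keep the first n_mfcc DCT coefficients of one log-mel frame."""
    return dct2_ortho(log_mel_frame)[:n_mfcc]


flat = [1.0, 1.0, 1.0, 1.0]          # a toy log-mel frame with no spectral shape
coeffs = mfcc_from_log_mel(flat, n_mfcc=2)
print(coeffs)  # approximately [2.0, 0.0]
```

A flat frame puts all of its energy in coefficient 0; the higher coefficients capture how the spectral envelope varies across mel bands, which is why only the first n_mfcc are usually kept.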

Please credit the source when reposting. Link to this article: https://www.uj5u.com/qita/295283.html
