主頁 >  其他 > 語音質量的評價指標介紹及MATLAB實作(二)

語音質量的評價指標介紹及MATLAB實作(二)

2021-12-22 07:10:48 其他

對數似然比(log-likelihood ratio, LLR)測度

假設語音在短時間內可以表示為一個 p p p階全極點的模型,即
x ( n ) = ∑ i = 1 p a x ( i ) x ( n ? i ) + G x u ( n ) x\left( n \right)=\sum\limits_{i=1}^{p}{{{a}_{x}}\left( i \right)x\left( n-i \right)}+{{G}_{x}}u\left( n \right) x(n)=i=1p?ax?(i)x(n?i)+Gx?u(n)
其中, a x ( i ) {{a}_{x}}\left( i \right) ax?(i)為濾波器系數, G x {{G}_{x}} Gx?表示濾波器的增益大小, u ( n ) u\left( n \right) u(n)表示單位方差的白噪聲,則,對數似然比(log-likelihood ratio, LLR)測度可以表示為
d L L R ( a x , a ˉ x ^ ) = log ? a ˉ x ^ T R x a ˉ x ^ a x T R x a x {{d}_{LLR}}\left( {{\mathbf{a}}_{x}},{{{\mathbf{\bar{a}}}}_{{\hat{x}}}} \right)=\log \frac{\mathbf{\bar{a}}_{{\hat{x}}}^{T}{{\mathbf{R}}_{x}}{{{\mathbf{\bar{a}}}}_{{\hat{x}}}}}{\mathbf{a}_{x}^{T}{{\mathbf{R}}_{x}}{{\mathbf{a}}_{x}}} dLLR?(ax?,aˉx^?)=logaxT?Rx?ax?aˉx^T?Rx?aˉx^??
其中, a ˉ x = [ 1 , ? α x ( 1 ) , ? α x ( 2 ) , ? ? , ? α x ( p ) ] T {{\mathbf{\bar{a}}}_{x}}={{\left[ 1,-{{\alpha }_{x}}\left( 1 \right),-{{\alpha }_{x}}\left( 2 \right),\cdots ,-{{\alpha }_{x}}\left( p \right) \right]}^{T}} aˉx?=[1,?αx?(1),?αx?(2),?,?αx?(p)]T表示干凈語音的LPC系數, a ˉ x ^ = [ 1 , ? α x ^ ( 1 ) , ? α x ^ ( 2 ) , ? ? , ? α x ^ ( p ) ] T {{\mathbf{\bar{a}}}_{{\hat{x}}}}={{\left[ 1,-{{\alpha }_{{\hat{x}}}}\left( 1 \right),-{{\alpha }_{{\hat{x}}}}\left( 2 \right),\cdots ,-{{\alpha }_{{\hat{x}}}}\left( p \right) \right]}^{T}} aˉx^?=[1,?αx^?(1),?αx^?(2),?,?αx^?(p)]T表示增強語音的LPC系數, R x {{\mathbf{R}}_{x}} Rx?表示干凈語音的自相關矩陣,

代碼如下:

function llr_mean= comp_llr(cleanFile, enhancedFile);

% ----------------------------------------------------------------------
%
%      Log Likelihood Ratio (LLR) Objective Speech Quality Measure
%
%
%     This function implements the Log Likelihood Ratio Measure
%     defined on page 48 of [1] (see Equation 2.18).
%
%   Usage:  llr=comp_llr(cleanFile.wav, enhancedFile.wav)
%
%         cleanFile.wav - clean input file in .wav format
%         enhancedFile  - enhanced output file in .wav format
%         llr           - computed likelihood ratio
%
%         Note that the LLR measure is limited in the range [0, 2].
%
%  Example call:  llr =comp_llr('sp04.wav','enhanced.wav')
%
%
%  References:
%
%     [1] S. R. Quackenbush, T. P. Barnwell, and M. A. Clements,
%	    Objective Measures of Speech Quality.  Prentice Hall
%	    Advanced Reference Series, Englewood Cliffs, NJ, 1988,
%	    ISBN: 0-13-629056-6.
%
%  Authors: Bryan L. Pellom and John H. L. Hansen (July 1998)
%  Modified by: Philipos C. Loizou  (Oct 2006) - limited LLR to be in [0,2]
%
% Copyright (c) 2006 by Philipos C. Loizou
% $Revision: 0.0 $  $Date: 10/09/2006 $
% ----------------------------------------------------------------------

if nargin~=2
    fprintf('USAGE: LLR=comp_llr(cleanFile.wav, enhancedFile.wav)\n');
    fprintf('For more help, type: help comp_llr\n\n');
    return;
end

alpha=0.95;
[data1, fs1]= audioread(cleanFile);
[data2, fs2]= audioread(enhancedFile);
if ( fs1~= fs2)
    error( 'The two files do not match!\n');
end

len= min( length( data1), length( data2));
data1= data1( 1: len)+eps;
data2= data2( 1: len)+eps;

IS_dist= llr( data1, data2,fs1);

IS_len= round( length( IS_dist)* alpha);
IS= sort( IS_dist);

llr_mean= mean( IS( 1: IS_len));

end

function distortion = llr(clean_speech, processed_speech,sample_rate)


% ----------------------------------------------------------------------
% Check the length of the clean and processed speech.  Must be the same.
% ----------------------------------------------------------------------

clean_length      = length(clean_speech);
processed_length  = length(processed_speech);

if (clean_length ~= processed_length)
    disp('Error: Both Speech Files must be same length.');
    return
end

% ----------------------------------------------------------------------
% Global Variables
% ----------------------------------------------------------------------

winlength   = round(30*sample_rate/1000); %240;		   % window length in samples
skiprate    = floor(winlength/4);		   % window skip in samples
if sample_rate<10000
    P           = 10;		   % LPC Analysis Order
else
    P=16;     % this could vary depending on sampling frequency.
end
% ----------------------------------------------------------------------
% For each frame of input speech, calculate the Log Likelihood Ratio
% ----------------------------------------------------------------------

num_frames = clean_length/skiprate-(winlength/skiprate); % number of frames
start      = 1;					% starting sample
window     = 0.5*(1 - cos(2*pi*(1:winlength)'/(winlength+1)));

for frame_count = 1:num_frames
    
    % ----------------------------------------------------------
    % (1) Get the Frames for the test and reference speech.
    %     Multiply by Hanning Window.
    % ----------------------------------------------------------
    
    clean_frame = clean_speech(start:start+winlength-1);
    processed_frame = processed_speech(start:start+winlength-1);
    clean_frame = clean_frame.*window;
    processed_frame = processed_frame.*window;
    
    % ----------------------------------------------------------
    % (2) Get the autocorrelation lags and LPC parameters used
    %     to compute the LLR measure.
    % ----------------------------------------------------------
    
    [R_clean, Ref_clean, A_clean] = ...
        lpcoeff(clean_frame, P);
    [R_processed, Ref_processed, A_processed] = ...
        lpcoeff(processed_frame, P);
    
    % ----------------------------------------------------------
    % (3) Compute the LLR measure
    % ----------------------------------------------------------
    
    numerator   = A_processed*toeplitz(R_clean)*A_processed';
    denominator = A_clean*toeplitz(R_clean)*A_clean';
    distortion(frame_count) = min(2,log(numerator/denominator));
    start = start + skiprate;
    
end
end

function [acorr, refcoeff, lpparams] = lpcoeff(speech_frame, model_order)

% ----------------------------------------------------------
% (1) Compute Autocorrelation Lags
% ----------------------------------------------------------

winlength = max(size(speech_frame));
for k=1:model_order+1
    R(k) = sum(speech_frame(1:winlength-k+1) ...
        .*speech_frame(k:winlength));
end

% ----------------------------------------------------------
% (2) Levinson-Durbin
% ----------------------------------------------------------

a = ones(1,model_order);
E(1)=R(1);
for i=1:model_order
    a_past(1:i-1) = a(1:i-1);
    sum_term = sum(a_past(1:i-1).*R(i:-1:2));
    rcoeff(i)=(R(i+1) - sum_term) / E(i);
    a(i)=rcoeff(i);
    a(1:i-1) = a_past(1:i-1) - rcoeff(i).*a_past(i-1:-1:1);
    E(i+1)=(1-rcoeff(i)*rcoeff(i))*E(i);
end

acorr    = R;
refcoeff = rcoeff;
lpparams = [1 -a];
end

Itakura-Satio(IS)測度

Itakura-Satio(IS)測度可以表示為
d I S ( a x , a ˉ x ^ ) = G x a ˉ x ^ T R x a ˉ x ^ G ˉ x ^ a x T R x a x + log ? ( G ˉ x ^ G x ) ? 1 {{d}_{IS}}\left( {{\mathbf{a}}_{x}},{{{\mathbf{\bar{a}}}}_{{\hat{x}}}} \right)=\frac{{{G}_{x}}\mathbf{\bar{a}}_{{\hat{x}}}^{T}{{\mathbf{R}}_{x}}{{{\mathbf{\bar{a}}}}_{{\hat{x}}}}}{{{{\bar{G}}}_{{\hat{x}}}}\mathbf{a}_{x}^{T}{{\mathbf{R}}_{x}}{{\mathbf{a}}_{x}}}+\log \left( \frac{{{{\bar{G}}}_{{\hat{x}}}}}{{{G}_{x}}} \right)-1 dIS?(ax?,aˉx^?)=Gˉx^?axT?Rx?ax?Gx?aˉx^T?Rx?aˉx^??+log(Gx?Gˉx^??)?1
其中, G x {{G}_{x}} Gx?表示干凈語音的全極點增益, G ˉ x ^ {{\bar{G}}_{{\hat{x}}}} Gˉx^?表示增強語音的全極點增益,其中 G x {{G}_{x}} Gx?可以表示為
G x = ( r x T a x ) 1 / 2 {{G}_{x}}\text{=}{{\left( \mathbf{r}_{x}^{T}{{\mathbf{a}}_{x}} \right)}^{1/2}} Gx?=(rxT?ax?)1/2
其中, r x = [ r x ( 0 ) , r x ( 1 ) , ? ? , r x ( p ) ] T {{\mathbf{r}}_{x}}={{\left[ {{r}_{x}}\left( 0 \right),{{r}_{x}}\left( 1 \right),\cdots ,{{r}_{x}}\left( p \right) \right]}^{T}} rx?=[rx?(0),rx?(1),?,rx?(p)]T表示干凈語音的自相關,也就是 R x {{\mathbf{R}}_{x}} Rx?的第一行,從中可以看出,IS測度表示的是全極點的增益之差,但這也是IS測度的缺點,因為幅度差異對音質的影響最小,

代碼如下:

function is_mean= comp_is(cleanFile, enhancedFile);
% ----------------------------------------------------------------------
%          Itakura-Saito (IS) Objective Speech Quality Measure
%
%   This function implements the Itakura-Saito distance measure
%   defined on page 50 of [1] (see Equation 2.26).  See also
%   Equation 12 (page 1480) of [2].
%
%   Usage:  IS=comp_is(cleanFile.wav, enhancedFile.wav)
%           
%         cleanFile.wav - clean input file in .wav format
%         enhancedFile  - enhanced output file in .wav format
%         IS            - computed Itakura Saito measure
% 
%         Note that the IS measure is limited in the range [0, 100].
%
%  Example call:  IS =comp_is('sp04.wav','enhanced.wav')
%
%  
%  References:
%
%     [1] S. R. Quackenbush, T. P. Barnwell, and M. A. Clements,
%	    Objective Measures of Speech Quality.  Prentice Hall
%	    Advanced Reference Series, Englewood Cliffs, NJ, 1988,
%	    ISBN: 0-13-629056-6.
%
%     [2] B.-H. Juang, "On Using the Itakura-Saito Measures for
%           Speech Coder Performance Evaluation", AT&T Bell
%  	    Laboratories Technical Journal, Vol. 63, No. 8,
%	    October 1984, pp. 1477-1498.
%
%  Authors: Bryan L. Pellom and John H. L. Hansen (July 1998)
%  Modified by: Philipos C. Loizou  (Oct 2006) - limited IS to be in [0,100]
%
% Copyright (c) 2006 by Philipos C. Loizou
% $Revision: 0.0 $  $Date: 10/09/2006 $

% ----------------------------------------------------------------------

if nargin~=2
    fprintf('USAGE: IS=comp_is(cleanFile.wav, enhancedFile.wav)\n');
    fprintf('For more help, type: help comp_is\n\n');
    return;
end

alpha=0.95;

[data1, fs1]= audioread(cleanFile);
[data2, fs2]= audioread(enhancedFile);
if ( fs1~= fs2)
    error( 'The two files do not match!\n');
end
    
len= min( length( data1), length( data2));
data1= data1( 1: len)+eps;
data2= data2( 1: len)+eps;


IS_dist= is( data1, data2,fs1);

IS_len= round( length( IS_dist)* alpha);
IS= sort( IS_dist);

is_mean= mean( IS( 1: IS_len));
end


function distortion = is(clean_speech, processed_speech,sample_rate)


% ----------------------------------------------------------------------
% Check the length of the clean and processed speech.  Must be the same.
% ----------------------------------------------------------------------

clean_length      = length(clean_speech);
processed_length  = length(processed_speech);

if (clean_length ~= processed_length)
  disp('Error: Both Speech Files must be same length.');
  return
end

% ----------------------------------------------------------------------
% Scale both clean speech and processed speech to have same dynamic
% range.  Also remove DC component from each signal
% ----------------------------------------------------------------------

%clean_speech     = clean_speech     - mean(clean_speech);
%processed_speech = processed_speech - mean(processed_speech);

%processed_speech = processed_speech.*(max(abs(clean_speech))/ max(abs(processed_speech)));

% ----------------------------------------------------------------------
% Global Variables
% ----------------------------------------------------------------------

%sample_rate = 8000;		   % default sample rate
winlength   = round(30*sample_rate/1000); %240;		   % window length in samples
skiprate    = floor(winlength/4);		   % window skip in samples
if sample_rate<10000
   P           = 10;		   % LPC Analysis Order
else
    P=16;     % this could vary depending on sampling frequency.
end
% ----------------------------------------------------------------------
% For each frame of input speech, calculate the Itakura-Saito Measure
% ----------------------------------------------------------------------

num_frames = clean_length/skiprate-(winlength/skiprate); % number of frames
start      = 1;					% starting sample
window     = 0.5*(1 - cos(2*pi*(1:winlength)'/(winlength+1)));

for frame_count = 1:num_frames

   % ----------------------------------------------------------
   % (1) Get the Frames for the test and reference speech. 
   %     Multiply by Hanning Window.
   % ----------------------------------------------------------

   clean_frame = clean_speech(start:start+winlength-1);
   processed_frame = processed_speech(start:start+winlength-1);
   clean_frame = clean_frame.*window;
   processed_frame = processed_frame.*window;

   % ----------------------------------------------------------
   % (2) Get the autocorrelation lags and LPC parameters used
   %     to compute the IS measure.
   % ----------------------------------------------------------

   [R_clean, Ref_clean, A_clean] = ...
      lpcoeff(clean_frame, P);
   [R_processed, Ref_processed, A_processed] = ...
      lpcoeff(processed_frame, P);

  
   % ----------------------------------------------------------
   % (3) Compute the IS measure
   % ----------------------------------------------------------

   numerator      = A_processed*toeplitz(R_clean)*A_processed';
   denominator    = max(A_clean*toeplitz(R_clean)*A_clean',eps);
   gain_clean     = max(R_clean*A_clean',eps);	      % this is gain
   gain_processed = max(R_processed*A_processed',eps); % squared (sigma^2)

   
     ISvalue=(gain_clean/gain_processed)*(numerator/denominator) + ...
      log(gain_processed/gain_clean)-1; 

  distortion(frame_count) = min(ISvalue,100);
   start = start + skiprate;

end
end


function [acorr, refcoeff, lpparams] = lpcoeff(speech_frame, model_order)

   % ----------------------------------------------------------
   % (1) Compute Autocorrelation Lags
   % ----------------------------------------------------------

   winlength = max(size(speech_frame));
   for k=1:model_order+1
      R(k) = sum(speech_frame(1:winlength-k+1) ...
		     .*speech_frame(k:winlength));
   end

   % ----------------------------------------------------------
   % (2) Levinson-Durbin
   % ----------------------------------------------------------

   a = ones(1,model_order);
   E(1)=R(1);
   for i=1:model_order
      a_past(1:i-1) = a(1:i-1);
      sum_term = sum(a_past(1:i-1).*R(i:-1:2));
      rcoeff(i)=(R(i+1) - sum_term) / E(i);
      a(i)=rcoeff(i);
      a(1:i-1) = a_past(1:i-1) - rcoeff(i).*a_past(i-1:-1:1);
      E(i+1)=(1-rcoeff(i)*rcoeff(i))*E(i);
   end

   acorr    = R;
   refcoeff = rcoeff;
   lpparams = [1 -a];
end

倒譜距離(Cepstrum Distance)測度

倒譜系數可以表示為
c ( m ) = a m + ∑ k = 1 m ? 1 k m c ( k ) a m ? k , 1 ≤ m ≤ p c\left( m \right)={{a}_{m}}+\sum\limits_{k=1}^{m-1}{\frac{k}{m}c\left( k \right){{a}_{m-k}}}\ \ \ \ \ \ ,1\le m\le p c(m)=am?+k=1m?1?mk?c(k)am?k? ,1mp
其中, a j a_{j} aj?表示LPC系數,
那么一種倒譜距離測度可以表示為
d c e p ( c x , c ˉ x ^ ) = 10 ln ? 10 2 ∑ k = 1 p [ c x ( k ) ? c x ^ ( k ) ] 2 {{d}_{cep}}\left( {{\mathbf{c}}_{x}},{{{\mathbf{\bar{c}}}}_{{\hat{x}}}} \right)=\frac{10}{\ln 10}\sqrt{2\sum\limits_{k=1}^{p}{{{\left[ {{c}_{x}}\left( k \right)-{{c}_{{\hat{x}}}}\left( k \right) \right]}^{2}}}} dcep?(cx?,cˉx^?)=ln1010?2k=1p?[cx?(k)?cx^?(k)]2 ?
其中, c x ( k ) {{c}_{x}}\left( k \right) cx?(k)表示干凈語音的倒譜系數, c x ^ ( k ) {{c}_{{\hat{x}}}}\left( k \right) cx^?(k)表示增強語音的倒譜系數,

代碼如下:

function cep_mean= comp_cep(cleanFile, enhancedFile);

% ----------------------------------------------------------------------
%          Cepstrum Distance Objective Speech Quality Measure
%
%   This function implements the cepstrum distance measure used
%   in [1]
%
%   Usage:  CEP=comp_cep(cleanFile.wav, enhancedFile.wav)
%           
%         cleanFile.wav - clean input file in .wav format
%         enhancedFile  - enhanced output file in .wav format
%         CEP           - computed cepstrum distance measure
% 
%         Note that the cepstrum measure is limited in the range [0, 10].
%
%  Example call:  CEP =comp_cep('sp04.wav','enhanced.wav')
%
%  
%  References:
%
%     [1]	Kitawaki, N., Nagabuchi, H., and Itoh, K. (1988). Objective quality
%           evaluation for low bit-rate speech coding systems. IEEE J. Select.
%           Areas in Comm., 6(2), 262-273.
%
%  Author: Philipos C. Loizou 
%  (LPC routines were written by Bryan Pellom & John Hansen)
%
% Copyright (c) 2006 by Philipos C. Loizou
% $Revision: 0.0 $  $Date: 10/09/2006 $

% ----------------------------------------------------------------------
if nargin~=2
    fprintf('USAGE: CEP=comp_cep(cleanFile.wav, enhancedFile.wav)\n');
    fprintf('For more help, type: help comp_cep\n\n');
    return;
end

alpha=0.95;

[data1, fs1]= audioread(cleanFile);
[data2, fs2]= audioread(enhancedFile);
if ( fs1~= fs2)
    error( 'The two files do not match!\n');
end
    
len= min( length( data1), length( data2));
data1= data1( 1: len)+eps;
data2= data2( 1: len)+eps;

IS_dist= cepstrum( data1, data2,fs1);

IS_len= round( length( IS_dist)* alpha);
IS= sort( IS_dist);

cep_mean= mean( IS( 1: IS_len)); 
end



function distortion = cepstrum(clean_speech, processed_speech,sample_rate)


% ----------------------------------------------------------------------
% Check the length of the clean and processed speech.  Must be the same.
% ----------------------------------------------------------------------

clean_length      = length(clean_speech);
processed_length  = length(processed_speech);

if (clean_length ~= processed_length)
  disp('Error: Both Speech Files must be same length.');
  return
end

% ----------------------------------------------------------------------
% Scale both clean speech and processed speech to have same dynamic
% range.  Also remove DC component from each signal
% ----------------------------------------------------------------------

%clean_speech     = clean_speech     - mean(clean_speech);
%processed_speech = processed_speech - mean(processed_speech);

%processed_speech = processed_speech.*(max(abs(clean_speech))/ max(abs(processed_speech)));

% ----------------------------------------------------------------------
% Global Variables
% ----------------------------------------------------------------------

winlength   = round(30*sample_rate/1000); %240;		   % window length in samples
skiprate    = floor(winlength/4);		   % window skip in samples
if sample_rate<10000
   P           = 10;		   % LPC Analysis Order
else
    P=16;     % this could vary depending on sampling frequency.
end
C=10*sqrt(2)/log(10);
% ----------------------------------------------------------------------
% For each frame of input speech, calculate the Itakura-Saito Measure
% ----------------------------------------------------------------------

num_frames = clean_length/skiprate-(winlength/skiprate); % number of frames
start      = 1;					% starting sample
window     = 0.5*(1 - cos(2*pi*(1:winlength)'/(winlength+1)));

for frame_count = 1:num_frames

   % ----------------------------------------------------------
   % (1) Get the Frames for the test and reference speech. 
   %     Multiply by Hanning Window.
   % ----------------------------------------------------------

   clean_frame = clean_speech(start:start+winlength-1);
   processed_frame = processed_speech(start:start+winlength-1);
   clean_frame = clean_frame.*window;
   processed_frame = processed_frame.*window;

   % ----------------------------------------------------------
   % (2) Get the autocorrelation lags and LPC parameters used
   %     to compute the IS measure.
   % ----------------------------------------------------------

   [R_clean, Ref_clean, A_clean] = ...
      lpcoeff(clean_frame, P);
   [R_processed, Ref_processed, A_processed] = ...
      lpcoeff(processed_frame, P);

  C_clean=lpc2cep(A_clean);
  C_processed=lpc2cep(A_processed);
  
   % ----------------------------------------------------------
   % (3) Compute the cepstrum-distance measure
   % ----------------------------------------------------------

  
   distortion(frame_count) = min(10,C*norm(C_clean-C_processed,2)); 
   

   start = start + skiprate;

end
end


function [acorr, refcoeff, lpparams] = lpcoeff(speech_frame, model_order)

   % ----------------------------------------------------------
   % (1) Compute Autocorrelation Lags
   % ----------------------------------------------------------

   winlength = max(size(speech_frame));
   for k=1:model_order+1
      R(k) = sum(speech_frame(1:winlength-k+1) ...
		     .*speech_frame(k:winlength));
   end

   % ----------------------------------------------------------
   % (2) Levinson-Durbin
   % ----------------------------------------------------------

   a = ones(1,model_order);
   E(1)=R(1);
   for i=1:model_order
      a_past(1:i-1) = a(1:i-1);
      sum_term = sum(a_past(1:i-1).*R(i:-1:2));
      rcoeff(i)=(R(i+1) - sum_term) / E(i);
      a(i)=rcoeff(i);
      a(1:i-1) = a_past(1:i-1) - rcoeff(i).*a_past(i-1:-1:1);
      E(i+1)=(1-rcoeff(i)*rcoeff(i))*E(i);
   end

   acorr    = R;
   refcoeff = rcoeff;
   lpparams = [1 -a];
end
%----------------------------------------------
function [cep]=lpc2cep(a)
%
% converts prediction to cepstrum coefficients
%
% Author: Philipos C. Loizou

M=length(a);
cep=zeros(1,M-1);

cep(1)=-a(2);

for k=2:M-1
    ix=1:k-1;
    vec1=cep(ix).*a(k-1+1:-1:2).*ix;
    cep(k)=-(a(k+1)+sum(vec1)/k);
    
end
end

加權譜斜率(Weighted Spectral Slope, WSS)測度

該測度是基于每個頻帶的頻譜斜率之間的加權差值進行計算的,首先找到每個頻帶的譜斜率,其中頻譜斜率可以表示為
{ S x ( k ) = C x ( k + 1 ) ? C x ( k ) S ˉ x ^ ( k ) = C ˉ x ^ ( k + 1 ) ? C ˉ x ^ ( k ) \left\{ \begin{aligned} & {{S}_{x}}\left( k \right)={{C}_{x}}\left( k+1 \right)-{{C}_{x}}\left( k \right) \\ & {{{\bar{S}}}_{{\hat{x}}}}\left( k \right)={{{\bar{C}}}_{{\hat{x}}}}\left( k+1 \right)-{{{\bar{C}}}_{{\hat{x}}}}\left( k \right) \\ \end{aligned} \right. {?Sx?(k)=Cx?(k+1)?Cx?(k)Sˉx^?(k)=Cˉx^?(k+1)?Cˉx^?(k)?
其中, S x ( k ) {{S}_{x}}\left( k \right) Sx?(k) S ˉ x ^ ( k ) {{\bar{S}}_{{\hat{x}}}}\left( k \right) Sˉx^?(k)分別表示第 k k k個頻帶的干凈語音和增強語音的頻譜斜率, C x ( k ) {{C}_{x}}\left( k \right) Cx?(k) C x ^ ( k ) {{C}_{{\hat{x}}}}\left( k \right) Cx^?(k)分別表示干凈語音和增強語音的關鍵頻譜,根據其是否靠近譜峰或者譜谷以及峰值是否對應頻譜的最大峰值,對頻譜斜率之間的差值進行加權,其中第 k k k個頻帶的權重可以表示為
W ( k ) = K max ? [ K max ? + C max ? ? C x ( k ) ] K l o c max ? [ K l o c max ? + C l o c max ? ? C x ( k ) ] W\left( k \right)=\frac{{{K}_{\max }}}{\left[ {{K}_{\max }}+{{C}_{\max }}-{{C}_{x}}\left( k \right) \right]}\frac{{{K}_{loc\max }}}{\left[ {{K}_{loc\max }}+{{C}_{loc\max }}-{{C}_{x}}\left( k \right) \right]} W(k)=[Kmax?+Cmax??Cx?(k)]Kmax??[Klocmax?+Clocmax??Cx?(k)]Klocmax??
其中, C max ? {{C}_{\max }} Cmax?表示各個頻帶中最大的對數譜幅度, C l o c max ? {{C}_{loc\max }} Clocmax?表示最靠近第 k k k個頻帶的普峰值, K max ? {{K}_{\max }} Kmax? K l o c max ? {{K}_{loc\max }} Klocmax?均為常量,可以通過調整使得主觀聽音測驗結果與客觀測度之間的相關性最大,一般, K max ? = 20 {{K}_{\max }}=20 Kmax?=20 K l o c max ? = 1 {{K}_{loc\max }}=1 Klocmax?=1時,具有高的相關性,

加權譜斜率(Weighted Spectral Slope, WSS)測度最終可以表示為
d W S S = ( C x , C ˉ x ) = ∑ k = 1 36 W ( k ) ( S x ( k ) ? S ˉ x ^ ( k ) ) 2 {{d}_{WSS}}=\left( {{C}_{x}},{{{\bar{C}}}_{x}} \right)=\sum\limits_{k=1}^{36}{W\left( k \right){{\left( {{S}_{x}}\left( k \right)-{{{\bar{S}}}_{{\hat{x}}}}\left( k \right) \right)}^{2}}} dWSS?=(Cx?,Cˉx?)=k=136?W(k)(Sx?(k)?Sˉx^?(k))2
平均的WSS測度則是由所有幀的WSS平均得到,這里就不再敘述,

代碼如下:

function wss_dist= comp_wss(cleanFile, enhancedFile);
% ----------------------------------------------------------------------
%
%     Weighted Spectral Slope (WSS) Objective Speech Quality Measure
%
%     This function implements the Weighted Spectral Slope (WSS)
%     distance measure originally proposed in [1].  The algorithm
%     works by first decomposing the speech signal into a set of
%     frequency bands (this is done for both the test and reference
%     frame).  The intensities within each critical band are 
%     measured.  Then, a weighted distances between the measured
%     slopes of the log-critical band spectra are computed.  
%     This measure is also described in Section 2.2.9 (pages 56-58)
%     of [2].
%
%     Whereas Klatt's original measure used 36 critical-band 
%     filters to estimate the smoothed short-time spectrum, this
%     implementation considers a bank of 25 filters spanning 
%     the 4 kHz bandwidth.  
%
%   Usage:  wss_dist=comp_wss(cleanFile.wav, enhancedFile.wav)
%           
%         cleanFile.wav - clean input file in .wav format
%         enhancedFile  - enhanced output file in .wav format
%         wss_dist      - computed spectral slope distance
%
%  Example call:  ws =comp_wss('sp04.wav','enhanced.wav')
%
%  References:
%
%     [1] D. H. Klatt, "Prediction of Perceived Phonetic Distance
%	    from Critical-Band Spectra: A First Step", Proc. IEEE
%	    ICASSP'82, Volume 2, pp. 1278-1281, May, 1982.
%
%     [2] S. R. Quackenbush, T. P. Barnwell, and M. A. Clements,
%	    Objective Measures of Speech Quality.  Prentice Hall
%	    Advanced Reference Series, Englewood Cliffs, NJ, 1988,
%	    ISBN: 0-13-629056-6.
%
%  Authors: Bryan L. Pellom and John H. L. Hansen (July 1998)
%  Modified by: Philipos C. Loizou  (Oct 2006)
%
% Copyright (c) 2006 by Philipos C. Loizou
% $Revision: 0.0 $  $Date: 10/09/2006 $
%
% ----------------------------------------------------------------------
if nargin~=2
    fprintf('USAGE: WSS=comp_wss(cleanFile.wav, enhancedFile.wav)\n');
    fprintf('For more help, type: help comp_wss\n\n');
    return;
end

alpha= 0.95;

[data1, fs1]= audioread(cleanFile);
[data2, fs2]= audioread(enhancedFile);
if ( fs1~= fs2)
    error( 'The two files do not match!\n');
end

len= min( length( data1), length( data2));
data1= data1( 1: len)+eps;
data2= data2( 1: len)+eps;

wss_dist_vec= wss( data1, data2,fs1);
wss_dist_vec= sort( wss_dist_vec);
wss_dist= mean( wss_dist_vec( 1: round( length( wss_dist_vec)*alpha)));



function distortion = wss(clean_speech, processed_speech,sample_rate)


% ----------------------------------------------------------------------
% Check the length of the clean and processed speech.  Must be the same.
% ----------------------------------------------------------------------

clean_length      = length(clean_speech);
processed_length  = length(processed_speech);

if (clean_length ~= processed_length)
  disp('Error: Files  musthave same length.');
  return
end



% ----------------------------------------------------------------------
% Global Variables
% ----------------------------------------------------------------------

winlength   = round(30*sample_rate/1000); 	   % window length in samples
skiprate    = floor(winlength/4);		   % window skip in samples
max_freq    = sample_rate/2;	   % maximum bandwidth
num_crit    = 25;		   % number of critical bands

USE_FFT_SPECTRUM = 1;		   % defaults to 10th order LP spectrum
n_fft       = 2^nextpow2(2*winlength);
n_fftby2    = n_fft/2;		   % FFT size/2
Kmax        = 20;		   % value suggested by Klatt, pg 1280
Klocmax     = 1;		   % value suggested by Klatt, pg 1280		

% ----------------------------------------------------------------------
% Critical Band Filter Definitions (Center Frequency and Bandwidths in Hz)
% ----------------------------------------------------------------------

cent_freq(1)  = 50.0000;   bandwidth(1)  = 70.0000;
cent_freq(2)  = 120.000;   bandwidth(2)  = 70.0000;
cent_freq(3)  = 190.000;   bandwidth(3)  = 70.0000;
cent_freq(4)  = 260.000;   bandwidth(4)  = 70.0000;
cent_freq(5)  = 330.000;   bandwidth(5)  = 70.0000;
cent_freq(6)  = 400.000;   bandwidth(6)  = 70.0000;
cent_freq(7)  = 470.000;   bandwidth(7)  = 70.0000;
cent_freq(8)  = 540.000;   bandwidth(8)  = 77.3724;
cent_freq(9)  = 617.372;   bandwidth(9)  = 86.0056;
cent_freq(10) = 703.378;   bandwidth(10) = 95.3398;
cent_freq(11) = 798.717;   bandwidth(11) = 105.411;
cent_freq(12) = 904.128;   bandwidth(12) = 116.256;
cent_freq(13) = 1020.38;   bandwidth(13) = 127.914;
cent_freq(14) = 1148.30;   bandwidth(14) = 140.423;
cent_freq(15) = 1288.72;   bandwidth(15) = 153.823;
cent_freq(16) = 1442.54;   bandwidth(16) = 168.154;
cent_freq(17) = 1610.70;   bandwidth(17) = 183.457;
cent_freq(18) = 1794.16;   bandwidth(18) = 199.776;
cent_freq(19) = 1993.93;   bandwidth(19) = 217.153;
cent_freq(20) = 2211.08;   bandwidth(20) = 235.631;
cent_freq(21) = 2446.71;   bandwidth(21) = 255.255;
cent_freq(22) = 2701.97;   bandwidth(22) = 276.072;
cent_freq(23) = 2978.04;   bandwidth(23) = 298.126;
cent_freq(24) = 3276.17;   bandwidth(24) = 321.465;
cent_freq(25) = 3597.63;   bandwidth(25) = 346.136;

bw_min      = bandwidth (1);	   % minimum critical bandwidth

% ----------------------------------------------------------------------
% Set up the critical band filters.  Note here that Gaussianly shaped
% filters are used.  Also, the sum of the filter weights are equivalent
% for each critical band filter.  Filter less than -30 dB and set to
% zero.
% ----------------------------------------------------------------------

min_factor = exp (-30.0 / (2.0 * 2.303));       % -30 dB point of filter

for i = 1:num_crit
  f0 = (cent_freq (i) / max_freq) * (n_fftby2);
  all_f0(i) = floor(f0);
  bw = (bandwidth (i) / max_freq) * (n_fftby2);
  norm_factor = log(bw_min) - log(bandwidth(i));
  j = 0:1:n_fftby2-1;
  crit_filter(i,:) = exp (-11 *(((j - floor(f0)) ./bw).^2) + norm_factor);
  crit_filter(i,:) = crit_filter(i,:).*(crit_filter(i,:) > min_factor);
end   

% ----------------------------------------------------------------------
% For each frame of input speech, calculate the Weighted Spectral
% Slope Measure
% ----------------------------------------------------------------------

num_frames = clean_length/skiprate-(winlength/skiprate); % number of frames
start      = 1;					% starting sample
window     = 0.5*(1 - cos(2*pi*(1:winlength)'/(winlength+1)));

for frame_count = 1:num_frames

   % ----------------------------------------------------------
   % (1) Get the Frames for the test and reference speech. 
   %     Multiply by Hanning Window.
   % ----------------------------------------------------------

   clean_frame = clean_speech(start:start+winlength-1);
   processed_frame = processed_speech(start:start+winlength-1);
   clean_frame = clean_frame.*window;
   processed_frame = processed_frame.*window;

   % ----------------------------------------------------------
   % (2) Compute the Power Spectrum of Clean and Processed
   % ----------------------------------------------------------

    if (USE_FFT_SPECTRUM)
       clean_spec     = (abs(fft(clean_frame,n_fft)).^2);
       processed_spec = (abs(fft(processed_frame,n_fft)).^2);
    else
       a_vec = zeros(1,n_fft);
       a_vec(1:11) = lpc(clean_frame,10);
       clean_spec     = 1.0/(abs(fft(a_vec,n_fft)).^2)';

       a_vec = zeros(1,n_fft);
       a_vec(1:11) = lpc(processed_frame,10);
       processed_spec = 1.0/(abs(fft(a_vec,n_fft)).^2)';
    end

   % ----------------------------------------------------------
   % (3) Compute Filterbank Output Energies (in dB scale)
   % ----------------------------------------------------------
 
   for i = 1:num_crit
      clean_energy(i) = sum(clean_spec(1:n_fftby2) ...
		            .*crit_filter(i,:)');
      processed_energy(i) = sum(processed_spec(1:n_fftby2) ...
			        .*crit_filter(i,:)');
   end
   clean_energy = 10*log10(max(clean_energy,1E-10));
   processed_energy = 10*log10(max(processed_energy,1E-10));

   % ----------------------------------------------------------
   % (4) Compute Spectral Slope (dB[i+1]-dB[i]) 
   % ----------------------------------------------------------

   clean_slope     = clean_energy(2:num_crit) - ...
		     clean_energy(1:num_crit-1);
   processed_slope = processed_energy(2:num_crit) - ...
		     processed_energy(1:num_crit-1);

   % ----------------------------------------------------------
   % (5) Find the nearest peak locations in the spectra to 
   %     each critical band.  If the slope is negative, we 
   %     search to the left.  If positive, we search to the 
   %     right.
   % ----------------------------------------------------------

   for i = 1:num_crit-1

       % find the peaks in the clean speech signal
	
       if (clean_slope(i)>0) 		% search to the right
	  n = i;
          while ((n<num_crit) & (clean_slope(n) > 0))
	     n = n+1;
 	  end
	  clean_loc_peak(i) = clean_energy(n-1);
       else				% search to the left
          n = i;
	  while ((n>0) & (clean_slope(n) <= 0))
	     n = n-1;
 	  end
	  clean_loc_peak(i) = clean_energy(n+1);
       end

       % find the peaks in the processed speech signal

       if (processed_slope(i)>0) 	% search to the right
	  n = i;
          while ((n<num_crit) & (processed_slope(n) > 0))
	     n = n+1;
	  end
	  processed_loc_peak(i) = processed_energy(n-1);
       else				% search to the left
          n = i;
	  while ((n>0) & (processed_slope(n) <= 0))
	     n = n-1;
 	  end
	  processed_loc_peak(i) = processed_energy(n+1);
       end

   end

   % ----------------------------------------------------------
   %  (6) Compute the WSS Measure for this frame.  This 
   %      includes determination of the weighting function.
   % ----------------------------------------------------------

   dBMax_clean       = max(clean_energy);
   dBMax_processed   = max(processed_energy);

   % The weights are calculated by averaging individual
   % weighting factors from the clean and processed frame.
   % These weights W_clean and W_processed should range
   % from 0 to 1 and place more emphasis on spectral 
   % peaks and less emphasis on slope differences in spectral
   % valleys.  This procedure is described on page 1280 of
   % Klatt's 1982 ICASSP paper.

   Wmax_clean        = Kmax ./ (Kmax + dBMax_clean - ...
		 	    clean_energy(1:num_crit-1));
   Wlocmax_clean     = Klocmax ./ ( Klocmax + clean_loc_peak - ...
				clean_energy(1:num_crit-1));
   W_clean           = Wmax_clean .* Wlocmax_clean;

   Wmax_processed    = Kmax ./ (Kmax + dBMax_processed - ...
			        processed_energy(1:num_crit-1));
   Wlocmax_processed = Klocmax ./ ( Klocmax + processed_loc_peak - ...
			            processed_energy(1:num_crit-1));
   W_processed       = Wmax_processed .* Wlocmax_processed;
  
   W = (W_clean + W_processed)./2.0;
  
   distortion(frame_count) = sum(W.*(clean_slope(1:num_crit-1) - ...
		       processed_slope(1:num_crit-1)).^2);

   % this normalization is not part of Klatt's paper, but helps
   % to normalize the measure.  Here we scale the measure by the
   % sum of the weights.

   distortion(frame_count) = distortion(frame_count)/sum(W);
   
   start = start + skiprate;
     
end

復合測度

通過對客觀評價指標線性組合后得到的復合測度,包括signal distortion(SIG),noise distortion(BAK),overall quality (OVRL),計算方式如下:
C s i g = 3.093 ? 1.029 ? LLR + 0.603 ? PESQ ? 0.009 ? WSS {{C}_{sig}}=3.093-1.029\cdot \text{LLR}+0.603\cdot \text{PESQ}-0.009\cdot \text{WSS} Csig?=3.093?1.029?LLR+0.603?PESQ?0.009?WSS
C b a k = 1.634 + 0.478 ? PESQ-0.007 ? WSS+0.063 ? segSNR {{C}_{bak}}=1.634+0.478\cdot \text{PESQ-0}\text{.007}\cdot \text{WSS+0}\text{.063}\cdot \text{segSNR} Cbak?=1.634+0.478?PESQ-0.007?WSS+0.063?segSNR
C o v l = 1.594 + 0.805 ? PESQ ? 0.512 ? LLR ? 0.007 ? WSS {{C}_{ovl}}=1.594+0.805\cdot \text{PESQ}-0.512\cdot \text{LLR}-0.007\cdot \text{WSS} Covl?=1.594+0.805?PESQ?0.512?LLR?0.007?WSS

代碼如下:

function [Csig,Cbak,Covl]= composite(cleanFile, enhancedFile);
% ----------------------------------------------------------------------
%          Composite Objective Speech Quality Measure
%
%   This function implements the composite objective measure proposed in
%   [1]. 
%
%   Usage:  [sig,bak,ovl]=composite(cleanFile.wav, enhancedFile.wav)
%           
%         cleanFile.wav - clean input file in .wav format
%         enhancedFile  - enhanced output file in .wav format
%         sig           - predicted rating [1-5] of speech distortion
%         bak           - predicted rating [1-5] of noise distortion
%         ovl           - predicted rating [1-5] of overall quality
%
%       In addition to the above ratings (sig, bak, & ovl) it returns
%       the individual values of the LLR, SNRseg, WSS and PESQ measures.
%
%  Example call:  [sig,bak,ovl] =composite('sp04.wav','enhanced.wav')
%
%  
%  References:
%
%     [1]   Hu, Y. and Loizou, P. (2006). Evaluation of objective measures 
%           for speech enhancement. Proc. Interspeech, Pittsburg, PA. 
%        
%   Authors: Yi Hu and Philipos C. Loizou
%   (the LLR, SNRseg and WSS measures were based on Bryan Pellom and John
%     Hansen's implementations)
%
% Copyright (c) 2006 by Philipos C. Loizou
% $Revision: 0.0 $  $Date: 10/09/2006 $

% ----------------------------------------------------------------------

if nargin~=2
    fprintf('USAGE: [sig,bak,ovl]=composite(cleanFile.wav, enhancedFile.wav)\n');
    fprintf('For more help, type: help composite\n\n');
    return;
end

alpha= 0.95;

[data1, fs1]= audioread(cleanFile);
[data2, fs2]= audioread(enhancedFile);
if ( fs1~= fs2)
    error( 'The two files do not match!\n');
end

len= min( length( data1), length( data2));
data1= data1( 1: len)+eps;
data2= data2( 1: len)+eps;


% -- compute the WSS measure ---
%
wss_dist_vec= wss( data1, data2,fs1);
wss_dist_vec= sort( wss_dist_vec);
wss_dist= mean( wss_dist_vec( 1: round( length( wss_dist_vec)*alpha)));

% --- compute the LLR measure ---------
%
LLR_dist= llr( data1, data2,fs1);
LLRs= sort(LLR_dist);
LLR_len= round( length(LLR_dist)* alpha);
llr_mean= mean( LLRs( 1: LLR_len));

% --- compute the SNRseg ----------------
%
[snr_dist, segsnr_dist]= snr( data1, data2,fs1);
snr_mean= snr_dist;
segSNR= mean( segsnr_dist);


% -- compute the pesq ----
%
% if     Srate1==8000,    mode='nb';
% elseif Srate1 == 16000, mode='wb';
% else,
%      error ('Sampling freq in PESQ needs to be 8 kHz or 16 kHz');
% end

     
 [pesq_mos_scores]= pesq(cleanFile, enhancedFile);
 
 if length(pesq_mos_scores)==2
     pesq_mos=pesq_mos_scores(1); % take the raw PESQ value instead of the
                                  % MOS-mapped value (this composite
                                  % measure was only validated with the raw
                                  % PESQ value)
 else
     pesq_mos=pesq_mos_scores;
 end
 
% --- now compute the composite measures ------------------
%
Csig = 3.093 - 1.029*llr_mean + 0.603*pesq_mos-0.009*wss_dist;
  Csig = max(1,Csig);  Csig=min(5, Csig); % limit values to [1, 5]
Cbak = 1.634 + 0.478 *pesq_mos - 0.007*wss_dist + 0.063*segSNR;
  Cbak = max(1, Cbak); Cbak=min(5,Cbak); % limit values to [1, 5]
Covl = 1.594 + 0.805*pesq_mos - 0.512*llr_mean - 0.007*wss_dist;
  Covl = max(1, Covl); Covl=min(5, Covl); % limit values to [1, 5]

fprintf('\n LLR=%f   SNRseg=%f   WSS=%f   PESQ=%f\n',llr_mean,segSNR,wss_dist,pesq_mos);

return; %=================================================================


function distortion = wss(clean_speech, processed_speech,sample_rate)


% ----------------------------------------------------------------------
% Check the length of the clean and processed speech.  Must be the same.
% ----------------------------------------------------------------------

clean_length      = length(clean_speech);
processed_length  = length(processed_speech);

if (clean_length ~= processed_length)
  disp('Error: Files  musthave same length.');
  return
end



% ----------------------------------------------------------------------
% Global Variables
% ----------------------------------------------------------------------

winlength   = round(30*sample_rate/1000); %240;		   % window length in samples
skiprate    = floor(winlength/4);		   % window skip in samples
max_freq    = sample_rate/2;	   % maximum bandwidth
num_crit    = 25;		   % number of critical bands

USE_FFT_SPECTRUM = 1;		   % defaults to 10th order LP spectrum
n_fft       = 2^nextpow2(2*winlength);
n_fftby2    = n_fft/2;		   % FFT size/2
Kmax        = 20;		   % value suggested by Klatt, pg 1280
Klocmax     = 1;		   % value suggested by Klatt, pg 1280		

% ----------------------------------------------------------------------
% Critical Band Filter Definitions (Center Frequency and Bandwidths in Hz)
% ----------------------------------------------------------------------

cent_freq(1)  = 50.0000;   bandwidth(1)  = 70.0000;
cent_freq(2)  = 120.000;   bandwidth(2)  = 70.0000;
cent_freq(3)  = 190.000;   bandwidth(3)  = 70.0000;
cent_freq(4)  = 260.000;   bandwidth(4)  = 70.0000;
cent_freq(5)  = 330.000;   bandwidth(5)  = 70.0000;
cent_freq(6)  = 400.000;   bandwidth(6)  = 70.0000;
cent_freq(7)  = 470.000;   bandwidth(7)  = 70.0000;
cent_freq(8)  = 540.000;   bandwidth(8)  = 77.3724;
cent_freq(9)  = 617.372;   bandwidth(9)  = 86.0056;
cent_freq(10) = 703.378;   bandwidth(10) = 95.3398;
cent_freq(11) = 798.717;   bandwidth(11) = 105.411;
cent_freq(12) = 904.128;   bandwidth(12) = 116.256;
cent_freq(13) = 1020.38;   bandwidth(13) = 127.914;
cent_freq(14) = 1148.30;   bandwidth(14) = 140.423;
cent_freq(15) = 1288.72;   bandwidth(15) = 153.823;
cent_freq(16) = 1442.54;   bandwidth(16) = 168.154;
cent_freq(17) = 1610.70;   bandwidth(17) = 183.457;
cent_freq(18) = 1794.16;   bandwidth(18) = 199.776;
cent_freq(19) = 1993.93;   bandwidth(19) = 217.153;
cent_freq(20) = 2211.08;   bandwidth(20) = 235.631;
cent_freq(21) = 2446.71;   bandwidth(21) = 255.255;
cent_freq(22) = 2701.97;   bandwidth(22) = 276.072;
cent_freq(23) = 2978.04;   bandwidth(23) = 298.126;
cent_freq(24) = 3276.17;   bandwidth(24) = 321.465;
cent_freq(25) = 3597.63;   bandwidth(25) = 346.136;

bw_min      = bandwidth (1);	   % minimum critical bandwidth

% ----------------------------------------------------------------------
% Set up the critical band filters.  Note here that Gaussianly shaped
% filters are used.  Also, the sum of the filter weights are equivalent
% for each critical band filter.  Filter less than -30 dB and set to
% zero.
% ----------------------------------------------------------------------

min_factor = exp (-30.0 / (2.0 * 2.303));       % -30 dB point of filter

for i = 1:num_crit
  f0 = (cent_freq (i) / max_freq) * (n_fftby2);
  all_f0(i) = floor(f0);
  bw = (bandwidth (i) / max_freq) * (n_fftby2);
  norm_factor = log(bw_min) - log(bandwidth(i));
  j = 0:1:n_fftby2-1;
  crit_filter(i,:) = exp (-11 *(((j - floor(f0)) ./bw).^2) + norm_factor);
  crit_filter(i,:) = crit_filter(i,:).*(crit_filter(i,:) > min_factor);
end   

% ----------------------------------------------------------------------
% For each frame of input speech, calculate the Weighted Spectral
% Slope Measure
% ----------------------------------------------------------------------

num_frames = clean_length/skiprate-(winlength/skiprate); % number of frames
start      = 1;					% starting sample
window     = 0.5*(1 - cos(2*pi*(1:winlength)'/(winlength+1)));

for frame_count = 1:num_frames

   % ----------------------------------------------------------
   % (1) Get the Frames for the test and reference speech. 
   %     Multiply by Hanning Window.
   % ----------------------------------------------------------

   clean_frame = clean_speech(start:start+winlength-1);
   processed_frame = processed_speech(start:start+winlength-1);
   clean_frame = clean_frame.*window;
   processed_frame = processed_frame.*window;

   % ----------------------------------------------------------
   % (2) Compute the Power Spectrum of Clean and Processed
   % ----------------------------------------------------------

    if (USE_FFT_SPECTRUM)
       clean_spec     = (abs(fft(clean_frame,n_fft)).^2);
       processed_spec = (abs(fft(processed_frame,n_fft)).^2);
    else
       a_vec = zeros(1,n_fft);
       a_vec(1:11) = lpc(clean_frame,10);
       clean_spec     = 1.0/(abs(fft(a_vec,n_fft)).^2)';

       a_vec = zeros(1,n_fft);
       a_vec(1:11) = lpc(processed_frame,10);
       processed_spec = 1.0/(abs(fft(a_vec,n_fft)).^2)';
    end

   % ----------------------------------------------------------
   % (3) Compute Filterbank Output Energies (in dB scale)
   % ----------------------------------------------------------
 
   for i = 1:num_crit
      clean_energy(i) = sum(clean_spec(1:n_fftby2) ...
		            .*crit_filter(i,:)');
      processed_energy(i) = sum(processed_spec(1:n_fftby2) ...
			        .*crit_filter(i,:)');
   end
   clean_energy = 10*log10(max(clean_energy,1E-10));
   processed_energy = 10*log10(max(processed_energy,1E-10));

   % ----------------------------------------------------------
   % (4) Compute Spectral Slope (dB[i+1]-dB[i]) 
   % ----------------------------------------------------------

   clean_slope     = clean_energy(2:num_crit) - ...
		     clean_energy(1:num_crit-1);
   processed_slope = processed_energy(2:num_crit) - ...
		     processed_energy(1:num_crit-1);

   % ----------------------------------------------------------
   % (5) Find the nearest peak locations in the spectra to 
   %     each critical band.  If the slope is negative, we 
   %     search to the left.  If positive, we search to the 
   %     right.
   % ----------------------------------------------------------

   for i = 1:num_crit-1

       % find the peaks in the clean speech signal
	
       if (clean_slope(i)>0) 		% search to the right
	  n = i;
          while ((n<num_crit) & (clean_slope(n) > 0))
	     n = n+1;
 	  end
	  clean_loc_peak(i) = clean_energy(n-1);
       else				% search to the left
          n = i;
	  while ((n>0) & (clean_slope(n) <= 0))
	     n = n-1;
 	  end
	  clean_loc_peak(i) = clean_energy(n+1);
       end

       % find the peaks in the processed speech signal

       if (processed_slope(i)>0) 	% search to the right
	  n = i;
          while ((n<num_crit) & (processed_slope(n) > 0))
	     n = n+1;
	  end
	  processed_loc_peak(i) = processed_energy(n-1);
       else				% search to the left
          n = i;
	  while ((n>0) & (processed_slope(n) <= 0))
	     n = n-1;
 	  end
	  processed_loc_peak(i) = processed_energy(n+1);
       end

   end

   % ----------------------------------------------------------
   %  (6) Compute the WSS Measure for this frame.  This 
   %      includes determination of the weighting function.
   % ----------------------------------------------------------

   dBMax_clean       = max(clean_energy);
   dBMax_processed   = max(processed_energy);

   % The weights are calculated by averaging individual
   % weighting factors from the clean and processed frame.
   % These weights W_clean and W_processed should range
   % from 0 to 1 and place more emphasis on spectral 
   % peaks and less emphasis on slope differences in spectral
   % valleys.  This procedure is described on page 1280 of
   % Klatt's 1982 ICASSP paper.

   Wmax_clean        = Kmax ./ (Kmax + dBMax_clean - ...
		 	    clean_energy(1:num_crit-1));
   Wlocmax_clean     = Klocmax ./ ( Klocmax + clean_loc_peak - ...
				clean_energy(1:num_crit-1));
   W_clean           = Wmax_clean .* Wlocmax_clean;

   Wmax_processed    = Kmax ./ (Kmax + dBMax_processed - ...
			        processed_energy(1:num_crit-1));
   Wlocmax_processed = Klocmax ./ ( Klocmax + processed_loc_peak - ...
			            processed_energy(1:num_crit-1));
   W_processed       = Wmax_processed .* Wlocmax_processed;
  
   W = (W_clean + W_processed)./2.0;
  
   distortion(frame_count) = sum(W.*(clean_slope(1:num_crit-1) - ...
		       processed_slope(1:num_crit-1)).^2);

   % this normalization is not part of Klatt's paper, but helps
   % to normalize the measure.  Here we scale the measure by the
   % sum of the weights.

   distortion(frame_count) = distortion(frame_count)/sum(W);
   
   start = start + skiprate;
     
end

%-----------------------------------------------
function distortion = llr(clean_speech, processed_speech,sample_rate)


% ----------------------------------------------------------------------
% Check the length of the clean and processed speech.  Must be the same.
% ----------------------------------------------------------------------

clean_length      = length(clean_speech);
processed_length  = length(processed_speech);

if (clean_length ~= processed_length)
  disp('Error: Both Speech Files must be same length.');
  return
end

% ----------------------------------------------------------------------
% Global Variables
% ----------------------------------------------------------------------

winlength   = round(30*sample_rate/1000); %  window length in samples
skiprate    = floor(winlength/4);		   % window skip in samples
if sample_rate<10000
   P           = 10;		   % LPC Analysis Order
else
    P=16;     % this could vary depending on sampling frequency.
end

% ----------------------------------------------------------------------
% For each frame of input speech, calculate the Log Likelihood Ratio 
% ----------------------------------------------------------------------

num_frames = clean_length/skiprate-(winlength/skiprate); % number of frames
start      = 1;					% starting sample
window     = 0.5*(1 - cos(2*pi*(1:winlength)'/(winlength+1)));

for frame_count = 1:num_frames

   % ----------------------------------------------------------
   % (1) Get the Frames for the test and reference speech. 
   %     Multiply by Hanning Window.
   % ----------------------------------------------------------

   clean_frame = clean_speech(start:start+winlength-1);
   processed_frame = processed_speech(start:start+winlength-1);
   clean_frame = clean_frame.*window;
   processed_frame = processed_frame.*window;

   % ----------------------------------------------------------
   % (2) Get the autocorrelation lags and LPC parameters used
   %     to compute the LLR measure.
   % ----------------------------------------------------------

   [R_clean, Ref_clean, A_clean] = ...
      lpcoeff(clean_frame, P);
   [R_processed, Ref_processed, A_processed] = ...
      lpcoeff(processed_frame, P);

   % ----------------------------------------------------------
   % (3) Compute the LLR measure
   % ----------------------------------------------------------

   numerator   = A_processed*toeplitz(R_clean)*A_processed';
   denominator = A_clean*toeplitz(R_clean)*A_clean';
   distortion(frame_count) = log(numerator/denominator); 
   start = start + skiprate;

end

%---------------------------------------------
function [acorr, refcoeff, lpparams] = lpcoeff(speech_frame, model_order)

   % ----------------------------------------------------------
   % (1) Compute Autocorrelation Lags
   % ----------------------------------------------------------

   winlength = max(size(speech_frame));
   for k=1:model_order+1
      R(k) = sum(speech_frame(1:winlength-k+1) ...
		     .*speech_frame(k:winlength));
   end

   % ----------------------------------------------------------
   % (2) Levinson-Durbin
   % ----------------------------------------------------------

   a = ones(1,model_order);
   E(1)=R(1);
   for i=1:model_order
      a_past(1:i-1) = a(1:i-1);
      sum_term = sum(a_past(1:i-1).*R(i:-1:2));
      rcoeff(i)=(R(i+1) - sum_term) / E(i);
      a(i)=rcoeff(i);
      a(1:i-1) = a_past(1:i-1) - rcoeff(i).*a_past(i-1:-1:1);
      E(i+1)=(1-rcoeff(i)*rcoeff(i))*E(i);
   end

   acorr    = R;
   refcoeff = rcoeff;
   lpparams = [1 -a];

   
   % ----------------------------------------------------------------------

function [overall_snr, segmental_snr] = snr(clean_speech, processed_speech,sample_rate)

% ----------------------------------------------------------------------
% Check the length of the clean and processed speech.  Must be the same.
% ----------------------------------------------------------------------

clean_length      = length(clean_speech);
processed_length  = length(processed_speech);

if (clean_length ~= processed_length)
  disp('Error: Both Speech Files must be same length.');
  return
end

% ----------------------------------------------------------------------
% Scale both clean speech and processed speech to have same dynamic
% range.  Also remove DC component from each signal
% ----------------------------------------------------------------------

%clean_speech     = clean_speech     - mean(clean_speech);
%processed_speech = processed_speech - mean(processed_speech);

%processed_speech = processed_speech.*(max(abs(clean_speech))/ max(abs(processed_speech)));

overall_snr = 10* log10( sum(clean_speech.^2)/sum((clean_speech-processed_speech).^2));

% ----------------------------------------------------------------------
% Global Variables
% ----------------------------------------------------------------------

winlength   = round(30*sample_rate/1000); %240;		   % window length in samples
skiprate    = floor(winlength/4);		   % window skip in samples
MIN_SNR     = -10;		   % minimum SNR in dB
MAX_SNR     =  35;		   % maximum SNR in dB

% ----------------------------------------------------------------------
% For each frame of input speech, calculate the Segmental SNR
% ----------------------------------------------------------------------

num_frames = clean_length/skiprate-(winlength/skiprate); % number of frames
start      = 1;					% starting sample
window     = 0.5*(1 - cos(2*pi*(1:winlength)'/(winlength+1)));

for frame_count = 1: num_frames

   % ----------------------------------------------------------
   % (1) Get the Frames for the test and reference speech. 
   %     Multiply by Hanning Window.
   % ----------------------------------------------------------

   clean_frame = clean_speech(start:start+winlength-1);
   processed_frame = processed_speech(start:start+winlength-1);
   clean_frame = clean_frame.*window;
   processed_frame = processed_frame.*window;

   % ----------------------------------------------------------
   % (2) Compute the Segmental SNR
   % ----------------------------------------------------------

   signal_energy = sum(clean_frame.^2);
   noise_energy  = sum((clean_frame-processed_frame).^2);
   segmental_snr(frame_count) = 10*log10(signal_energy/(noise_energy+eps)+eps);
   segmental_snr(frame_count) = max(segmental_snr(frame_count),MIN_SNR);
   segmental_snr(frame_count) = min(segmental_snr(frame_count),MAX_SNR);

   start = start + skiprate;

end

參考文獻

[1] Quackenbush S R . Objective measures of speech quality. Georgia Institute of Technology, 1995.
[2] B.-H. Juang, “On Using the Itakura-Saito Measures for Speech Coder Performance Evaluation”, AT&T BellLaboratories Technical Journal, Vol. 63, No. 8 October 1984, pp. 1477-1498.
[3] N. Kitawaki, H. Nagabuchi and K. Itoh, “Objective quality evaluation for low-bit-rate speech coding systems,” in IEEE Journal on Selected Areas in Communications, vol. 6, no. 2, pp. 242-248, Feb. 1988, doi: 10.1109/49.601.
[4] D. Klatt, “Prediction of perceived phonetic distance from critical-band spectra: A first step,” ICASSP '82. IEEE International Conference on Acoustics, Speech, and Signal Processing, 1982, pp. 1278-1281, doi: 10.1109/ICASSP.1982.1171512.
[5] Hu Y, Loizou P C. Evaluation of objective measures for speech enhancement[C]//Ninth international conference on spoken language processing. 2006.
[6] Loizou P C. Speech enhancement: theory and practice[M]. CRC press, 2007.

轉載請註明出處,本文鏈接:https://www.uj5u.com/qita/389003.html

標籤:其他

上一篇:手把手教你10分鐘快速搭建webrtc

下一篇:java 獲取視頻時長

標籤雲
其他(157675) Python(38076) JavaScript(25376) Java(17977) C(15215) 區塊鏈(8255) C#(7972) AI(7469) 爪哇(7425) MySQL(7132) html(6777) 基礎類(6313) sql(6102) 熊猫(6058) PHP(5869) 数组(5741) R(5409) Linux(5327) 反应(5209) 腳本語言(PerlPython)(5129) 非技術區(4971) Android(4554) 数据框(4311) css(4259) 节点.js(4032) C語言(3288) json(3245) 列表(3129) 扑(3119) C++語言(3117) 安卓(2998) 打字稿(2995) VBA(2789) Java相關(2746) 疑難問題(2699) 细绳(2522) 單片機工控(2479) iOS(2429) ASP.NET(2402) MongoDB(2323) 麻木的(2285) 正则表达式(2254) 字典(2211) 循环(2198) 迅速(2185) 擅长(2169) 镖(2155) 功能(1967) .NET技术(1958) Web開發(1951) python-3.x(1918) HtmlCss(1915) 弹簧靴(1913) C++(1909) xml(1889) PostgreSQL(1872) .NETCore(1853) 谷歌表格(1846) Unity3D(1843) for循环(1842)

熱門瀏覽
  • 網閘典型架構簡述

    網閘架構一般分為兩種:三主機的三系統架構網閘和雙主機的2+1架構網閘。 三主機架構分別為內端機、外端機和仲裁機。三機無論從軟體和硬體上均各自獨立。首先從硬體上來看,三機都用各自獨立的主板、記憶體及存盤設備。從軟體上來看,三機有各自獨立的作業系統。這樣能達到完全的三機獨立。對于“2+1”系統,“2”分為 ......

    uj5u.com 2020-09-10 02:00:44 more
  • 如何從xshell上傳檔案到centos linux虛擬機里

    如何從xshell上傳檔案到centos linux虛擬機里及:虛擬機CentOs下執行 yum -y install lrzsz命令,出現錯誤:鏡像無法找到軟體包 前言 一、安裝lrzsz步驟 二、上傳檔案 三、遇到的問題及解決方案 總結 前言 提示:其實很簡單,往虛擬機上安裝一個上傳檔案的工具 ......

    uj5u.com 2020-09-10 02:00:47 more
  • 一、SQLMAP入門

    一、SQLMAP入門 1、判斷是否存在注入 sqlmap.py -u 網址/id=1 id=1不可缺少。當注入點后面的引數大于兩個時。需要加雙引號, sqlmap.py -u "網址/id=1&uid=1" 2、判斷文本中的請求是否存在注入 從文本中加載http請求,SQLMAP可以從一個文本檔案中 ......

    uj5u.com 2020-09-10 02:00:50 more
  • Metasploit 簡單使用教程

    metasploit 簡單使用教程 浩先生, 2020-08-28 16:18:25 分類專欄: kail 網路安全 linux 文章標簽: linux資訊安全 編輯 著作權 metasploit 使用教程 前言 一、Metasploit是什么? 二、準備作業 三、具體步驟 前言 Msfconsole ......

    uj5u.com 2020-09-10 02:00:53 more
  • 游戲逆向之驅動層與用戶層通訊

    驅動層代碼: #pragma once #include <ntifs.h> #define add_code CTL_CODE(FILE_DEVICE_UNKNOWN,0x800,METHOD_BUFFERED,FILE_ANY_ACCESS) /* 更多游戲逆向視頻www.yxfzedu.com ......

    uj5u.com 2020-09-10 02:00:56 more
  • 北斗電力時鐘(北斗授時服務器)讓網路資料更精準

    北斗電力時鐘(北斗授時服務器)讓網路資料更精準 北斗電力時鐘(北斗授時服務器)讓網路資料更精準 京準電子科技官微——ahjzsz 近幾年,資訊技術的得了快速發展,互聯網在逐漸普及,其在人們生活和生產中都得到了廣泛應用,并且取得了不錯的應用效果。計算機網路資訊在電力系統中的應用,一方面使電力系統的運行 ......

    uj5u.com 2020-09-10 02:01:03 more
  • 【CTF】CTFHub 技能樹 彩蛋 writeup

    ?碎碎念 CTFHub:https://www.ctfhub.com/ 筆者入門CTF時時剛開始刷的是bugku的舊平臺,后來才有了CTFHub。 感覺不論是網頁UI設計,還是題目質量,賽事跟蹤,工具軟體都做得很不錯。 而且因為獨到的金幣制度的確讓人有一種想去刷題賺金幣的感覺。 個人還是非常喜歡這個 ......

    uj5u.com 2020-09-10 02:04:05 more
  • 02windows基礎操作

    我學到了一下幾點 Windows系統目錄結構與滲透的作用 常見Windows的服務詳解 Windows埠詳解 常用的Windows注冊表詳解 hacker DOS命令詳解(net user / type /md /rd/ dir /cd /net use copy、批處理 等) 利用dos命令制作 ......

    uj5u.com 2020-09-10 02:04:18 more
  • 03.Linux基礎操作

    我學到了以下幾點 01Linux系統介紹02系統安裝,密碼啊破解03Linux常用命令04LAMP 01LINUX windows: win03 8 12 16 19 配置不繁瑣 Linux:redhat,centos(紅帽社區版),Ubuntu server,suse unix:金融機構,證券,銀 ......

    uj5u.com 2020-09-10 02:04:30 more
  • 05HTML

    01HTML介紹 02頭部標簽講解03基礎標簽講解04表單標簽講解 HTML前段語言 js1.了解代碼2.根據代碼 懂得挖掘漏洞 (POST注入/XSS漏洞上傳)3.黑帽seo 白帽seo 客戶網站被黑帽植入劫持代碼如何處理4.熟悉html表單 <html><head><title>TDK標題,描述 ......

    uj5u.com 2020-09-10 02:04:36 more
最新发布
  • 2023年最新微信小程式抓包教程

    01 開門見山 隔一個月發一篇文章,不過分。 首先回顧一下《微信系結手機號資料庫被脫庫事件》,我也是第一時間得知了這個訊息,然后跟蹤了整件事情的經過。下面是這起事件的相關截圖以及近日流出的一萬條資料樣本: 個人認為這件事也沒什么,還不如關注一下之前45億快遞資料查詢渠道疑似在近日復活的訊息。 訊息是 ......

    uj5u.com 2023-04-20 08:48:24 more
  • web3 產品介紹:metamask 錢包 使用最多的瀏覽器插件錢包

    Metamask錢包是一種基于區塊鏈技術的數字貨幣錢包,它允許用戶在安全、便捷的環境下管理自己的加密資產。Metamask錢包是以太坊生態系統中最流行的錢包之一,它具有易于使用、安全性高和功能強大等優點。 本文將詳細介紹Metamask錢包的功能和使用方法。 一、 Metamask錢包的功能 數字資 ......

    uj5u.com 2023-04-20 08:47:46 more
  • vulnhub_Earth

    前言 靶機地址->>>vulnhub_Earth 攻擊機ip:192.168.20.121 靶機ip:192.168.20.122 參考文章 https://www.cnblogs.com/Jing-X/archive/2022/04/03/16097695.html https://www.cnb ......

    uj5u.com 2023-04-20 07:46:20 more
  • 從4k到42k,軟體測驗工程師的漲薪史,給我看哭了

    清明節一過,盲猜大家已經無心上班,在數著日子準備過五一,但一想到銀行卡里的余額……瞬間心情就不美麗了。最近,2023年高校畢業生就業調查顯示,本科畢業月平均起薪為5825元。調查一出,便有很多同學表示自己又被平均了。看著這一資料,不免讓人想到前不久中國青年報的一項調查:近六成大學生認為畢業10年內會 ......

    uj5u.com 2023-04-20 07:44:00 more
  • 最新版本 Stable Diffusion 開源 AI 繪畫工具之中文自動提詞篇

    🎈 標簽生成器 由于輸入正向提示詞 prompt 和反向提示詞 negative prompt 都是使用英文,所以對學習母語的我們非常不友好 使用網址:https://tinygeeker.github.io/p/ai-prompt-generator 這個網址是為了讓大家在使用 AI 繪畫的時候 ......

    uj5u.com 2023-04-20 07:43:36 more
  • 漫談前端自動化測驗演進之路及測驗工具分析

    隨著前端技術的不斷發展和應用程式的日益復雜,前端自動化測驗也在不斷演進。隨著 Web 應用程式變得越來越復雜,自動化測驗的需求也越來越高。如今,自動化測驗已經成為 Web 應用程式開發程序中不可或缺的一部分,它們可以幫助開發人員更快地發現和修復錯誤,提高應用程式的性能和可靠性。 ......

    uj5u.com 2023-04-20 07:43:16 more
  • CANN開發實踐:4個DVPP記憶體問題的典型案例解讀

    摘要:由于DVPP媒體資料處理功能對存放輸入、輸出資料的記憶體有更高的要求(例如,記憶體首地址128位元組對齊),因此需呼叫專用的記憶體申請介面,那么本期就分享幾個關于DVPP記憶體問題的典型案例,并給出原因分析及解決方法。 本文分享自華為云社區《FAQ_DVPP記憶體問題案例》,作者:昇騰CANN。 DVPP ......

    uj5u.com 2023-04-20 07:43:03 more
  • msf學習

    msf學習 以kali自帶的msf為例 一、msf核心模塊與功能 msf模塊都放在/usr/share/metasploit-framework/modules目錄下 1、auxiliary 輔助模塊,輔助滲透(埠掃描、登錄密碼爆破、漏洞驗證等) 2、encoders 編碼器模塊,主要包含各種編碼 ......

    uj5u.com 2023-04-20 07:42:59 more
  • Halcon軟體安裝與界面簡介

    1. 下載Halcon17版本到到本地 2. 雙擊安裝包后 3. 步驟如下 1.2 Halcon軟體安裝 界面分為四大塊 1. Halcon的五個助手 1) 影像采集助手:與相機連接,設定相機引數,采集影像 2) 標定助手:九點標定或是其它的標定,生成標定檔案及內參外參,可以將像素單位轉換為長度單位 ......

    uj5u.com 2023-04-20 07:42:17 more
  • 在MacOS下使用Unity3D開發游戲

    第一次發博客,先發一下我的游戲開發環境吧。 去年2月份買了一臺MacBookPro2021 M1pro(以下簡稱mbp),這一年來一直在用mbp開發游戲。我大致分享一下我的開發工具以及使用體驗。 1、Unity 官網鏈接: https://unity.cn/releases 我一般使用的Apple ......

    uj5u.com 2023-04-20 07:40:19 more