Objective measures of speech quality/SNR
This collection of Matlab functions calculates a set of objective speech quality measures, mostly based on some version of SNR (i.e. the ratio of speech energy to non-speech energy). The measures are:
NIST STNR - see http://labrosa.ee.columbia.edu/~dpwe/tmp/nist/doc/stnr.txt
WADA SNR - see http://www.cs.cmu.edu/~robust/Papers/KimSternIS08.pdf
BSS_EVAL - see http://bass-db.gforge.inria.fr/bss_eval/
SNR_VAD - the "extra" energy in regions designated as speech by some kind of voice activity detection (VAD), compared to the energy of the "gaps" in between.
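To make the SNR_VAD definition concrete, here is a minimal Python sketch (not the package's Matlab code). It assumes the signal has already been cut into frames with a boolean VAD decision per frame, and it reads "extra" energy as the mean speech-frame energy minus the mean gap energy. Both helper names are hypothetical, and the package may frame, weight or average differently.

```python
import math


def frame_energies(x, frame_len, hop):
    """Sum of squared samples in each full (unpadded) frame of x."""
    return [sum(s * s for s in x[i:i + frame_len])
            for i in range(0, len(x) - frame_len + 1, hop)]


def snr_vad(energies, vad_mask):
    """Hypothetical SNR_VAD estimate in dB.

    Noise level = mean energy of non-speech ("gap") frames.
    Speech level = excess of the mean speech-frame energy over that noise level.
    """
    speech = [e for e, v in zip(energies, vad_mask) if v]
    gaps = [e for e, v in zip(energies, vad_mask) if not v]
    if not speech or not gaps:
        raise ValueError("need both speech and non-speech frames")
    noise_level = sum(gaps) / len(gaps)
    extra = sum(speech) / len(speech) - noise_level
    if extra <= 0:
        return float("-inf")  # speech frames are no louder than the gaps
    if noise_level == 0:
        return float("inf")   # perfectly silent gaps
    return 10 * math.log10(extra / noise_level)
```

For example, gap frames with energy 1 and speech frames with energy 11 give 10 units of "extra" energy over a noise level of 1, i.e. 10 dB.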
In the examples below, we evaluate speech quality with progressively more information available. In the simplest case, all we have is the noisy speech signal, from which we can still calculate STNR and WADA:
snreval('arabic_400mhz.wav');
#============= SNREVAL v0.54 (20140701) ===
# args:
No VAD, but guessing is not selected
Target File: arabic_400mhz.wav
time range: 0.0-60.0 s
NIST STNR = 23.8 dB
WADA SNR = 19.6 dB
==========================================
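Neither the NIST STNR nor the WADA algorithm is reproduced here, but the general idea of estimating SNR from the noisy signal alone can be illustrated by comparing a high percentile of frame power (mostly speech) with a low percentile (mostly background noise). This Python sketch is a simplified stand-in, not either algorithm; the 95th/15th percentile choice and the function names are assumptions.

```python
import math


def percentile(values, p):
    """Linearly interpolated p-th percentile (0-100) of a list of numbers."""
    s = sorted(values)
    if not s:
        raise ValueError("empty input")
    pos = (len(s) - 1) * p / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def percentile_snr_db(frame_powers, signal_pct=95.0, noise_pct=15.0):
    """Rough blind SNR in dB: high-percentile over low-percentile frame power.

    Illustrative only; this is NOT the NIST STNR or WADA computation.
    """
    # Work in dB; clamp to avoid log10(0) on digital-silence frames.
    db = [10 * math.log10(max(p, 1e-12)) for p in frame_powers]
    return percentile(db, signal_pct) - percentile(db, noise_pct)
```

With 80 background frames at power 1 and 20 speech frames at power 100, this gives 20 dB. Real estimators work harder to separate the two populations, which is why STNR and WADA can disagree on the same file.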
The package includes a quick-and-dirty voice activity detection (VAD) algorithm based on the energy in the 100-1000 Hz region. When it is enabled with the -guessvad flag, its output serves as the basis for SNR_VAD:
snreval('arabic_400mhz.wav','-guessvad',1);
#============= SNREVAL v0.54 (20140701) ===
# args: -guessvad 1
Guessing VAD from noisy file arabic_400mhz.wav ...
Target File: arabic_400mhz.wav
time range: 0.0-60.0 s
NIST STNR = 23.8 dB
WADA SNR = 19.6 dB
SNRvad = 8.1 dB
==========================================
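The guessed VAD is described only as being based on 100-1000 Hz energy. This Python sketch shows one way such a detector could work. It computes the energy of the DFT bins in that band for each frame and marks a frame as speech when that energy exceeds a multiple of a low-percentile noise floor. Following the v0.2 changelog note, zero-energy frames are ignored when setting the threshold. The frame size, hop, percentile and factor are all assumptions, and `guess_vad` here is a hypothetical stand-in, not the package's function. The direct DFT is slow and is meant only for illustration.

```python
import math


def band_energy(frame, sr, f_lo=100.0, f_hi=1000.0):
    """Energy of the DFT bins between f_lo and f_hi (Hz) for one frame."""
    n = len(frame)
    k_lo = math.ceil(f_lo * n / sr)
    k_hi = math.floor(f_hi * n / sr)
    total = 0.0
    for k in range(k_lo, k_hi + 1):
        w = 2 * math.pi * k / n
        re = sum(x * math.cos(w * i) for i, x in enumerate(frame))
        im = -sum(x * math.sin(w * i) for i, x in enumerate(frame))
        total += re * re + im * im
    return total


def guess_vad(x, sr, frame_len=256, hop=128, floor_pct=10.0, factor=10.0):
    """Per-frame speech decisions from 100-1000 Hz band energy (hypothetical rule)."""
    energies = [band_energy(x[i:i + frame_len], sr)
                for i in range(0, len(x) - frame_len + 1, hop)]
    # Leave pure-zero frames out of the noise-floor estimate.
    nonzero = sorted(e for e in energies if e > 0)
    if not nonzero:
        return [False] * len(energies)
    idx = min(len(nonzero) - 1, int(len(nonzero) * floor_pct / 100.0))
    floor = nonzero[idx]
    return [e > factor * floor for e in energies]
```

A faster version would compute an FFT-based spectrogram once and sum the relevant rows, which is presumably closer to what a Matlab implementation would do.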
Alternatively, we can use existing VAD labels read from a text file. The file consists of lines of the format start_sec end_sec label, where label is ignored, and each start_sec end_sec pair designates a voice-active region:
snreval('arabic_400mhz.wav','-vad','arabic_400mhz-vad.txt');
#============= SNREVAL v0.54 (20140701) ===
# args: -vad arabic_400mhz-vad.txt
Target File: arabic_400mhz.wav
time range: 0.0-60.0 s
NIST STNR = 23.8 dB
WADA SNR = 19.6 dB
SNRvad = 12.3 dB
==========================================
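The VAD label format (one "start_sec end_sec label" line per voice-active region) maps directly onto a per-frame speech mask. This Python sketch parses the text and ignores the label column, as described above. It then treats a frame as speech if the frame's start time falls inside any region. The function names, the frame-hop parameter and that membership rule are assumptions made for illustration.

```python
def parse_vad_text(text):
    """Parse 'start_sec end_sec label' lines into (start, end) pairs.

    The label column is ignored; blank lines are skipped.
    """
    regions = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        regions.append((float(fields[0]), float(fields[1])))
    return regions


def regions_to_mask(regions, n_frames, hop_sec):
    """Frame i is speech if its start time i * hop_sec lies in [start, end)."""
    return [any(s <= i * hop_sec < e for s, e in regions)
            for i in range(n_frames)]
```

For example, regions (0.5, 1.0) and (2.0, 2.5) with a 0.5 s hop over six frames give the mask F, T, F, F, T, F.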
If a clean reference signal is also available, the PESQ and BSS_EVAL measures can be calculated as well:
snreval('arabic_400mhz.wav','-vad','arabic_400mhz-vad.txt','-clean', ...
        'arabic_source.wav');
#============= SNREVAL v0.54 (20140701) ===
# args: -vad arabic_400mhz-vad.txt -clean arabic_source.wav
Target File: arabic_400mhz.wav
time range: 0.0-60.0 s
Ref File: arabic_source.wav
Targ delay: -0.237 s
NIST STNR = 23.8 dB
WADA SNR = 19.6 dB
SNRvad = 12.3 dB
SAR = 9.1 dB
PESQ MOS = 2.6
==========================================
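The "Targ delay" line shows that the noisy target is aligned with the clean reference before the two are compared. How snreval does this alignment is not described here. A common approach is to choose the lag that maximizes the cross-correlation between the two signals, sketched below in Python as a brute-force search over a bounded range. The sign convention (positive means the target lags the reference), the one-second search limit and the function names are assumptions, so the sign may not match snreval's report.

```python
def best_lag(ref, targ, max_lag):
    """Lag (in samples) in [-max_lag, max_lag] maximizing sum(ref[n] * targ[n + lag]).

    Positive lag means targ is delayed relative to ref (our convention).
    Brute force: O(len(ref) * max_lag).
    """
    best, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for n, r in enumerate(ref):
            m = n + lag
            if 0 <= m < len(targ):
                score += r * targ[m]
        if score > best_score:
            best, best_score = lag, score
    return best


def delay_seconds(ref, targ, sr, max_delay_sec=1.0):
    """Estimated delay of targ relative to ref, in seconds."""
    return best_lag(ref, targ, int(max_delay_sec * sr)) / sr
```

For minute-long files, an FFT-based cross-correlation would be the practical choice. The brute-force loop is used here only to keep the sketch dependency-free.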
Sometimes it is useful to restrict the analysis to part of a file, for instance to exclude particularly bad noise regions. The '-start' and '-end' flags give the start and end times of the analysis window, in seconds:
snreval('arabic_400mhz.wav','-clean','arabic_source.wav', '-start', ...
        8, '-end', 48);
#============= SNREVAL v0.54 (20140701) ===
# args: -clean arabic_source.wav -start 8 -end 48
No VAD, but guessing is not selected
Target File: arabic_400mhz.wav
time range: 8.0-48.0 s
Ref File: arabic_source.wav
Targ delay: -0.235 s
NIST STNR = 24.0 dB
WADA SNR = 17.4 dB
SAR = 13.7 dB
PESQ MOS = 3.0
==========================================
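The -start/-end window is given in seconds but has to be applied to sample arrays, as the "time range: 8.0-48.0 s" line reflects. This Python sketch shows the conversion with clipping to the file length. Rounding to the nearest sample, treating a missing end as "end of file" and the function name are all assumptions.

```python
def time_range_to_samples(n_samples, sr, start_sec=0.0, end_sec=None):
    """Convert a [start_sec, end_sec) window to (start, end) sample indices.

    end_sec=None means 'to the end of the file'; both ends are clipped to
    the samples actually available.
    """
    start = max(0, int(round(start_sec * sr)))
    end = n_samples if end_sec is None else int(round(end_sec * sr))
    end = min(end, n_samples)
    if end <= start:
        raise ValueError("empty analysis window")
    return start, end
```

For a 60 s file at 8 kHz (480000 samples), `-start 8 -end 48` corresponds to samples 64000 up to (but not including) 384000. An `-end 300` request on that same file is clipped to 480000.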
snreval can be run over a whole list of files with the -listin 1 flag, which causes the first (noisy-file) argument to be treated as a text file listing noisy file names, one per line. The VAD and clean files for each entry can be provided with -vaddir and -cleandir (if they share the noisy files' name stems and are each in a single directory), or with -vadlist and -cleanlist, which name list files giving the individual VAD and clean file for each entry. E.g.,
snreval('noisylist.txt','-listin',1,'-disp',0, ...
        '-cleanlist','cleanlist.txt','-samplerate',8000,'-end',300);
% There's also a -listout 1 flag to report results in a
% consistently-shaped, one-file-per-line output format, for easier
% machine processing:
snreval('noisylist.txt','-listin',1,'-listout',1, '-disp', 0, ...
        '-cleanlist','cleanlist.txt','-samplerate',8000,'-end',300);
% Values that are not calculated are reported as -999.
#============= SNREVAL v0.54 (20140701) ===
# args: -listin 1 -disp 0 -cleanlist cleanlist.txt -samplerate 8000 -end 300
Target File: /u/drspeech/data/RATS/data/LDC2011E86_v2/data/train/rats-cts-alv/audio/a/20665_20110720_014200_10486_rats-cts-alv_A.flac
time range: 0.0-300.0 s
Ref File: /u/drspeech/data/RATS/data/LDC2011E86_v2/data/train/rats-cts-alv/audio/src/20110609_190355_2929_B_10486_rats-cts-alv_src.flac
Targ delay: 2.345 s
NIST STNR = 13.2 dB
WADA SNR = 5.3 dB
SAR = -1.3 dB
PESQ MOS = 1.9
==========================================
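The batch options above amount to two bookkeeping steps. With -cleandir, each noisy file is paired with a clean file that has the same name stem. With -listout 1, each file produces one row, and -999 stands in for any value that was not calculated. This Python sketch illustrates both steps. The clean-file extension, the column order and the space separator are assumptions, so check real -listout output before parsing it. All three helpers are hypothetical.

```python
import os

MISSING = -999  # placeholder for values that were not calculated


def read_list(text):
    """Non-empty, stripped lines of a list file (one path per line)."""
    return [ln.strip() for ln in text.splitlines() if ln.strip()]


def clean_path_for(noisy_path, clean_dir, ext=".wav"):
    """Path in clean_dir sharing the noisy file's name stem (ext is assumed)."""
    stem = os.path.splitext(os.path.basename(noisy_path))[0]
    return os.path.join(clean_dir, stem + ext)


def listout_row(noisy_path, results, fields):
    """One space-separated row per file; missing fields become -999."""
    values = [results.get(f, MISSING) for f in fields]
    return " ".join([noisy_path] + [str(v) for v in values])
```

For instance, a file with STNR and WADA values but no VAD would get -999 in an SNRvad column. That keeps every row the same shape for downstream scripts.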
snreval loads the entire sound files (or the specified portions) into memory at once and calculates spectrograms of the whole thing. For signals sampled at, or downsampled to, 8 kHz, a 300 s excerpt can comfortably be processed in about 1 GB of memory. Avoid loading files much larger than that unless you want to watch your machine swap memory to disk for a long time.
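A back-of-envelope calculation shows why 300 s at 8 kHz fits comfortably. This Python sketch counts float64 waveform samples plus complex128 spectrogram bins for a noisy and a clean signal. The 256-point frame and 128-sample hop are assumptions, not snreval's actual analysis parameters, and the count ignores any intermediate arrays.

```python
def estimate_bytes(duration_sec, sr, n_fft=256, hop=128, n_signals=2):
    """Rough memory (bytes) for float64 waveforms plus complex128 spectrograms."""
    n_samples = int(duration_sec * sr)
    wave_bytes = n_samples * 8                         # 8 bytes per float64 sample
    n_frames = 1 + max(0, (n_samples - n_fft) // hop)  # full frames only
    n_bins = n_fft // 2 + 1                            # non-negative frequencies
    spec_bytes = n_frames * n_bins * 16                # 16 bytes per complex128 bin
    return n_signals * (wave_bytes + spec_bytes)
```

With these assumptions, `estimate_bytes(300, 8000)` is 115,795,872 bytes (about 116 MB). That leaves plenty of headroom within 1 GB for working arrays. Because the estimate grows linearly with duration and sample rate, much longer or wider-band files quickly approach the point where swapping begins.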
The full list of recognized flags is given below. The first argument is always the name of the noisy file (or of the list file, with -listin), followed by any of these flag/value pairs:
-vad <vadfile>          gives the name of the provided voice activity file
-clean <cleanfile>      gives the name of a corresponding clean-speech file
-start <time_sec> -end <time_sec>
                        specify subsegment to process within noisy & clean
-guessvad 1             try to guess the VAD from CLEAN (or NOISY if no CLEAN)
-disp 0                 don't do any graphics
-listin 1               treat NOISY as a text file listing the actual files to process
-listout 1              write output values in columns instead of text report
-vaddir <dir>           directory containing VAD files named like noisy files
-vadlist <listfile>     file containing list of VAD files instead
-cleandir <dir>         directory containing clean files named like noisy files
-cleanlist <listfile>   file containing list of clean files instead
-ldclabels 1            treat VAD file as 8-column LDC format (instead of 3-col)
-samplerate <SR>        resample data to this SR before processing
-checkfshift 1          try SSB-style freq shift to match CLEAN to TARG
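Because every option is a name/value pair after the first argument, the argument list can be split mechanically. The Python sketch below mimics the command-line form, where every value arrives as a string. It converts only the flags from the list above whose values are numbers. The set of numeric flags and the `parse_args` helper are assumptions for illustration, not snreval's own parser.

```python
# Flags from the list above whose values are numbers (our assumption).
NUMERIC_FLAGS = {"-start", "-end", "-guessvad", "-disp", "-listin",
                 "-listout", "-ldclabels", "-samplerate", "-checkfshift"}


def parse_args(argv):
    """Split ['noisy.wav', '-flag', value, ...] into (noisy, options dict).

    Values of NUMERIC_FLAGS are converted to float; others stay strings.
    """
    if not argv:
        raise ValueError("first argument must be the noisy file (or list)")
    noisy, rest = argv[0], argv[1:]
    if len(rest) % 2:
        raise ValueError("flags must come in name/value pairs")
    opts = {}
    for name, value in zip(rest[::2], rest[1::2]):
        if not str(name).startswith("-"):
            raise ValueError("expected a flag, got %r" % (name,))
        opts[name] = float(value) if name in NUMERIC_FLAGS else value
    return noisy, opts
```

Applied to the earlier windowed example, the list `arabic_400mhz.wav -clean arabic_source.wav -start 8 -end 48` gives the noisy file name plus options `-clean` ("arabic_source.wav"), `-start` (8.0) and `-end` (48.0).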
This package has been compiled for several targets using the Matlab Compiler. To run a compiled version, you will also need to download and install the Matlab Compiler Runtime (MCR) using the matching installer listed in the table below:
| Architecture | Compiled package | MCR Installer |
|---|---|---|
| 64 bit Linux | snreval_GLNXA64.zip | Linux 64 bit MCR Installer |
| 64 bit MacOS | snreval_MACI64.zip | MACI64 MCR Installer |
There are more instructions on installing the command-line version in README.txt. The syntax of the command line version is essentially identical to the examples above, but without the parens, quotes, or commas.
The Matlab source can be downloaded in the following ZIP file: snreval.zip
% 2014-07-01 v0.54 * fixed problem with empty lists; improved
%                    finding binaries.
%
% 2013-10-02 v0.53 * added -preemph for pre-emphasizing, to affect
%                    SNR calcs.
%
% 2013-10-01 v0.52 * new version of audioread handles a-law wavs
%                  * added -hpf option to high-pass filter remove
%                    LF noise
%                  * added -my_stnr to avoid running nist binary
%                  * "fixed" help message to be the help message
%
% 2013-08-01 v0.51 * Updated to use latest version of audioread.
%
% 2012-01-03 v0.5  * Better error messages, version reporting
%                  * Uses new audioread, flacread
%                  * avoids LUT overflow in wada_snr??
%
% 2011-10-29 v0.4  * Folded eval_snr into snreval.
%                  * Added batch processing options (-listin/-listout).
%                  * Updated with newest SAR calculation from renoiser.
%                  * Added support for LDC-format S/NS/NT VAD files.
%
% 2011-08-02 v0.3  include nist_stnr_m to approximate NIST stnr if
%                  binary not available
%
% 2011-05-24 v0.2  modified guess_vad to ignore pure-zero frames in thresh.
%
This work was supported by the DARPA RATS program, team SCENIC. The PESQ calculation uses code by Philip Loizou of UT Dallas.
Last updated: 2011/08/04 01:35:05
Dan Ellis email@example.com