
SKEWVIEW - Tool to visualize timing skew between files

skewview is a Matlab script that visualizes the timing skew between two sound files. It breaks both files into a set of short pieces (by default 10 seconds long, set by -win), performs a normalized cross-correlation between corresponding pieces, and then plots the lag of each correlation peak as a function of time within the file. If the files contain versions of the same signal, the correlation peak will usually indicate the relative timing skew (delay) between the two files, so the plot can be used to check for such a skew or delay.
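The per-window analysis can be sketched in a few lines. What follows is a minimal pure-Python illustration under stated assumptions, not skewview's Matlab implementation. The function names `xcorr_peak` and `skew_track` are hypothetical. Signals are plain lists of samples at a common rate `sr`, and a positive lag means the target lags the reference. Following the v0.86 changelog note, every correlation compares full windows.

```python
import math


def xcorr_peak(ref_seg, targ, start, maxlag):
    """Best (lag, normalized correlation) for one analysis window.

    Compares ref_seg with targ[start + lag : start + lag + len(ref_seg)] for
    lag in [-maxlag, maxlag] (in samples). Lags whose target window would run
    off either end of targ are skipped, so every comparison uses a full window.
    """
    n = len(ref_seg)
    e_ref = sum(x * x for x in ref_seg)
    best_lag, best_corr = None, -math.inf
    for lag in range(-maxlag, maxlag + 1):
        lo = start + lag
        if lo < 0 or lo + n > len(targ):
            continue
        seg = targ[lo:lo + n]
        e_targ = sum(y * y for y in seg)
        if e_ref == 0 or e_targ == 0:
            continue  # silent window: normalized correlation is undefined
        corr = sum(x * y for x, y in zip(ref_seg, seg)) / math.sqrt(e_ref * e_targ)
        if corr > best_corr:
            best_lag, best_corr = lag, corr
    return best_lag, best_corr


def skew_track(ref, targ, sr, win=10.0, hop=2.0, maxlag=None):
    """Return a list of (t_ref, lag, corr) per window, times in seconds.

    t_ref is the window centre. The win/hop/maxlag keywords (in seconds) are
    modeled on -win/-hop/-maxlag; maxlag defaults to win (assumed).
    """
    if maxlag is None:
        maxlag = win
    w, h, m = round(win * sr), round(hop * sr), round(maxlag * sr)
    track = []
    for start in range(0, len(ref) - w + 1, h):
        lag, corr = xcorr_peak(ref[start:start + w], targ, start, m)
        if lag is not None:
            track.append(((start + w / 2) / sr, lag / sr, corr))
    return track
```

For example, if `targ` is `ref` delayed by 7 samples at `sr = 100`, every window reports a lag of 0.07 s with a normalized correlation of 1. The brute-force loop over lags keeps the sketch readable. A practical implementation would compute the correlations with FFTs.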

skewview supports a range of input sound file formats, including WAV and FLAC (the latter provided via an external flac binary).


Example usage

In the code below, we plot the timing skew between the two excerpted files, 20110221_1452+60.xr_lre.xxx.clean.flac (which is a clean source signal), and 20110221_1452+60.xr_lre.xxx.tsx300_tsx300.flac (which is the signal recorded after transmission across a radio link). The piecewise-constant time skew between the recordings is clearly shown. The (optional) later arguments in this case set the start and end times (in seconds) for the analysis.

skewview('20110221_1452+60.xr_lre.xxx.clean.flac','20110221_1452+60.xr_lre.xxx.tsx300_tsx300.flac','-start',0,'-end',60);
Reading ref  20110221_1452+60.xr_lre.xxx.clean.flac ...
Reading targ 20110221_1452+60.xr_lre.xxx.tsx300_tsx300.flac ...
Making initial coarse alignment with initialdownsamp=4...
New initial delay = 0.555 sec
Calculating short-time cross-correlation...
Plotting...
+++ SkewView v0.9 for 20110221_1452+60.xr_lre.xxx.tsx300_tsx300.flac
ref=20110221_1452+60.xr_lre.xxx.clean.flac
times start=0 end=60
win=10 hop=2 maxlag=10 peakth=0.2
Lin fit stats: sd = 0.040035 prop pts = 0.280
Lin fit: t_targ = (1 - 0.021935) t_ref + 0.706
MEDIAN LAG = 0.071 s, STDDEV = 0.266 s,  ABVTHRESH = 0.960
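The "Lin fit" line expresses target time as a linear function of reference time, t_targ = (1 + s) t_ref + b. Assuming the plotted skew is t_targ - t_ref, the fitted lag at reference time t_ref is s * t_ref + b. The two helpers below pull s and b out of such a line and evaluate that lag. They are hypothetical, based only on the output format shown above, and not part of skewview.

```python
import re

# Matches e.g. "t_targ = (1 - 0.021935) t_ref + 0.706"
LIN_FIT = re.compile(r"t_targ = \(1 ([+-]) ([0-9.]+)\) t_ref ([+-]) ([0-9.]+)")


def parse_lin_fit(line):
    """Return (s, b) from a skewview 'Lin fit:' line, or None if absent."""
    m = LIN_FIT.search(line)
    if m is None:
        return None
    s = float(m.group(2)) * (-1.0 if m.group(1) == "-" else 1.0)
    b = float(m.group(4)) * (-1.0 if m.group(3) == "-" else 1.0)
    return s, b


def predicted_lag(s, b, t_ref):
    """Lag t_targ - t_ref (seconds) implied by the fit at time t_ref."""
    return s * t_ref + b
```

Applied to the line above, this gives s = -0.021935 and b = 0.706, so the fitted lag at t_ref = 30 s is about 0.048 s.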

Showing Spectrograms

skewview can also display spectrograms of the two audio signals, time-aligned with the cross-correlation plot, for further diagnosis (-plotsgrams 1).

skewview 20110221_1452+60.xr_lre.xxx.clean.flac 20110221_1452+60.xr_lre.xxx.tsx300_tsx300.flac -dur 60 -plotsgrams 1 -hop 0.2
Reading ref  20110221_1452+60.xr_lre.xxx.clean.flac ...
Reading targ 20110221_1452+60.xr_lre.xxx.tsx300_tsx300.flac ...
Making initial coarse alignment with initialdownsamp=4...
New initial delay = 0.555 sec
Calculating short-time cross-correlation...
Plotting...
+++ SkewView v0.9 for 20110221_1452+60.xr_lre.xxx.tsx300_tsx300.flac
ref=20110221_1452+60.xr_lre.xxx.clean.flac
times start=0 end=60
win=10 hop=0.2 maxlag=10 peakth=0.2
Lin fit stats: sd = 0.039977 prop pts = 0.302
Lin fit: t_targ = (1 - 0.022821) t_ref + 0.737
MEDIAN LAG = 0.071 s, STDDEV = 0.262 s,  ABVTHRESH = 0.940

Writing aligned outputs

skewview attempts to fit a linear relationship between the reference track's timebase and the best timing skew between the two tracks. When the two files are related by a simple delay, possibly combined with a small resampling factor (clock drift), this fit describes a trim-and-resample operation that brings the target into (almost) exact temporal alignment with the reference. To have skewview write this aligned version of the target, pass -alignout outfilename.

If the clock drift is significant (e.g., 0.1% or more), limit the correlation window length (-win) so that drift within a single window does not blur the cross-correlation peaks. Conversely, as the drift gets smaller, you can use longer windows and obtain better alignments. It can therefore pay to run the alignment several times, feeding each aligned output back in with a longer window, as in the example below.
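The linear fit can be sketched as an ordinary least-squares line through the (t_ref, lag) points. Since lag = t_targ - t_ref, fitting lag = s * t_ref + b is equivalent to fitting t_targ = (1 + s) t_ref + b, the form skewview prints. The outlier step is a rough, hypothetical reading of the v0.73 changelog note about excluding points with the most extreme slopes to adjacent points. It is not skewview's linfit.m and ignores -fitthresh. The name `fit_skew` is illustrative.

```python
def fit_skew(track, trim=0.1):
    """Fit lag = s * t_ref + b; returns (s, b) so t_targ = (1 + s) t_ref + b.

    track holds (t_ref, lag, ...) tuples. Before fitting, points whose slope
    to the next point lies in the top or bottom `trim` fraction of all such
    slopes are dropped (the last point has no next point and is dropped too).
    Assumes distinct t_ref values and at least two surviving points.
    """
    pts = sorted((t, lag) for t, lag, *_ in track)
    if len(pts) >= 3 and trim > 0:
        slopes = [(pts[i + 1][1] - pts[i][1]) / (pts[i + 1][0] - pts[i][0])
                  for i in range(len(pts) - 1)]
        order = sorted(slopes)
        k = int(len(order) * trim)
        lo, hi = order[k], order[len(order) - 1 - k]
        # zip stops at the shorter list, so the final point is discarded
        pts = [p for p, sl in zip(pts, slopes) if lo <= sl <= hi]
    n = len(pts)
    mean_t = sum(t for t, _ in pts) / n
    mean_l = sum(l for _, l in pts) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in pts)
    s = sum((t - mean_t) * (l - mean_l) for t, l in pts) / var_t
    b = mean_l - s * mean_t
    return s, b
```

On points lying on lag = -0.006 t + 0.5 with one gross outlier, the trimmed fit recovers s = -0.006 and b = 0.5. Setting `trim=0` lets the outlier pull the intercept away.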

You can also adjust the fit by hand by dragging the green "handles" at either end of the green best-fit line. When you release the mouse, skewview reports the updated fit parameters and the corresponding sox command to generate an aligned output.
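Producing an aligned target from a fit amounts to reading the target at t_targ = (1 + s) t_ref + b for every output sample on the reference timebase. skewview delegates this to sox or, from v0.83, to its own internal resampling. The sketch below is a hypothetical stand-in (`align_to_ref` is an illustrative name) that uses plain linear interpolation. That is enough to show the mapping but is lower quality than a proper band-limited resampler.

```python
import math


def align_to_ref(targ, sr, s, b, n_out):
    """Warp targ onto the reference timebase using t_targ = (1 + s) t_ref + b.

    Output sample n (reference time n / sr) is read from targ at fractional
    sample index (1 + s) * n + b * sr using linear interpolation. Positions
    before the start or past the end of targ are filled with zeros, which
    plays the role of padding (for b < 0) or trimming (for b > 0).
    """
    out = []
    for n in range(n_out):
        pos = (1 + s) * n + b * sr
        i = math.floor(pos)
        frac = pos - i
        if 0 <= i and i + 1 < len(targ):
            out.append((1 - frac) * targ[i] + frac * targ[i + 1])
        elif i == len(targ) - 1 and frac == 0:
            out.append(targ[i])  # exactly on the last sample
        else:
            out.append(0.0)
    return out
```

With s = 0 this reduces to a pure shift by b seconds. A nonzero s stretches or compresses the target to cancel clock drift.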

% warwick-mix.wav is a full mix, warwick-aca.wav is an a cappella
% version that includes about 0.6% clock drift
% First pass
subplot(221)
skewview warwick-mix.wav warwick-aca.wav -win 4 -hop 0.5 -alignout warwick-aca-to-mix.wav
% Second pass - double window length
subplot(222)
skewview warwick-mix.wav warwick-aca-to-mix.wav -win 8 -hop 1 -alignout warwick-aca-to-mix-2.wav
% Third pass
subplot(223)
skewview warwick-mix.wav warwick-aca-to-mix-2.wav -win 8 -hop 1 -alignout warwick-aca-to-mix-3.wav
% By now, the alignment is pretty good
subplot(224)
skewview warwick-mix.wav warwick-aca-to-mix-3.wav -win 8 -hop 1
[dm,sr] = audioread('warwick-mix.wav');
[da,sr] = audioread('warwick-aca-to-mix-3.wav');
% Play up to 20 s: mix in the left channel, boosted a cappella in the right
ll = min([20*sr, length(dm), length(da)]);
soundsc([dm(1:ll),10*da(1:ll)], sr)
% good synchronization
Reading ref  warwick-mix.wav ...
Reading targ warwick-aca.wav ...
Making initial coarse alignment with initialdownsamp=4...
New initial delay = -9.919 sec
Calculating short-time cross-correlation...
Plotting...
+++ SkewView v0.9 for warwick-aca.wav
ref=warwick-mix.wav
times start=0 end=0
win=4 hop=0.5 maxlag=4 peakth=0.2
Lin fit stats: sd = 0.009417 prop pts = 0.290
Lin fit: t_targ = (1 - 0.008521) t_ref - 9.693
MEDIAN LAG = -9.977 s, STDDEV = 0.082 s,  ABVTHRESH = 1.000
Reading ref  warwick-mix.wav ...
Reading targ warwick-aca-to-mix.wav ...
Making initial coarse alignment with initialdownsamp=4...
New initial delay = 0.028 sec
Calculating short-time cross-correlation...
Plotting...
+++ SkewView v0.9 for warwick-aca-to-mix.wav
ref=warwick-mix.wav
times start=0 end=0
win=8 hop=1 maxlag=8 peakth=0.2
Lin fit stats: sd = 0.004202 prop pts = 0.340
Lin fit: t_targ = (1 + 0.002261) t_ref - 0.043
MEDIAN LAG = 0.030 s, STDDEV = 1.488 s,  ABVTHRESH = 0.943
Reading ref  warwick-mix.wav ...
Reading targ warwick-aca-to-mix-2.wav ...
Making initial coarse alignment with initialdownsamp=4...
New initial delay = 0.009 sec
Calculating short-time cross-correlation...
Plotting...
+++ SkewView v0.9 for warwick-aca-to-mix-2.wav
ref=warwick-mix.wav
times start=0 end=0
win=8 hop=1 maxlag=8 peakth=0.2
Lin fit stats: sd = 0.000515 prop pts = 0.302
Lin fit: t_targ = (1 + 0.000364) t_ref - 0.017
MEDIAN LAG = -0.006 s, STDDEV = 0.003 s,  ABVTHRESH = 0.849
Reading ref  warwick-mix.wav ...
Reading targ warwick-aca-to-mix-3.wav ...
Making initial coarse alignment with initialdownsamp=4...
New initial delay = 0.002 sec
Calculating short-time cross-correlation...
Plotting...
+++ SkewView v0.9 for warwick-aca-to-mix-3.wav
ref=warwick-mix.wav
times start=0 end=0
win=8 hop=1 maxlag=8 peakth=0.2
Lin fit stats: sd = 0.000068 prop pts = 0.717
Lin fit: t_targ = (1 + 0.000041) t_ref + 0.000
MEDIAN LAG = 0.002 s, STDDEV = 0.000 s,  ABVTHRESH = 0.925
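The window lengths used in the passes above follow from a simple estimate. With a fractional drift ε (so that t_targ ≈ (1 + ε) t_ref + b), the lag changes by about |ε|·T_win across a window of length T_win. The estimate below plugs in the 0.1% figure mentioned earlier and the slopes reported by each pass's fit:

```latex
\begin{aligned}
\Delta_{\text{lag}} &\approx |\varepsilon|\, T_{\text{win}} \\
\text{0.1\% drift, default 10 s window: } \Delta_{\text{lag}} &\approx 0.001 \times 10\,\text{s} = 10\,\text{ms} \\
\text{pass 1 } (\varepsilon = -0.008521,\ T_{\text{win}} = 4\,\text{s})\text{: } \Delta_{\text{lag}} &\approx 34\,\text{ms} \\
\text{pass 2 } (\varepsilon = +0.002261,\ T_{\text{win}} = 8\,\text{s})\text{: } \Delta_{\text{lag}} &\approx 18\,\text{ms} \\
\text{pass 3 } (\varepsilon = +0.000364,\ T_{\text{win}} = 8\,\text{s})\text{: } \Delta_{\text{lag}} &\approx 2.9\,\text{ms}
\end{aligned}
```

This is only an order-of-magnitude guide to how much each window's correlation peak is smeared. It shows why the window can grow once most of the drift has been removed.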

Optional arguments

Behavior is controlled by optional arguments specified as param/value pairs, detailed below. From v0.86 onwards, if the first argument starts with "-", all arguments are assumed to be in "-parameter value" format; otherwise, the first two arguments are taken as the reference and target sound file names.
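The rule for telling the two calling forms apart can be sketched as follows. The `parse_skewview_args` helper is hypothetical and illustrates the rule as described, not skewview's own parser. It leaves all values as strings and does not check names against the option list.

```python
def parse_skewview_args(argv):
    """Return a dict of options, with 'ref' and 'targ' for the file names.

    If argv[0] starts with "-", every argument belongs to a "-name value"
    pair; otherwise argv[0] and argv[1] are the reference and target files
    and the remaining arguments are pairs.
    """
    args = list(argv)
    opts = {}
    if args and not args[0].startswith("-"):
        # Positional form: the first two arguments are the file names.
        if len(args) < 2:
            raise ValueError("need both reference and target file names")
        opts["ref"], opts["targ"] = args[0], args[1]
        args = args[2:]
    if len(args) % 2:
        raise ValueError("options must come in -name value pairs")
    for name, value in zip(args[0::2], args[1::2]):
        if not name.startswith("-"):
            raise ValueError(f"expected an option name, got {name!r}")
        opts[name[1:]] = value
    return opts
```

Because values are taken positionally, a negative value such as "-start -5" is still read as a value. A file name beginning with "-" in the first position, however, would switch the parser to the pair-only form.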

skewview -help
skewview v0.9 of 20140219
-ref	reference audio wavfile ()
-targ	target audio wavfile ()
-start	start at this point in files (0)
-end	end analysis at this point (0)
-dur	limit analysis to this much audio (-1)
-win	xcorr analysis window in sec (10)
-hop	hop between success windows in sec (2)
-maxlag	largest lag to consider (dlft win) (0)
-peakth	threshold of max to count as peak (0.2)
-initialdelay	center around this t_targ (NaN)
-estinitialdelay	estimate targ-ref by global xcorr (1)
-initialdownsamp	downsample by before initial xcorr (4)
-samplerate	resample to this before comparison (0)
-fitthresh	controls inclusion of outliers in lin fit (2)
-alignout	name for time-warped target audio output ()
-pngout	name for PNG-format screen dump ()
-textout	name for text-format <time skew> pairs ()
-corrout	include normalized corr vals in textout (0)
-minspread	minimum spread of Y axis (sec) (0.1)
-plotsgrams	add specgram plots above xcorr (0)
-disp	enable (disable) graphic display (1)

Compiled target usage

Invoking the compiled target is the same as above, except that the arguments are given without Matlab punctuation (quotes, commas, and parentheses), e.g.:

./run_skewview_prj.sh 20110221_1452+60.xr_lre.xxx.clean.flac 20110221_1452+60.xr_lre.xxx.tsx300_tsx300.flac -start 0 -end 60

Installation

This package has been compiled for several targets using the Matlab Compiler. To run it, you will also need to download and install the Matlab Compiler Runtime (MCR) for your platform, using the installer listed in the table below:

Architecture    Compiled package        MCR Installer
32 bit Linux    skewview_GLNX86.zip     Linux MCR Installer
64 bit Linux    skewview_GLNXA64.zip    Linux 64 bit MCR Installer
64 bit MacOS    skewview_MACI64.zip     MACI64 MCR Installer

The original Matlab code used to build this compiled target is available at

<http://labrosa.ee.columbia.edu/projects/skewview/>

All sources are in the package skewview-v0.90.zip.

Feel free to contact me with any problems.

Changelog

% 2014-02-19 v0.90  - changed final resampling to work in parts
%                     (reading and writing MP3 input and output in
%                     parts using popen()) to speed up -alignout
%                     writing.  Saves maybe ~15% on 8GB Macbook for
%                     75 minute, 44.1 kHz stereo file (3:30 -> 3:00).
%
% 2014-02-05 v0.89  - added 'corrout' option to make 'textout'
%                     add actual normalized xcorr peak to <time,
%                     skew> pairs.
%
% 2014-01-27 v0.88  - Fixed the rare bug in new_stxcorr that
%                     crashed if final block had only one frame.
%
% 2014-01-23 v0.87  - Changed when resampled alignout file is
%                     actually written: was written whenever slope
%                     was changed, now only written when plot is
%                     closed (or immediately if no plot).
%
% 2014-01-01 v0.86  - now short-time cross-correlation can have
%                     lags much larger than the actual window, and
%                     the correlations are always between
%                     fully-populated windows.
%                   - better memory management during st_xcorr.
%
% 2013-12-19 v0.85  - cleaned up calculation of best skew/offset
%                   - initial delay is estimated before chopping
%                     durs to shorter of pair
%                   - maxlag now defaults to same as win
%                   - fixed callback to rewrite alignout after adjustment
%                   - added documentation for -alignout usage
%
% 2013-07-09 v0.84  - added -plotsgrams option to plot synchronized
%                     spectrograms.  Changes to find_skew algo.
%                     Added -dur as alternative to -end.
%                     Reports "lin fit stats" including SD relative
%                     to best linear fit over selected points
%                     only.
%
% 2013-07-02 v0.83  - resampling/trimming now done internally when
%                     -alignout filename is specified.
%                   - minor changes to audioread to handle ~,
%                     pathless files.
%                   - default is now -estinitialdelay 1
%
% 2013-05-15 v0.82  - fixed bug where maxlag > win caused crash.
%                   - took out check where mac version used slower xcorr.
%
% 2013-05-14 v0.81  - fixed bug where perfect time alignment caused crash
%                   - -initialxcorr renamed -estinitialdelay
%                   - fixed bug that gave incorrect offsets when
%                     new initialdelay was positive (with -estinitialdelay 1)
%
% 2013-05-05 v0.8   - Now STDDEV is reported relative to the best-fit
%                     line, so it can be very small even for tracks
%                     with a significant (but systematic) clock
%                     skew.
%                   - New flag -initialxcorr 1 will estimate a
%                     global time skew for the whole track,
%                     obviating the need for -initialdelay.
%
% 2013-04-08 v0.75  Added -minspread option to force a minimum
%                   y-axis range (rather than having it collapse to
%                   very small range for near-synchronous fits).
%
% 2013-03-07 v0.74  Interactive fixup of best-fit line!  Grab
%                   points at end to adjust the line; reports new
%                   lin fit parameters & sox command on mouse-up.
%
% 2013-01-24 v0.73  -alignout now works for both advance (via trim)
%                   and delay of output.  Shell script now handles
%                   filenames with spaces and special characters.
%                   Linear fitting in linfit.m now also excludes
%                   points with the top and bottom 10% of slopes to
%                   adjacent points (i.e., aiming for
%                   middle-80%-median slope).
%
% 2012-09-26 v0.72  Added -alignout, which causes it to report a
%                   sox command that generates a version of TARG
%                   that aligns to REF.
%
% 2012-09-24 v0.71  Added linear fit to report best offset and skew.
%                   Optimized calculation of cross-correlation.
%
% 2011-09-09 v0.7 Incorporated new audioread to allow efficient
% access to parts of very large files; added help message in program.
%
% 2011-09-06 v0.6 Modified audio reading code to better handle
% large/high SR files.
%
% 2011-08-03 v0.5 Added version number in text report output.
%
% 2011-07-19 v0.4 Added -textout option to allow raw text file dump
% of local skew times.
%
% 2011-05-03 v0.3 Added -initialdelay to handle files with large
% default skews, and -samplerate to specify an optional lower
% sampling rate at which to perform analysis, to accommodate much
% larger maximum lags and total file durations without exhausting
% memory.
%
% 2011-04-19 v0.2 Added text report of mean and SD of skew, and
% multiple command-line options to control the various internal
% parameters
%
% 2011-03-15 v0.1 Initial release
%

Acknowledgment

This work was supported by DARPA under the RATS program via a subcontract from the SRI-led team SCENIC. My work was on behalf of ICSI.

Last updated: 2011/08/04 01:34:37. Dan Ellis, dpwe@ee.columbia.edu