Sinewave Speech Analysis/Synthesis in Matlab
Introduction
Sinewave speech is a curious phenomenon in which a small number of sinusoids,
added together, take on some of the characteristics of speech, even though in
most respects they do not resemble it at all. Using three sinusoids
that track the frequency and amplitude of the first three speech formants,
high intelligibility can be achieved. This phenomenon has been extensively
investigated by Robert Remez, Philip Rubin and others. There is a much more
detailed description at the web site of Haskins Laboratories in New Haven, CT,
where much of the work was done.
The Haskins site includes several example
analysis files that you can download. These files contain, in
a compact form, all the data you need to resynthesize the sinewave speech.
The Matlab routines below do this for you.
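To make the resynthesis step concrete, here is a minimal sketch of building a signal from frequency and magnitude tracks. It is an illustration only, not the code behind the routines on this page (the example further down uses synthtrax for this step). The track values, the sampling rate `sr`, the hop size `hop`, and the linear interpolation between frames are all assumptions.

```matlab
% Sketch: sum of sinusoids driven by frame-rate frequency/magnitude tracks.
sr  = 8000;                        % assumed sampling rate, Hz
hop = 128;                         % assumed samples per frame
F = [500 520 540; 1500 1450 1400]; % hypothetical frequencies (Hz), one row per sinusoid
M = [0.5 0.4 0.3; 0.2  0.2  0.1];  % hypothetical linear magnitudes, same layout

nfr  = size(F, 2);
t_fr = (0:nfr-1) * hop;            % sample index at which each frame sits
t    = 0:(nfr-1)*hop;              % every sample index between first and last frame

x = zeros(1, numel(t));
for k = 1:size(F, 1)
  f  = interp1(t_fr, F(k,:), t);   % frequency track at sample rate
  m  = interp1(t_fr, M(k,:), t);   % magnitude track at sample rate
  ph = 2*pi*cumsum(f)/sr;          % running phase = integrated frequency
  x  = x + m .* cos(ph);           % add this sinusoid to the mix
end
```

Accumulating the phase with `cumsum`, rather than computing `cos(2*pi*f.*t/sr)` directly, keeps each sinusoid continuous as its frequency changes. `sound(x, sr)` plays the result.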
Sinewave analysis
I was developing some examples of LPC analysis for my
speech and audio class, and to my surprise, crude translation of
LPC pole positions does a pretty good job of extracting sinewave speech
parameters. Thus, I am pleased to offer the following routines:
- Main routine:
[F,M] = swsmodel(D,R,H) returns four
sinusoids, with frequencies defined by rows of F and magnitudes defined
by rows of M, tracking the formants in the speech sample D (of sampling
rate R). Each column of F and M corresponds to H samples (so the analysis
frame rate is R/H). Note: the sound is resampled to 8 kHz within the
routine to focus the LPC on the main formant region, below 4 kHz.
- Support routine:
[A,G,E] = lpcfit(D,P,H,W,OV) fits P-th order
LPC (all-pole, autoregressive) models to sound waveform D, using
W-point windows advanced by H samples. Rows of A contain
all-pole filter coefficients [1 a1 a2 .. aP], with corresponding
elements of G giving the frame gain (residual RMS). E is the actual
excitation residual. Specifying OV as zero prevents overlap-add of the
residual, for perfect reconstruction but a less useful E.
- Support routine:
[F,M] = lpca2frq(A) factorizes the
LPC polynomial defined in each row of A (as from lpcfit.m) and
returns the sorted positive frequencies (up to P/2 of them) in
columns of F, each with a corresponding approximate magnitude in M.
- Bonus routine:
D = lpcsynth(A,G,E,H,OV) resynthesizes
from LPC parameters returned by lpcfit, or using noise excitation if
E is omitted.
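The analysis chain described above (fit an all-pole model to a frame, then factor its polynomial to read off resonance frequencies) can be sketched for a single frame as follows. This is a simplified illustration, not the code in lpcfit.m or lpca2frq.m. It uses the textbook autocorrelation method, a hand-built test signal and made-up variable names, and it omits the magnitude estimate.

```matlab
% Single-frame LPC fit and pole-to-frequency conversion (illustrative sketch).
sr = 8000;                 % assumed sampling rate, Hz
N  = 256;                  % assumed frame length, samples
p  = 4;                    % LPC order: two resonances need two pole pairs
n  = (0:N-1)';

% Hypothetical test frame: two sinusoids standing in for two formants
x  = sin(2*pi*700*n/sr) + 0.5*sin(2*pi*1800*n/sr);
w  = 0.5 - 0.5*cos(2*pi*n/(N-1));   % Hann window, written out by hand
xw = x .* w;

% Autocorrelation of the windowed frame at lags 0..p
r = zeros(p+1, 1);
for k = 0:p
  r(k+1) = sum(xw(1:N-k) .* xw(1+k:N));
end

% Solve the normal equations for the predictor, then form [1 a1 .. ap]
a = [1; -(toeplitz(r(1:p)) \ r(2:p+1))]';

% Frame gain taken as the RMS of the prediction residual
e = filter(a, 1, xw);
g = sqrt(mean(e.^2));

% Factor the polynomial and keep one pole from each conjugate pair
rts = roots(a);
rts = rts(imag(rts) > 0);
f   = sort(angle(rts) * sr / (2*pi));   % pole angles converted to Hz
```

With this test frame, `f` should come out close to the two input frequencies, 700 and 1800 Hz. On real speech the poles do not map so cleanly onto formants, which is why the translation is described here as crude.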
An example use is shown below:
>> [d,r] = wavread('mpgr1_sx419.wav');
>> [F,M] = swsmodel(d,r);
>> plot(F'); % show all the frequencies
>> dr = synthtrax(F,M,r);
>> % Listen to it
>> sound(dr,r)
>> % Compare to noise-excited reconstruction of LPC analysis
>> [a,g] = lpcfit(d);
>> dl = lpcsynth(a,g);
>> sound(dl,r);
>> % The LPC reconstruction is based on more or less the same information
>> % as the sinewave replica, but it sounds less 'weird'
>> % Compare the spectrograms
>> subplot(311)
>> specgram(d,256,r);
>> title('Original');
>> subplot(312)
>> specgram(dr,256,r);
>> title('Sine wave replica');
>> subplot(313)
>> specgram(dl,256,r);
>> title('Noise-excited LPC reconstruction');
Referencing
If you wish to reference this code in your publications, you can use the following citation:
D. P. W. Ellis (2004)
"Sinewave Speech Analysis/Synthesis in Matlab",
Web resource, available: http://www.ee.columbia.edu/ln/labrosa/matlab/sws/ .
Last updated: 2016/04/17 23:33:41
Dan Ellis <dpwe@ee.columbia.edu>