We are interested in novel front-end processing with possible benefits to robust speech recognition. Marios Athineos developed the frequency-domain dual of conventional linear prediction (frequency-domain linear prediction, or FDLP), which can capture subband temporal envelope features (peaks) in a compact, parametric form. Keansub Lee has developed noise-robust pitch trackers that can be used to enhance speech by selectively filtering the harmonics related to the voice pitch.
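To make the FDLP idea concrete, here is a minimal sketch, not the implementation used in this work. It assumes a full-band signal rather than subbands, a type-II DCT as the frequency-domain representation, and the standard autocorrelation method for the linear predictor. The function name `fdlp_envelope` and the `order` parameter are hypothetical names chosen for illustration. Because prediction runs along the frequency axis, the resulting all-pole "spectrum" is read along the time axis and approximates the squared temporal envelope.

```python
import numpy as np
from scipy.fft import dct
from scipy.linalg import solve_toeplitz
from scipy.signal import freqz


def fdlp_envelope(x, order=20):
    """Approximate the squared temporal envelope of x with an all-pole model.

    Hypothetical helper for illustration: linear prediction is applied to the
    DCT coefficients of x instead of to its time samples.
    """
    x = np.asarray(x, dtype=float)
    n = len(x)

    # Frequency-domain representation of the whole segment.
    c = dct(x, type=2)

    # Autocorrelation of the DCT sequence for lags 0..order.
    r = np.array([np.dot(c[: n - k], c[k:]) for k in range(order + 1)])
    r[0] *= 1.0 + 1e-9  # tiny diagonal loading for numerical stability

    # Solve the normal (Yule-Walker) equations for the predictor coefficients.
    a = solve_toeplitz(r[:order], r[1:])
    gain = r[0] - np.dot(a, r[1:])  # prediction error energy

    # Evaluate the all-pole model on n points in [0, pi); point i maps to
    # time sample i of the original segment.
    _, h = freqz([1.0], np.concatenate(([1.0], -a)), worN=n)
    return gain * np.abs(h) ** 2


# Synthetic example: a noise burst centered at sample 300 on a low noise floor.
rng = np.random.default_rng(0)
n = 1024
t = np.arange(n)
burst = 0.05 + np.exp(-0.5 * ((t - 300) / 30.0) ** 2)
x = burst * rng.standard_normal(n)

env = fdlp_envelope(x, order=10)
print("envelope peak at sample", int(np.argmax(env)))
```

The model order controls how much temporal detail is kept: a low order gives a smooth envelope with a few broad peaks, while a higher order can resolve sharper or more closely spaced onsets. A subband version would apply the same steps to a windowed slice of the DCT coefficients.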
Prosodic features such as pitch, energy, and timing are typically under-used in current speech recognition systems. We are looking into statistical models for these features to see if they can be better exploited, e.g. for emphasized-word detection. We are also attempting some perceptual experiments to quantify listeners' sensitivity to the information in pitch tracks, and how it depends on syllables. You can try our experiment.
Speech is one particularly important source we encounter in the personal audio domain; we have looked at ways to detect speech in high noise, and are currently looking at speaker turn segmentation and identification (i.e. diarization).
See also the homepage for the NSF-supported Mapping Meetings project.