Abstract:
Most automatic speech recognition work has
concentrated on read speech, whose acoustic aspects
differ significantly from those of speech found in actual
dialogues. A primary difference between read speech
and spontaneous speech is the high rate of
disfluencies in the latter (e.g., filled pauses, repetitions, repairs, false
starts). Filled pauses (e.g., “uh,” “um”), unlike silences,
resemble phones that occur within words in continuous speech.
This paper considers the problem of detecting filled pauses in
spontaneous speech and how such detection can be useful in
automatic speech recognition. The
acoustic aspects of filled pauses in the widely used
SWITCHBOARD database are examined, with a view to
identifying them using a combination of duration,
fundamental frequency, and spectra.