The Speech Test Videos corpus consists of recordings of common speech test materials (see table below). The recordings include multiple talkers and repetitions, with utterance length and context ranging from isolated vowels and syllables to highly predictable sentences. Brief test utterances embedded in the carrier phrase (“You will mark ____ please.”) have been marked for easy extraction of the audio. The recorded audio and video signals have been carefully synchronized. Most sets of materials were recorded with two male and two female talkers; a few were recorded with one male and one female talker.
Click on the titles in the table below for more information on each group of recordings.
Test Material | Carrier Phrase | Talkers | Repetitions | Number of Items |
---|---|---|---|---|
CV syllables | yes | 2M, 2F | 3 | 1320 (330 per talker) |
VC syllables | yes | 2M, 2F | 3 | 1200 (300 per talker) |
hVd syllables | yes | 2M, 2F | 3 | 180 (45 per talker) |
MRT words | yes | 2M, 2F | 3 | 3276 (819 per talker) |
Numbers 0-10 | yes | 1M, 1F | 3 | 66 (33 per talker) |
Numbers 0-99 | no | 1M, 1F | 3 | 600 (300 per talker) |
High-Probability SPIN Sentences | no | 2M, 2F | 1 | 200 (200 per talker) |
Nonsense Sentences | no | 2M, 2F | 1 | 200 (200 per talker) |
Structured Sentences | no | 2M, 2F | 1 | 2000 (500 per talker) |
File Formats
All materials are available as:
(i) videos — with full HD 1920×1080 pixel resolution, stored in .mov files with H.264 video encoding and a single channel of uncompressed 24-bit audio;
and
(ii) audio-only — single-channel, 24-bit .wav files.
In the corpora that use carrier phrases (see table above), the .mov files for test items spoken in the carrier phrase include chapter markers indicating the start and end times of the item within the phrase. MATLAB .mat files are also provided that contain the keyword start and stop times for each file.
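As an illustration of how the keyword times might be used, the following minimal sketch cuts a segment out of a .wav file given a start and stop time. It assumes the times are expressed in seconds and have already been read from the .mat file (for example with `scipy.io.loadmat`); the layout and variable names inside the .mat files are not described here, so that step is left out. The function name `extract_segment` is hypothetical, and the demo runs on a synthetic in-memory file rather than a corpus recording.

```python
import io
import wave

def extract_segment(src, dst, start_s, stop_s):
    """Copy the frames between start_s and stop_s (assumed to be seconds)
    from the WAV file src to the WAV file dst; return the frame count."""
    with wave.open(src, "rb") as reader:
        rate = reader.getframerate()
        total = reader.getnframes()
        # Convert times to frame indices and clamp them to the file length.
        first = min(max(0, round(start_s * rate)), total)
        last = min(max(first, round(stop_s * rate)), total)
        reader.setpos(first)
        frames = reader.readframes(last - first)
        params = reader.getparams()
    with wave.open(dst, "wb") as writer:
        writer.setparams(params)      # same channels, sample width, rate
        writer.writeframes(frames)    # header frame count is patched on close
    return last - first

# Demo on a synthetic one-second, 24-bit mono file held in memory
# (48 kHz is an assumed sample rate, not a documented corpus property).
source = io.BytesIO()
with wave.open(source, "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(3)
    w.setframerate(48000)
    w.writeframes(bytes(3 * 48000))
source.seek(0)

clip = io.BytesIO()
n_frames = extract_segment(source, clip, 0.25, 0.75)
```

With real files, `src` and `dst` can be file paths instead of in-memory buffers. Note that Python versions before 3.12 cannot read .wav files stored with the WAVE_FORMAT_EXTENSIBLE header, which is common for 24-bit audio, so a third-party reader may be needed in that case.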
All signal levels have been normalized to an RMS level of -25 dB re full scale; .mat files provide the original RMS levels.
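For readers who want to reproduce or undo this kind of level normalization, here is a minimal sketch of the arithmetic. It assumes floating-point samples in which an amplitude of 1.0 is full scale, and it takes "dB re full scale" to mean 20·log10 of the RMS value relative to 1.0 (so a full-scale sine reads about -3 dB). The corpus documentation does not state which convention was used, and the function names below are hypothetical.

```python
import math

def rms_dbfs(samples):
    """RMS level in dB relative to an amplitude of 1.0 (assumed full scale)."""
    mean_square = sum(x * x for x in samples) / len(samples)
    if mean_square == 0.0:
        raise ValueError("signal is silent; its RMS level is undefined in dB")
    return 10.0 * math.log10(mean_square)   # same as 20*log10(rms)

def normalize_rms(samples, target_dbfs=-25.0):
    """Scale samples to the target RMS level; also return the original level."""
    original = rms_dbfs(samples)
    gain = 10.0 ** ((target_dbfs - original) / 20.0)
    return [x * gain for x in samples], original

# Example: one second of a 1 kHz sine at half of full scale, 48 kHz sampling.
tone = [0.5 * math.sin(2 * math.pi * 1000 * n / 48000) for n in range(48000)]
scaled, original_level = normalize_rms(tone)
```

Keeping the original level alongside the scaled signal, as the sketch does, is what allows the normalization to be reversed later by applying the opposite gain.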