The SAMOVA project (Structuration, Analyse, MOdélisation de la Video et de l'Audio) at IRIT - Association Francophone de la Communication Parlée

Package description: This package contains a multilingual set of six phonetic decoders (English, German, Hindi, Japanese, Mandarin and Spanish). Each decoder was trained on the Oregon Graduate Institute (OGI) Multi-Language Telephone Speech Corpus.

The models are hidden Markov models (HMMs) with 10 Gaussians per state. The parameterization uses 12 PLP coefficients, the energy, and their derivatives. The filter bank covers the telephone speech band, 300-3400 Hz. The overall topology is a 3-state HMM, with some adjustments that take into account the average acoustic duration of the phonetic class considered.
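As a rough illustration of the figures above, the following Python sketch computes the size of one observation vector and builds a left-to-right transition matrix. It is not taken from the package. It assumes that "their derivatives" means first-order derivatives only, that the 3 states are emitting states framed by HTK-style non-emitting entry and exit states, and that the self-loop probability of 0.6 is an arbitrary placeholder. The function names are hypothetical.

```python
N_PLP = 12       # PLP coefficients, as stated in the description
N_EMITTING = 3   # emitting states in the base topology


def feature_dim(n_plp=N_PLP, with_energy=True, n_delta_orders=1):
    """Size of one observation vector: static terms plus derivative orders.

    n_delta_orders=1 is an assumption (first-order derivatives only).
    """
    static = n_plp + (1 if with_energy else 0)
    return static * (1 + n_delta_orders)


def left_to_right_transitions(n_emitting=N_EMITTING, self_loop=0.6):
    """Transition matrix with non-emitting entry and exit states.

    self_loop is a placeholder value, not a trained probability.
    """
    n = n_emitting + 2
    a = [[0.0] * n for _ in range(n)]
    a[0][1] = 1.0                       # entry state moves to the first emitting state
    for i in range(1, n_emitting + 1):
        a[i][i] = self_loop             # stay in the current state
        a[i][i + 1] = 1.0 - self_loop   # advance to the next state
    return a                            # last row stays zero (exit state)


if __name__ == "__main__":
    print(feature_dim())
    for row in left_to_right_transitions():
        print(row)
```

With a self-loop probability p, a state is occupied for 1/(1-p) frames on average, so a duration-based adjustment of the kind mentioned above could in principle be obtained by changing the number of emitting states or the self-loop values. How the package actually does this is not specified here.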

The labeling of each phoneme is based on the OGI labeling guide.

A script is provided in the package to make the decoders easier to use. It requires the HTK toolkit to be installed.
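For readers unfamiliar with HTK, the sketch below shows how a decoding call to HTK's HVite tool is typically assembled. It is not the script shipped with the package, and every file name in it (config, english/hmmdefs, test.scp, out.mlf, phoneloop.net, phones.dict, phones.list) is a hypothetical placeholder. The function only builds the argument list and does not run anything.

```python
import shlex


def hvite_command(config, hmm_defs, script_file, output_mlf,
                  network, dictionary, hmm_list):
    """Assemble an HVite command line as a list of arguments (nothing is executed)."""
    return [
        "HVite",
        "-C", config,        # HTK configuration file (feature settings)
        "-H", hmm_defs,      # file holding the HMM definitions
        "-S", script_file,   # list of input files to decode
        "-i", output_mlf,    # master label file that receives the transcriptions
        "-w", network,       # recognition network, e.g. a phone loop
        dictionary,          # pronunciation dictionary
        hmm_list,            # list of model names
    ]


# All paths below are placeholders, not files from the package.
cmd = hvite_command("config", "english/hmmdefs", "test.scp", "out.mlf",
                    "phoneloop.net", "phones.dict", "phones.list")
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` once HTK is installed and the placeholders are replaced with the files actually provided in the package.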

Information and download: http://www.irit.fr/recherches/SAMOVA