Georges Linarès

Université d'Avignon

JEP session, poster P3, Tuesday 10 June, 14:00-16:00

Paper 1603: Combinaison de différents jeux de paramètres acoustiques pour la reconnaissance de la parole (Combining different sets of acoustic features for speech recognition)

Loïc Barrault (Laboratoire Informatique d'Avignon)

Driss Matrouf (Laboratoire Informatique d'Avignon)

Georges Linarès (Laboratoire Informatique d'Avignon)

Renato De-Mori (Laboratoire Informatique d'Avignon)

Abstract: Many approaches to combining Automatic Speech Recognition (ASR) systems have been studied extensively as a way to improve their performance. In this paper, we propose combining the state a posteriori probabilities given by different feature sets. For the combination of state posterior probabilities to be coherent, the acoustic models trained on the different feature sets must have the same topology (i.e. the same set of states). For this purpose, a fast and efficient twin-model training protocol is proposed. Two strategies for combining probabilities are presented: linear and log-linear interpolation. Using log-linear interpolation, relative Word Error Rate (WER) reductions of about 15% and 14% were observed on the MEDIA and ESTER corpora, respectively.
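
The two interpolation schemes can be written over the shared state set s of the twin models, with P_k(s|x_k) the posterior produced by feature stream k and lambda_k a stream weight: linear, P(s|x) = sum_k lambda_k P_k(s|x_k); log-linear, P(s|x) proportional to prod_k P_k(s|x_k)^lambda_k, renormalized over the states. The NumPy sketch below assumes the per-stream posteriors are already aligned on the same states (which is what the twin-model protocol is meant to guarantee); the function names and weights are illustrative, not taken from the paper.

```python
import numpy as np

def linear_combination(posteriors, weights):
    """Linear interpolation: weighted arithmetic mean of per-stream state posteriors.

    posteriors: shape (n_streams, n_states), each row summing to 1.
    weights: shape (n_streams,), assumed to sum to 1 so the result is a distribution.
    """
    posteriors = np.asarray(posteriors, dtype=float)
    weights = np.asarray(weights, dtype=float)
    return weights @ posteriors

def log_linear_combination(posteriors, weights, eps=1e-12):
    """Log-linear interpolation: weighted geometric mean, renormalized over the states."""
    posteriors = np.asarray(posteriors, dtype=float)
    weights = np.asarray(weights, dtype=float)
    log_p = weights @ np.log(posteriors + eps)  # sum_k lambda_k * log P_k(s|x_k)
    log_p -= log_p.max()                        # numerical stability before exp
    p = np.exp(log_p)
    return p / p.sum()
```

With two streams giving [0.7, 0.2, 0.1] and [0.5, 0.4, 0.1] and equal weights, the linear rule yields [0.6, 0.3, 0.1], while the log-linear rule yields approximately [0.607, 0.290, 0.103]. Because it multiplies probabilities, the log-linear rule penalizes any state to which one of the streams assigns a low posterior.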

JEP session, poster P3, Tuesday 10 June, 14:00-16:00

Paper 1622: Combinaison de systèmes par décodage guidé (System combination by driven decoding)

Benjamin Lecouteux (LIA, Avignon)

Georges Linarès (LIA, Avignon)

Yannick Estève (LIUM, Le Mans)

Guillaume Gravier (IRISA, Rennes)

Abstract: In this paper, we propose an integrated approach to system combination named the Driven Decoding Algorithm (DDA). It consists of guiding the search algorithm of a primary ASR system with the outputs of an auxiliary system. We first evaluate this method in a simple configuration in which the primary search is driven by the one-best hypothesis of a single auxiliary system. We then generalize DDA to confusion-network-driven decoding and propose a general combination scheme for multiple systems. The extended DDA is evaluated using 3 ASR systems from different labs. Results show that generalized DDA significantly outperforms the ROVER method: we obtain a 15.7% relative word error rate improvement over the best single system, compared with 8.5% for ROVER combination.
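
The abstract does not give DDA's scoring rule. As a rough illustration only, the sketch below assumes a simplified variant: the primary system's n-best hypotheses are rescored, and the language-model log-probability of every word that aligns with an identical word in the auxiliary one-best transcript is scaled by (1 - alpha), making confirmed words cheaper. The actual algorithm drives the search itself (and, in the generalized version, uses confusion networks rather than a one-best string); `driven_score`, `matched_positions` and `alpha` are hypothetical names.

```python
from difflib import SequenceMatcher

def matched_positions(hyp, aux):
    """Indices of hypothesis words aligned to an identical word in the auxiliary transcript."""
    matcher = SequenceMatcher(a=hyp, b=aux, autojunk=False)
    positions = set()
    for block in matcher.get_matching_blocks():  # last block is a zero-size sentinel
        positions.update(range(block.a, block.a + block.size))
    return positions

def driven_score(hyp, acoustic_logp, lm_logp, aux, alpha=0.5):
    """Total log-score of a primary hypothesis, with LM costs reduced for words confirmed by aux."""
    matched = matched_positions(hyp, aux)
    lm_total = 0.0
    for i, lp in enumerate(lm_logp):
        # lp <= 0, so scaling by (1 - alpha) < 1 makes confirmed words cheaper.
        lm_total += (1.0 - alpha) * lp if i in matched else lp
    return acoustic_logp + lm_total

if __name__ == "__main__":
    aux = ["le", "chat", "dort"]  # auxiliary system's one-best output
    nbest = [
        (["le", "chat", "dort"], -10.0, [-1.0, -3.0, -2.0]),
        (["le", "chas", "dort"], -10.0, [-1.0, -2.5, -2.0]),
    ]
    best = max(nbest, key=lambda h: driven_score(h[0], h[1], h[2], aux))
    print(" ".join(best[0]))  # prints "le chat dort"
```

Without driving (alpha = 0) the second hypothesis scores higher (-15.5 vs -16.0); with alpha = 0.5 the hypothesis confirmed by the auxiliary system wins (-13.0 vs -14.0).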

JEP session, poster P3, Tuesday 10 June, 14:00-16:00

Paper 1657: Adaptation rapide de modèles acoustiques compacts (Fast adaptation of compact acoustic models)

Christophe Lévy (Université d'Avignon et des Pays de Vaucluse)

Georges Linarès (Université d'Avignon et des Pays de Vaucluse)

Jean-François Bonastre (Université d'Avignon et des Pays de Vaucluse)

Abstract: In previous work we presented a new architecture dedicated to embedded speech recognition. It relies on a general GMM, which represents the whole acoustic space, associated with a set of HMM state-dependent probability functions modeled as transformations of this GMM. This work takes advantage of this architecture to propose a fast and efficient way to adapt the acoustic models. Adaptation is performed only on the general GMM and does not require state-dependent adaptation data. It is also very efficient in terms of computational cost. We evaluate our approach on a voice-command task. This adaptation method achieves a relative error-rate reduction of about 10% even when little adaptation data is available.
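
The abstract does not specify how the state-dependent functions are derived from the general GMM or which adaptation rule is used. The sketch below assumes the simplest compatible design: each HMM state is represented by its own mixture weights over the Gaussians of the shared general GMM, and the general GMM means are adapted with standard relevance-factor MAP estimation. Because every state reuses the same Gaussians, adapting the general GMM once updates all states without any state-level adaptation data. The function names and the relevance factor of 16 are illustrative assumptions.

```python
import numpy as np

def gaussian_logpdf_diag(x, means, variances):
    """Log-density of each frame under each diagonal Gaussian; returns (n_frames, n_comp)."""
    diff = x[:, None, :] - means[None, :, :]
    return -0.5 * (np.sum(diff ** 2 / variances[None], axis=2)
                   + np.sum(np.log(2 * np.pi * variances), axis=1)[None, :])

def map_adapt_means(x, weights, means, variances, relevance=16.0):
    """MAP-adapt the general GMM means; weights and variances are kept fixed."""
    log_post = gaussian_logpdf_diag(x, means, variances) + np.log(weights)[None, :]
    log_post -= log_post.max(axis=1, keepdims=True)
    resp = np.exp(log_post)
    resp /= resp.sum(axis=1, keepdims=True)            # component posteriors per frame
    n_k = resp.sum(axis=0)                              # soft occupation counts
    data_mean = (resp.T @ x) / np.maximum(n_k, 1e-10)[:, None]
    alpha = (n_k / (n_k + relevance))[:, None]          # more data -> trust data more
    return alpha * data_mean + (1 - alpha) * means

def state_loglik(frame, state_weights, means, variances):
    """Assumed state model: state-specific weights over the shared (adapted) Gaussians."""
    comp = gaussian_logpdf_diag(frame[None, :], means, variances)[0]
    return np.logaddexp.reduce(np.log(state_weights) + comp)
```

For example, with a one-dimensional GMM whose components are centred at 0 and 10 and 32 adaptation frames at 1.0, the first mean moves to about 0.667 (alpha = 32/48), while the second, which receives almost no responsibility, stays at 10.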

JEP session, oral O3: Parole spontanée et interaction (Spontaneous speech and interaction), Wednesday 11 June, 10:30-12:30

Paper 1616: Caractérisation et détection de parole spontanée dans de larges collections de documents audio (Characterization and detection of spontaneous speech in large collections of audio documents)

Vincent Jousse (Laboratoire d'Informatique de l'Université du Maine (LIUM))

Yannick Estève (Laboratoire d'Informatique de l'Université du Maine (LIUM))

Frédéric Béchet (Laboratoire d'Informatique d'Avignon (LIA))

Thierry Bazillon (Laboratoire d'Informatique de l'Université du Maine (LIUM))

Georges Linarès (Laboratoire d'Informatique d'Avignon (LIA))

Abstract: Processing spontaneous speech is one of the many challenges that ASR systems have to deal with. The main phenomena characterizing spontaneous speech are disfluencies (filled pauses, repetitions, repairs, and false starts), and many studies have focused on detecting and correcting these disfluencies. In this study we define spontaneous speech as unprepared speech, as opposed to prepared speech, whose utterances contain well-formed sentences close to those found in written documents. This paper proposes a set of acoustic and linguistic features that can be used to characterize and detect spontaneous speech segments in large audio databases.
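
The abstract does not list the proposed features. As a hypothetical illustration, the sketch below computes three simple per-segment cues from a word-level transcript with word durations: the rate of filled pauses, the rate of immediate word repetitions (a crude proxy for repairs), and the mean word duration (a proxy for speaking rate). The filler list and the choice of features are assumptions; in practice such values would feed a classifier trained on segments labeled as prepared or spontaneous.

```python
from dataclasses import dataclass

# Hypothetical list of French filled-pause tokens as they might appear in a transcript.
FILLED_PAUSES = frozenset({"euh", "heu", "hum"})

@dataclass
class SegmentFeatures:
    filled_pause_rate: float   # filled pauses per word
    repetition_rate: float     # immediate word repetitions per word
    mean_word_duration: float  # seconds per word (inverse speaking-rate proxy)

def segment_features(words, durations):
    """Compute simple spontaneity cues for one transcribed segment."""
    if not words or len(words) != len(durations):
        raise ValueError("need at least one word and one duration per word")
    n = len(words)
    fillers = sum(w in FILLED_PAUSES for w in words)
    repetitions = sum(
        1 for prev, cur in zip(words, words[1:])
        if prev == cur and cur not in FILLED_PAUSES
    )
    return SegmentFeatures(fillers / n, repetitions / n, sum(durations) / n)
```

For the segment "euh je je pense que euh", two of the six words are filled pauses and one is an immediate repetition, giving rates of 1/3 and 1/6.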

JEP session, oral O4: Reconnaissance de la parole et du locuteur (Speech and speaker recognition), Thursday 12 June, 14:00-16:00

Paper 1574: Enrichissement dynamique du vocabulaire à partir du Web (Dynamic vocabulary enrichment from the Web)

Stanislas Oger (Université d'Avignon)

Georges Linarès (Université d'Avignon)

Frédéric Béchet (Université d'Avignon)

Pascal Nocéra (Université d'Avignon)

Abstract: Most Web-based methods for lexicon augmentation capture global semantic features of the target domain in order to collect relevant documents from the Web. We suggest that the local context of out-of-vocabulary (OOV) words contains relevant information about those words. Using this information, we propose to use the Web to build locally augmented lexicons, which are used in a final local decoding pass. We first demonstrate the relevance of the Web for OOV word retrieval. Then, different methods are proposed to retrieve the hypothesis words. Finally, we present the integration of new words into the transcription process based on part-of-speech models. This technique recovers 7.6% of the significant OOV words and slightly improves system accuracy.
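
The local-context idea can be sketched under stated assumptions: the OOV region is given as a word-index span in the first-pass transcript, the Web query is formed from the non-stopword neighbours within a fixed window, and candidate words are the most frequent alphabetic tokens in the retrieved documents that are missing from the lexicon. The Web search itself and the part-of-speech integration are not shown, and `local_context_query`, `candidate_words` and the window size are hypothetical.

```python
from collections import Counter

def local_context_query(words, oov_start, oov_end, window=3, stopwords=frozenset()):
    """Build a search query from the words around the OOV span words[oov_start:oov_end]."""
    left = words[max(0, oov_start - window):oov_start]
    right = words[oov_end:oov_end + window]
    return " ".join(w for w in left + right if w not in stopwords)

def candidate_words(documents, lexicon, top_n=10):
    """Rank out-of-lexicon words found in the retrieved documents by frequency."""
    counts = Counter(
        token
        for doc in documents
        for token in doc.lower().split()
        if token.isalpha() and token not in lexicon
    )
    return [word for word, _ in counts.most_common(top_n)]

if __name__ == "__main__":
    # "car pantalon" stands for a misrecognized OOV region (indices 3 to 4).
    transcript = ["le", "maire", "de", "car", "pantalon", "a", "déclaré", "hier"]
    print(local_context_query(transcript, 3, 5, window=2, stopwords={"de", "a"}))
    # prints "maire déclaré"
```

If the retrieved documents were "Le maire de Carpentras a déclaré" and "Carpentras accueille le festival", `candidate_words` with a lexicon covering every other token would return ["carpentras"] as the word to add to the locally augmented lexicon.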
