Georges Linarès
Université d'Avignon
JEP poster session P3, Tuesday 10 June, 14:00-16:00
Paper 1603
Combinaison de différents jeux de paramètres acoustiques pour la reconnaissance de la parole (Combining different acoustic feature sets for speech recognition)
- Loïc Barrault (Laboratoire Informatique d'Avignon)
- Driss Matrouf (Laboratoire Informatique d'Avignon)
- Georges Linarès (Laboratoire Informatique d'Avignon)
- Renato De Mori (Laboratoire Informatique d'Avignon)
- Abstract: With the aim of improving the performance of Automatic Speech Recognition (ASR) systems, many approaches to combining such systems have been extensively studied. In this paper, we propose combining the state a posteriori probabilities produced by models trained on different feature sets. For the combination of state posterior probabilities to be coherent, the acoustic models trained on the different feature sets must share the same topology (i.e., the same set of states). For this purpose, a fast and efficient twin-model training protocol is proposed. Two strategies for combining the probabilities are presented: linear and log-linear interpolation. Using log-linear interpolation, relative Word Error Rate (WER) reductions of about 15% and 14% were observed on the MEDIA and ESTER corpora, respectively.
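The two interpolation strategies named in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the interpolation weights, the probability floor, and the example posteriors are assumptions. It also shows why a shared topology matters: both rules combine the streams state by state, so every stream must score the same state inventory.

```python
import math
from typing import List, Sequence


def _check_topology(posteriors: Sequence[Sequence[float]]) -> int:
    """All feature streams must score the same set of states (twin models)."""
    n_states = len(posteriors[0])
    if any(len(p) != n_states for p in posteriors):
        raise ValueError("streams do not share the same state set")
    return n_states


def linear_combination(posteriors: Sequence[Sequence[float]],
                       weights: Sequence[float]) -> List[float]:
    """Linear interpolation: P(s|x) = sum_k w_k * P_k(s|x)."""
    n_states = _check_topology(posteriors)
    return [sum(w * p[s] for w, p in zip(weights, posteriors))
            for s in range(n_states)]


def log_linear_combination(posteriors: Sequence[Sequence[float]],
                           weights: Sequence[float],
                           floor: float = 1e-12) -> List[float]:
    """Log-linear interpolation: P(s|x) proportional to prod_k P_k(s|x)^w_k,
    renormalized over the states. `floor` (an assumption) avoids log(0)."""
    n_states = _check_topology(posteriors)
    log_scores = [sum(w * math.log(max(p[s], floor))
                      for w, p in zip(weights, posteriors))
                  for s in range(n_states)]
    top = max(log_scores)  # subtract the max before exp for numerical stability
    unnorm = [math.exp(v - top) for v in log_scores]
    z = sum(unnorm)
    return [u / z for u in unnorm]
```

With two hypothetical streams over three states, `[0.7, 0.2, 0.1]` and `[0.5, 0.4, 0.1]`, and equal weights, the linear rule gives roughly `[0.6, 0.3, 0.1]` while the log-linear rule (a normalized geometric mean) gives roughly `[0.607, 0.290, 0.103]`. The log-linear rule sharpens agreement between streams, since a state must be plausible under every stream to keep a high score.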
JEP poster session P3, Tuesday 10 June, 14:00-16:00
Paper 1622
Combinaison de systèmes par décodage guidé (System combination by driven decoding)
- Benjamin Lecouteux (LIA, Avignon)
- Georges Linarès (LIA, Avignon)
- Yannick Estève (LIUM, Le Mans)
- Guillaume Gravier (IRISA, Rennes)
- Abstract: In this paper, we propose an integrated approach to system combination named the Driven Decoding Algorithm (DDA). It consists of guiding the search algorithm of a primary ASR system with the outputs of an auxiliary system. We first evaluate this method in a simple configuration in which the primary search is driven by the one-best hypothesis of a single auxiliary system. We then generalize DDA to confusion-network-driven decoding and propose a general combination scheme for multiple systems. The extended DDA is evaluated using 3 ASR systems from different labs. Results show that generalized DDA significantly outperforms the ROVER method: we obtain a 15.7% relative word error rate improvement over the best single system, compared with 8.5% for ROVER combination.
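The core idea of driven decoding, biasing the primary search toward hypotheses that agree with an auxiliary transcript, can be illustrated with a simplified scoring rule. This is a sketch under stated assumptions, not the published DDA formula: the agreement measure (word alignment with `difflib.SequenceMatcher`), the way agreement discounts the language-model penalty, and the `alpha` parameter are all hypothetical choices, and a real system would apply the bias inside the search rather than to finished partial hypotheses.

```python
from difflib import SequenceMatcher
from typing import Sequence


def match_ratio(partial_hyp: Sequence[str], aux_hyp: Sequence[str]) -> float:
    """Fraction of words in the partial hypothesis that align to identical
    words of the auxiliary system's one-best transcript."""
    if not partial_hyp:
        return 0.0
    sm = SequenceMatcher(a=list(partial_hyp), b=list(aux_hyp), autojunk=False)
    matched = sum(block.size for block in sm.get_matching_blocks())
    return matched / len(partial_hyp)


def driven_score(acoustic_logp: float, lm_logp: float,
                 partial_hyp: Sequence[str], aux_hyp: Sequence[str],
                 alpha: float = 0.5) -> float:
    """Hypothetical driven score: the more the hypothesis agrees with the
    auxiliary transcript, the less its (negative) LM log-probability counts."""
    r = match_ratio(partial_hyp, aux_hyp)
    return acoustic_logp + (1.0 - alpha * r) * lm_logp
```

For example, with the auxiliary transcript `"le chat dort sur le canapé".split()` and equal acoustic and LM scores (-100.0 and -20.0), the hypothesis `"le chat dort"` fully matches and scores -110.0, while `"le chant dort"` matches two words out of three and scores about -113.33, so the search would favor the hypothesis the auxiliary system supports.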
JEP poster session P3, Tuesday 10 June, 14:00-16:00
Paper 1657
Adaptation rapide de modèles acoustiques compacts (Fast adaptation of compact acoustic models)
- Christophe Lévy (Université d'Avignon et des Pays de Vaucluse)
- Georges Linarès (Université d'Avignon et des Pays de Vaucluse)
- Jean-François Bonastre (Université d'Avignon et des Pays de Vaucluse)
- Abstract: In previous work, we presented a new architecture dedicated to embedded speech recognition. It relies on a general GMM, which represents the whole acoustic space, associated with a set of HMM state-dependent probability functions modeled as transformations of this GMM. This work takes advantage of this architecture to propose a fast and efficient way to adapt the acoustic models. Adaptation is performed only on the general GMM and does not require state-dependent adaptation data. It is also very efficient in terms of computational cost. We evaluate our approach on a voice-command task. This adaptation method achieved a relative error-rate decrease of about 10%, even when little adaptation data is available.
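Why adapting only the general GMM can be enough is easiest to see with a toy model. The sketch below rests on two assumptions not stated in the abstract: the general GMM is adapted with standard MAP adaptation of its means (relevance factor `tau`), and each state model is simplified to a state-specific weight vector over the shared Gaussians (a hypothetical stand-in for the "transformations of this GMM"). Because all states reuse the adapted Gaussians, they benefit without any state-dependent adaptation data.

```python
import math
from typing import List, Sequence


def log_gauss_diag(x: Sequence[float], mean: Sequence[float],
                   var: Sequence[float]) -> float:
    """Log-density of a diagonal-covariance Gaussian."""
    return -0.5 * sum(math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))


def map_adapt_means(weights, means, variances, data,
                    tau: float = 10.0) -> List[List[float]]:
    """MAP adaptation of the general GMM means (weights, variances unchanged)."""
    n_comp, dim = len(weights), len(means[0])
    occ = [0.0] * n_comp                           # soft counts n_k
    first = [[0.0] * dim for _ in range(n_comp)]   # sum_t gamma_k(t) * x_t
    for x in data:
        logs = [math.log(w) + log_gauss_diag(x, m, v)
                for w, m, v in zip(weights, means, variances)]
        top = max(logs)
        post = [math.exp(l - top) for l in logs]
        z = sum(post)
        for k in range(n_comp):
            g = post[k] / z                        # component posterior
            occ[k] += g
            for d in range(dim):
                first[k][d] += g * x[d]
    adapted = []
    for k in range(n_comp):
        if occ[k] == 0.0:                          # no data: keep the prior mean
            adapted.append(list(means[k]))
            continue
        a = occ[k] / (occ[k] + tau)                # data-dependent interpolation
        data_mean = [f / occ[k] for f in first[k]]
        adapted.append([a * e + (1.0 - a) * m
                        for e, m in zip(data_mean, means[k])])
    return adapted


def state_log_likelihood(x, state_weights, means, variances) -> float:
    """Hypothetical state model: state-specific weights over shared Gaussians."""
    logs = [math.log(w) + log_gauss_diag(x, m, v)
            for w, m, v in zip(state_weights, means, variances) if w > 0.0]
    top = max(logs)
    return top + math.log(sum(math.exp(l - top) for l in logs))
```

For instance, with a two-component 1-D general GMM (means 0.0 and 10.0, unit variances) and ten adaptation frames at 1.0, the first mean moves to 0.5 with `tau=10.0`, the second stays near 10.0, and any state model that weights the first component scores those frames higher after adaptation. The cost is a single pass over the general GMM, independent of the number of states.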
JEP oral session O3, Spontaneous speech and interaction, Wednesday 11 June, 10:30-12:30
Paper 1616
Caractérisation et détection de parole spontanée dans de larges collections de documents audio (Characterization and detection of spontaneous speech in large audio document collections)
- Vincent Jousse (Laboratoire d'Informatique de l'Université du Maine (LIUM))
- Yannick Estève (Laboratoire d'Informatique de l'Université du Maine (LIUM))
- Frédéric Béchet (Laboratoire d'Informatique d'Avignon (LIA))
- Thierry Bazillon (Laboratoire d'Informatique de l'Université du Maine (LIUM))
- Georges Linarès (Laboratoire d'Informatique d'Avignon (LIA))
- Abstract: Processing spontaneous speech is one of the many challenges that ASR systems have to deal with. The main cues characterizing spontaneous speech are disfluencies (filled pauses, repetitions, repairs and false starts), and many studies have focused on detecting and correcting these disfluencies. In this study, we define spontaneous speech as unprepared speech, as opposed to prepared speech, where utterances contain well-formed sentences close to those found in written documents. This paper proposes a set of acoustic and linguistic features that can be used to characterize and detect spontaneous speech segments in large audio databases.
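The abstract does not list the exact features, so the sketch below only illustrates the general idea with two linguistic cues derived from the disfluencies it mentions: the rate of filled pauses and the rate of immediate word repetitions in a transcribed segment. The filled-pause inventory, the feature weights, and the decision threshold are hypothetical; the paper's detector also uses acoustic features, which are not shown.

```python
from typing import Dict, Sequence

# Hypothetical inventory of French filled-pause tokens as an ASR system might output them.
FILLED_PAUSES = {"euh", "heu", "hum", "ben"}


def disfluency_features(tokens: Sequence[str]) -> Dict[str, float]:
    """Per-word rates of filled pauses and immediate repetitions."""
    n = len(tokens)
    if n == 0:
        return {"filled_pause_rate": 0.0, "repetition_rate": 0.0}
    filled = sum(1 for t in tokens if t in FILLED_PAUSES)
    # Immediate repetitions such as "je je"; repeated filled pauses are not counted twice.
    repeats = sum(1 for prev, cur in zip(tokens, tokens[1:])
                  if prev == cur and cur not in FILLED_PAUSES)
    return {"filled_pause_rate": filled / n, "repetition_rate": repeats / n}


def looks_spontaneous(tokens: Sequence[str], fp_weight: float = 1.0,
                      rep_weight: float = 1.0, threshold: float = 0.1) -> bool:
    """Hypothetical linear decision rule over the two disfluency rates."""
    f = disfluency_features(tokens)
    score = fp_weight * f["filled_pause_rate"] + rep_weight * f["repetition_rate"]
    return score >= threshold
```

A read news sentence such as `"le gouvernement a annoncé hier une nouvelle réforme"` has no disfluencies and is classified as prepared, whereas `"euh je je pense que euh enfin ben"` has a filled-pause rate of 3/8 and a repetition rate of 1/8 and is flagged as spontaneous. In practice the weights and threshold would be learned from annotated data rather than fixed by hand.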
JEP oral session O4, Speech and speaker recognition, Thursday 12 June, 14:00-16:00
Paper 1574
Enrichissement dynamique du vocabulaire à partir du Web (Dynamic vocabulary enrichment from the Web)
- Stanislas Oger (Université d'Avignon)
- Georges Linarès (Université d'Avignon)
- Frédéric Béchet (Université d'Avignon)
- Pascal Nocéra (Université d'Avignon)
- Abstract: Most Web-based methods for lexicon augmentation consist of capturing global semantic features of the target domain in order to collect relevant documents from the Web. We suggest that the local context of out-of-vocabulary (OOV) words contains relevant information about those words. Using this information, we propose to use the Web to build locally augmented lexicons, which are used in a final local decoding pass. We first demonstrate the relevance of the Web for OOV word retrieval. Then, several methods are proposed to retrieve the hypothesis words. Finally, we present the integration of new words into the transcription process based on part-of-speech models. This technique recovers 7.6% of the significant OOV words, and the accuracy of the system is slightly improved.
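The local-context idea can be sketched in two steps: build a search query from the words surrounding a suspected OOV position, then extract from the retrieved pages the frequent words that the lexicon lacks. This is an illustrative sketch only: the actual Web search is left out (the retrieved pages are passed in as plain strings), and the `<unk>` marker, the context `window`, the `top_n` cutoff and the frequency ranking are assumptions. The part-of-speech-based integration of the new words into decoding is not shown.

```python
import re
from collections import Counter
from typing import Iterable, List, Sequence, Set


def build_query(hyp_words: Sequence[str], oov_index: int, window: int = 3) -> str:
    """Query made of the words surrounding a suspected OOV position."""
    left = list(hyp_words[max(0, oov_index - window):oov_index])
    right = list(hyp_words[oov_index + 1:oov_index + 1 + window])
    return " ".join(left + right)


def candidate_words(documents: Iterable[str], lexicon: Set[str],
                    top_n: int = 5) -> List[str]:
    """Most frequent words in the retrieved documents that are missing from the lexicon."""
    counts: Counter = Counter()
    for doc in documents:
        # Lowercasing simplifies matching but loses the capitalization of proper names.
        for word in re.findall(r"\w+", doc.lower()):
            if word not in lexicon and not word.isdigit():
                counts[word] += 1
    return [word for word, _ in counts.most_common(top_n)]
```

For a hypothesis `["le", "ministre", "<unk>", "a", "déclaré", "hier"]` with the suspected OOV at index 2 and `window=2`, the query is `"le ministre a déclaré"`. If the pages returned for that query were `"Le ministre Dupont a déclaré hier"` and `"Dupont a déclaré que le ministre Dupont"` (made-up examples), and the lexicon contained every other word, the only candidate would be `"dupont"`, which would then be added to the locally augmented lexicon for the final decoding pass.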