Guillaume Gravier

IRISA, Rennes

TALN oral session O2: Information Extraction, Monday 9 June, 13:30-15:00

Paper 1332: Un modèle multi-sources pour la segmentation en sujets de journaux radiophoniques (A multi-source model for topic segmentation of radio news broadcasts)

Stéphane Huet (IRISA, Université de Rennes 1)

Guillaume Gravier (IRISA, CNRS)

Pascale Sébillot (IRISA, INSA de Rennes)

Abstract: We present a method for segmenting radio news broadcasts into topics, based on lexical, syntactic and acoustic cues. Starting from an existing statistical model of topic segmentation that exploits the notion of lexical cohesion, we extend the formalism to include syntactic and acoustic information. Experimental results show that the lexical cohesion model alone is not sufficient for the type of documents studied, because of the variable size of the segments and the absence of a direct link between segment and topic. Using syntactic and acoustic information substantially improves the resulting segmentation.

JEP poster session P3, Tuesday 10 June, 14:00-16:00

Paper 1622: Combinaison de systèmes par décodage guidé (System combination by driven decoding)

Benjamin Lecouteux (LIA, Avignon)

Georges Linarès (LIA, Avignon)

Yannick Estève (LIUM, Le Mans)

Guillaume Gravier (IRISA, Rennes)

Abstract: In this paper, we propose an integrated approach for system combination named Driven Decoding Algorithm (DDA). It consists in guiding the search algorithm of a primary ASR system by the outputs of an auxiliary system. We first evaluate this method in a simple configuration in which the primary search is driven by the one-best hypothesis of a single auxiliary system. Then, we generalize DDA to confusion-network driven decoding and we propose a general scheme for combining multiple systems. The proposed extended DDA is evaluated using 3 ASR systems from different labs. Results show that generalized DDA significantly outperforms the ROVER method: we obtain a 15.7% relative word error rate improvement with respect to the best single system, as opposed to 8.5% with the ROVER combination.

JEP poster session P3, Tuesday 10 June, 14:00-16:00

Paper 1623: Vers une adaptation thématique non supervisée de modèles de langage : utilisation d'Internet comme un corpus ouvert (Towards unsupervised topic adaptation of language models: using the Internet as an open corpus)

Gwénolé Lecorvé (IRISA, INSA de Rennes)

Guillaume Gravier (IRISA, CNRS)

Pascale Sébillot (IRISA, INSA de Rennes)

Abstract: Since the language models (LMs) of automatic speech recognition systems are usually trained on multi-topic corpora, topic adaptation has been shown to be an effective way to improve recognition accuracy, especially for broadcast news. This paper presents a new, complete and unsupervised technique that uses information retrieval methods and the Internet to retrieve thematically coherent corpora from which adapted LMs are trained. Experimental results demonstrate the validity of the proposed adaptation method with significant perplexity and word error rate reductions, and also show that topic adaptation should be included early in the recognition process.

