Irisa, INSA de Rennes
JEP poster session P3, Tuesday 10 June, 14:00-16:00
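Towards unsupervised topic adaptation of language models: using the Internet as an open corpus (original title: Vers une adaptation thématique non supervisée de modèles de langage : utilisation d'Internet comme un corpus ouvert)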
- Gwénolé Lecorvé (Irisa, INSA de Rennes)
- Guillaume Gravier (Irisa, CNRS)
- Pascale Sébillot (Irisa, INSA de Rennes)
- Abstract: Because the language models (LMs) of automatic speech recognition systems are usually trained on multi-topic corpora, topic adaptation has proven an effective way to improve recognition accuracy, especially for broadcast news. This paper presents a new, complete, unsupervised technique that uses information retrieval methods to retrieve thematically coherent corpora from the Internet, from which adapted LMs are trained. Experimental results support the validity of the proposed adaptation method, with significant reductions in perplexity and word error rate, and also show that topic adaptation should be introduced early in the recognition process.