Viet-Bac Le
Laboratoire Informatique de Grenoble
JEP oral session O1: Language Diversity, Monday 9 June, 13:30 to 15:30
- Paper 1624
Automatic speech recognition for the Khmer language: which units for language modeling and acoustic modeling?
- Sopheap Seng (Laboratoire Informatique de Grenoble)
- Sethserey Sam (Laboratoire Informatique de Grenoble)
- Viet-Bac Le (Laboratoire Informatique de Grenoble)
- Brigitte Bigi (Laboratoire Informatique de Grenoble)
- Laurent Besacier (Laboratoire Informatique de Grenoble)
- Abstract: In this paper we present an overview of the development of a large-vocabulary continuous speech recognition system for the Khmer language. We present the methods and tools used to collect language resources for the rapid development of an ASR system for a new under-resourced language. Faced with a lack of text data and with word segmentation errors in language modeling, we investigate how different views of the text data (word and sub-word units) can be exploited for Khmer language modeling. We propose to work both at the model level (by building hybrid vocabularies with both word and sub-word units) and at the ASR output level (system combination). For acoustic modeling, we use basic linguistic rules to automatically generate grapheme-based or phoneme-based pronunciation dictionaries. An experimental framework is set up to evaluate the performance of each modeling unit.