Arnaud Delhay
IRISA / Université de Rennes 1 - ENSSAT
JEP poster session P2, Monday 9 June, 16:00-18:00
paper 1642
Evaluation de méthodes de réduction de corpus linguistiques (Evaluation of reduction methods for linguistic corpora)
- Nelly Barbot (IRISA / Université de Rennes 1 - ENSSAT, Lannion)
- Pierre Alain (IRISA / Université de Rennes 1 - ENSSAT, Lannion)
- Olivier Boeffard (IRISA / Université de Rennes 1 - ENSSAT, Lannion)
- Jonathan Chevelu (IRISA / Université de Rennes 1 - ENSSAT, Lannion)
- Arnaud Delhay (IRISA / Université de Rennes 1 - ENSSAT)
- Abstract: This article deals with covering methodologies in the context of automatic speech processing technologies. More precisely, we are interested in covering the phonological attributes of a linguistic corpus under the constraint of minimal duration. This goal is classically pursued with greedy algorithms, which, however, do not guarantee that the solutions are optimal. We compare the results of a new algorithm, LamSCP, which relies on the principles of Lagrangian relaxation, with those of an agglomeration-spitting greedy algorithm in the search for an optimal covering. We conducted experiments on the Gutenberg corpus, considering phone, diphone, and triphone coverings. LamSCP provides better solutions than the greedy algorithm and makes it possible to assess their quality by providing a lower bound on the optimization problem.
- article
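
The covering problem described in the abstract can be illustrated with a minimal sketch. It is not the authors' LamSCP nor their agglomeration-spitting algorithm: it shows only a generic greedy heuristic for weighted set covering and the standard Lagrangian lower bound for that problem. The function names, the toy sentences, their durations, and the multiplier values are all hypothetical and chosen for illustration.

```python
from typing import Dict, List, Set


def greedy_cover(sentences: Dict[str, Set[str]],
                 durations: Dict[str, float]) -> List[str]:
    """Generic greedy heuristic (illustrative, not the paper's algorithm):
    repeatedly pick the sentence with the lowest duration per newly
    covered unit until every unit is covered."""
    uncovered = set().union(*sentences.values())
    chosen: List[str] = []
    while uncovered:
        best = min(
            (s for s in sentences if sentences[s] & uncovered),
            key=lambda s: durations[s] / len(sentences[s] & uncovered),
        )
        chosen.append(best)
        uncovered -= sentences[best]
    return chosen


def lagrangian_lower_bound(sentences: Dict[str, Set[str]],
                           durations: Dict[str, float],
                           multipliers: Dict[str, float]) -> float:
    """Standard Lagrangian bound for set covering, valid for multipliers >= 0:
    L(u) = sum_i u_i + sum_j min(0, c_j - sum_{i in S_j} u_i)."""
    bound = sum(multipliers.values())
    for name, units in sentences.items():
        reduced_cost = durations[name] - sum(multipliers[u] for u in units)
        bound += min(0.0, reduced_cost)
    return bound


# Hypothetical toy corpus: each sentence covers a set of units (e.g. phones).
sentences = {"s1": {"a", "b"}, "s2": {"b", "c"}, "s3": {"a", "b", "c"}}
durations = {"s1": 2.0, "s2": 2.0, "s3": 3.5}

solution = greedy_cover(sentences, durations)
greedy_duration = sum(durations[s] for s in solution)

# Hand-picked multipliers (assumption); a real solver would optimize them,
# for instance by subgradient ascent.
multipliers = {"a": 1.5, "b": 0.5, "c": 1.5}
lower_bound = lagrangian_lower_bound(sentences, durations, multipliers)

print(solution, greedy_duration, lower_bound)
```

On this toy data the greedy heuristic selects s1 and s2 for a total duration of 4.0, while the Lagrangian bound is 3.5, which is the duration of the single sentence s3. The gap between the two values shows how a lower bound can be used to judge the quality of a heuristic solution.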