JEP poster session - P4

Tuesday 10 June, 14:00-16:00

Paper 1643: Identification of emotions in natural and synthetic speech: anchoring paradigm

Ioulia Grichkovtsova (Université de Caen)

Michel Morel (Université de Caen)

Anne Lacheret-Dujour (Université Paris X)

Abstract: The two main objectives of the study were to identify the perceptual role of intonation and voice quality in the identification of emotions, and to determine gating points for each emotion studied. A multi-speaker corpus was used to develop a new perception test based on the gating paradigm and the transplantation paradigm. The results are discussed in light of their relevance to the synthesis of affective speech.

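In the gating paradigm mentioned in paper 1643, a stimulus is played in progressively longer fragments (gates). A common way to score a listener is to take the first gate from which the answer stays correct up to the last gate. The sketch below assumes that scoring rule; the abstract does not state the authors' exact criterion, and `gating_point` is a hypothetical helper.

```python
from typing import Optional, Sequence


def gating_point(correct_by_gate: Sequence[bool]) -> Optional[int]:
    """Return the 1-based gate from which all remaining responses are correct.

    `correct_by_gate[k]` says whether the listener identified the emotion
    correctly after hearing gate k + 1 (the stimulus truncated at that gate).
    Returns None when the response at the last gate is wrong.
    """
    point: Optional[int] = None
    # Scan backwards: the gating point starts the final unbroken run of correct answers.
    for gate in range(len(correct_by_gate), 0, -1):
        if not correct_by_gate[gate - 1]:
            break
        point = gate
    return point


# One hypothetical listener's responses over five gates.
print(gating_point([False, True, False, True, True]))  # -> 4
```

Per-emotion gating points could then be obtained by averaging this value over listeners and stimuli for each emotion; that aggregation is likewise an assumption, not a description of the study's analysis.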

Paper 1646: Characterizing zones of interactivity between speakers: towards the detection of conversational speech zones

Benjamin Bigot (IRIT - Université Paul Sabatier)

Isabelle Ferrané (IRIT - Université Paul Sabatier)

Ibrahim Zein-Al-Abidin (IRIT - Université Paul Sabatier)

Abstract: Conversational speech is central to the EPAC project in which we participate. To support the development of tools for processing conversational speech and for the enriched annotation of large volumes of audio data, we have focused our work on detecting and characterizing zones of conversational speech. To address this problem, we adopt a data-mining approach and use a method based on the analysis of temporal relations. This method detects zones where two characteristics are active at the same time; to characterize these zones as well as possible, we define a set of complementary descriptors. Within the EPAC project, these descriptors bring out useful information about speaker profiles, which gives indications of their potential role in the audio document.

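Paper 1646 detects zones where two characteristics are active at the same time. Assuming each characteristic (for instance "speaker A is talking" and "speaker B is talking") is available as a sorted list of non-overlapping time intervals, those zones are the pairwise intersections of the two lists, which a linear two-pointer sweep finds. This is a hypothetical illustration of the detection step only; `co_active_zones` is an invented name, and the authors' temporal-relation method and descriptors are not reproduced.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (start, end) in seconds


def co_active_zones(a: List[Interval], b: List[Interval]) -> List[Interval]:
    """Intersect two sorted lists of non-overlapping intervals."""
    zones: List[Interval] = []
    i = j = 0
    while i < len(a) and j < len(b):
        start = max(a[i][0], b[j][0])
        end = min(a[i][1], b[j][1])
        if start < end:  # strictly positive overlap
            zones.append((start, end))
        # Advance the interval that ends first; it cannot overlap anything later.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return zones


speaker_a = [(0.0, 5.0), (8.0, 12.0)]
speaker_b = [(3.0, 9.0), (11.0, 15.0)]
print(co_active_zones(speaker_a, speaker_b))  # -> [(3.0, 5.0), (8.0, 9.0), (11.0, 12.0)]
```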

Paper 1650: A vowel-based approach to the recognition of acted emotions

Fabien Ringeval (Institut des Systèmes Intelligents et Robotique - UPMC Paris 6)

Mohamed Chetouani (Institut des Systèmes Intelligents et Robotique)

Abstract: We present a study of a new feature-extraction scheme for the automatic recognition of acted emotions. Pseudo-phonetic units (vowels and consonants) are automatically extracted from the speech signal. They are then evaluated in terms of detection, and compared on acoustic and prosodic grounds with voiced segments in an emotion recognition task on the Berlin corpus. The results highlight the weight of vowels in the perception of emotions, since the scores obtained with vowels are better than those based on voiced segments.

Paper 1676: LUNA: understanding in context for spoken dialogue

Géraldine Damnati (France Telecom R&D)

Frédéric Béchet (Université d'Avignon)

Renato De-Mori (Université d'Avignon)

Abstract: This paper describes the first results achieved within the LUNA project in coupling the Spoken Language Understanding process with the Automatic Speech Recognition and Dialogue Manager processes. This strategy is implemented and evaluated on FT3000, a France Telecom telephone service application.

Paper 1682: Semantic analysis of spoken Arabic utterances in a human-machine dialogue context

Younès Bahou (Laboratoire LARIS-MIRACL, FSEGS Université de Sfax)

Houssem Safi (Laboratoire LARIS-MIRACL, FSEGS Université de Sfax)

Lamia Hadrich-Belguith (Laboratoire LARIS-MIRACL, FSEGS Université de Sfax)

Abstract: In this paper we present ASAR, a system for the semantic analysis of spoken Arabic utterances in the context of human-machine oral dialogue. Its aim is to generate several semantic representations of an Arabic utterance. ASAR is based on an analysis method strongly guided by semantics and using the formalism of case grammars. The method consists of three main stages: pre-processing and semantic tagging, semantic pattern identification and filtering, and pattern generation.

Paper 1584: Multimodal perception of anticipation in speech by deaf and hearing adults and children

Emilie Troille (CRI - Université Stendhal / GIPSA-Lab-INPG-Université Stendhal)

Marie-Agnès Cathiard (CRI - Université Stendhal)

Lucie Ménard (Laboratoire de phonétique - UQAM)

Denis Beautemps (GIPSA-Lab-INPG-Université Stendhal)

Abstract: In a previous study (Troille & Cathiard, 2006) on the anticipatory perception of vowel rounding through a fricative consonant stream, we showed that, in this VCV frame, auditory information could be ahead of visual information, and even of audiovisual information. Here we extend this experiment to children aged 7 to 11 and to adults, both normal-hearing and hearing-impaired. Results showed that, although normal-hearing and hearing-impaired subjects, adults and children alike, performed equally late in the vision-only condition, hearing-impaired subjects practising French Cued Speech (CS) succeeded in the identification task as early as normal-hearing subjects did in the auditory condition. Hence, given the best available information, provided it anticipates vision, be it the audio channel or manual CS coding, subjects take full advantage of the natural timing of the different sensory streams. The hand can thus substitute in time for the speech sounds.

Paper 1587: Emergence of language through deictic games in a society of interacting sensory-motor agents

Clément Moulin-Frier (Grenoble-INP)

Jean-Luc Schwartz (Grenoble-INP)

Julien Diard (Université Pierre Mendes France)

Pierre Bessière (Université Joseph Fourier)

Abstract: In this article, we show how certain properties of human language can emerge from a more primitive function of deixis (the act of pointing at things). To this end, we model a society of sensory-motor agents able to produce vocalizations and to point at objects in their environment. We then show how certain principles of Dispersion Theory (Lindblom, 1972) and Quantal Theory (Stevens, 1989) can emerge from these interactions between agents.

Paper 1669: Analysis of impostor scores in a GMM-UBM automatic speaker verification system

Salah-Eddine Mezaache (Laboratoire d'Informatique d'Avignon (LIA))

Driss Matrouf (Laboratoire d'Informatique d'Avignon (LIA))

Jean-François Bonastre (Laboratoire d'Informatique d'Avignon (LIA))

Abstract: In this paper, we analyse the problem of impostor trials with high scores in the context of the NIST SRE 2006 evaluation [4]. Trials are based on the LIA GMM-UBM reference system [5]. We propose a method for dealing with such trials, called the REVERSE method. Fewer than 1% of the trials in NIST 2006 raise the DCFmin by 40%. Our motivation was to investigate impostor scores and to attempt to understand […]

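The DCFmin figure in paper 1669 is the minimum, over all decision thresholds, of the detection cost C_det = C_miss * P_miss * P_target + C_fa * P_fa * (1 - P_target). The sketch below assumes the cost parameters C_miss = 10, C_fa = 1 and P_target = 0.01 of the NIST SRE 2006 core condition and reports the unnormalized cost; it does not implement the REVERSE method, and `min_dcf` is a hypothetical helper. The toy scores are invented only to show the effect the abstract describes: a few high-scoring impostor trials can raise DCFmin sharply.

```python
from typing import Sequence


def min_dcf(
    target_scores: Sequence[float],
    impostor_scores: Sequence[float],
    c_miss: float = 10.0,
    c_fa: float = 1.0,
    p_target: float = 0.01,
) -> float:
    """Minimum (unnormalized) detection cost over all score thresholds."""
    if not target_scores or not impostor_scores:
        raise ValueError("need at least one target and one impostor score")
    # Candidate thresholds: every observed score, plus +inf (reject every trial).
    thresholds = sorted(set(target_scores) | set(impostor_scores)) + [float("inf")]
    best = float("inf")
    for t in thresholds:
        # A trial is accepted as a target when its score is >= t.
        p_miss = sum(s < t for s in target_scores) / len(target_scores)
        p_fa = sum(s >= t for s in impostor_scores) / len(impostor_scores)
        cost = c_miss * p_miss * p_target + c_fa * p_fa * (1.0 - p_target)
        best = min(best, cost)
    return best


# Toy scores (not NIST data): perfectly separated, then one high-scoring impostor.
print(min_dcf([2.0, 3.0], [0.0, 1.0]))       # -> 0.0
print(min_dcf([2.0, 3.0], [0.0, 1.0, 5.0]))  # about 0.1
```

Testing every observed score plus +inf is enough, because the miss and false-alarm rates only change when the threshold crosses an observed score.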

Paper 1596: Articulatory characteristics of liaison consonants: a pilot study

Céline Douchez (Laboratoire Parole et Langage)

Léonardo Lancia (Laboratoire Parole et Langage)

Abstract: This paper presents a new approach to liaison in French. Its aim is to extend the comparison between onset consonants and liaison consonants to new acoustic and articulatory parameters, in order to study how these consonants are coupled with the adjacent vowels. The results show that the coupling of liaison consonants with the adjacent vowels differs from that of onset consonants. We suggest that these differences can be explained by a phonological model that offers a unified view of phonology and phonetics: Articulatory Phonology.

Paper 1599: Integrating multiple sources of information in lexical identification and in the acquisition of new lexical representations

Odile Bagou (Université de Genève; Université de Neuchâtel)

Ulrich Frauenfelder (Université de Genève)

Abstract: This study investigates how French listeners exploit phonological and phonetic cues to segment continuous speech into words. We examined how these listeners integrate multiple sources of information not only in lexical identification, using a word-spotting task, but also in the storage of new lexical representations, using an artificial-language learning task. Results showed that the segmentation cues examined had different weights in the two tasks. Syllable onsets, cued simultaneously by allophonic variation and phonotactics, played a predominant role in lexical identification, while stress was a "last-resort" segmentation cue. In contrast, rhythmic information, particularly primary stress, played a greater role in lexical acquisition, while syllable onsets were not used as segmentation cues. These results suggest that caution is required when relating results on lexical acquisition to results on lexical identification.

Paper 1663: The French vowels /i/ and /y/: focalization and formant variation

Cédric Gendrot (L.P.P UMR 7018)

Martine Adda-Decker (L.I.M.S.I.)

Jacqueline Vaissière (L.P.P UMR 7018)

Abstract: The French vowels /i/ and /y/ have been observed to vary less in the F1/F2 plane than the other French vowels [7,11]. The Quantal Theory of Speech predicts that these two vowels are focal and are thus characterized by the proximity of two of their formants (F2/F3 for /y/ and F3/F4 for /i/, respectively). In this study, the higher formants F3 and F4 and their amplitudes are investigated in order to show that French /i/ and /y/ have more compact F3/F4 and F2/F3 pairs, respectively, than the corresponding vowels in seven other languages. We also suggest that these higher formants allow a better understanding of the variation of these vowels. We underline the importance of F3 measurements for characterizing the variation of these vowels, notably that due to their difference in rounding.

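Paper 1663 ties focalization to the proximity of two formants (F2/F3 for /y/, F3/F4 for /i/). One simple way to quantify that proximity, assumed here rather than taken from the paper, is the distance between the two formants on the Bark scale, using Traunmüller's Hz-to-Bark approximation. The sketch ignores formant amplitudes, which the study also examines; `focal_spacing_bark` is a hypothetical helper and the formant values are invented.

```python
def hz_to_bark(f_hz: float) -> float:
    """Traunmüller's Hz-to-Bark approximation."""
    return 26.81 * f_hz / (1960.0 + f_hz) - 0.53


# Formant pair whose proximity defines focalization, as stated in the abstract.
FOCAL_PAIRS = {"y": ("F2", "F3"), "i": ("F3", "F4")}


def focal_spacing_bark(vowel: str, formants_hz: dict) -> float:
    """Distance in Bark between the two focal formants of /i/ or /y/.

    Smaller values mean a more compact (more focal) formant pair.
    """
    low, high = FOCAL_PAIRS[vowel]
    return hz_to_bark(formants_hz[high]) - hz_to_bark(formants_hz[low])


# Hypothetical measurements for one /i/ token (not data from the study).
token = {"F2": 2200.0, "F3": 3000.0, "F4": 3600.0}
print(round(focal_spacing_bark("i", token), 2))
```

Working in Bark rather than Hz keeps the comparison closer to auditory resolution, which matters because F3/F4 for /i/ lie higher in frequency than F2/F3 for /y/.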

Paper 1666: Dynamic acoustic-to-articulatory inversion using a hypercuboid codebook: first results

Blaise Potard (LORIA - Université Nancy 1)

Abstract: Our goal is to recover articulatory information from the speech signal by acoustic-to-articulatory inversion. Like most inversion methods proposed in the literature, our method relies on the analysis-by-synthesis paradigm, here based on Maeda's articulatory model. After an overall description of the inversion method, the paper presents a few inversions of formant-frequency trajectories obtained by synthesis from articulatory data, and compares the recovered articulatory trajectories with the original ones.

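Paper 1666 inverts formant trajectories by analysis-by-synthesis over a codebook built with Maeda's articulatory model. The hypercuboid codebook itself is not described in the abstract, so the following is a deliberately simpler stand-in: a flat codebook of (articulatory vector, formant vector) pairs, assumed to be precomputed by running an articulatory synthesizer, with a per-frame nearest-candidate lookup followed by dynamic programming that trades acoustic error against articulatory jumps. `invert_trajectory`, `k` and `smoothness` are hypothetical names.

```python
import math
from typing import List, Sequence, Tuple

Vector = Tuple[float, ...]


def invert_trajectory(
    codebook: Sequence[Tuple[Vector, Vector]],
    target_formants: Sequence[Vector],
    k: int = 3,
    smoothness: float = 1.0,
) -> List[Vector]:
    """Recover one articulatory vector per frame from a formant trajectory.

    `codebook` holds (articulatory parameters, synthesized formants) pairs.
    `smoothness` converts articulatory distance into the units of the
    acoustic error (Hz), so it sets the trade-off between matching the
    formants and producing a smooth articulatory movement.
    """
    if not target_formants:
        return []
    # 1. For each frame, keep the k entries whose formants lie closest to the target.
    candidates = []
    for target in target_formants:
        ranked = sorted(codebook, key=lambda entry: math.dist(entry[1], target))
        candidates.append([(art, math.dist(form, target)) for art, form in ranked[:k]])
    # 2. Dynamic programming: acoustic error plus a penalty on articulatory jumps.
    cost = [err for _, err in candidates[0]]
    backpointers = []
    for t in range(1, len(candidates)):
        new_cost, pointers = [], []
        for art, err in candidates[t]:
            transitions = [
                cost[j] + smoothness * math.dist(prev_art, art)
                for j, (prev_art, _) in enumerate(candidates[t - 1])
            ]
            best_j = min(range(len(transitions)), key=transitions.__getitem__)
            pointers.append(best_j)
            new_cost.append(transitions[best_j] + err)
        cost = new_cost
        backpointers.append(pointers)
    # 3. Backtrack from the cheapest candidate in the last frame.
    j = min(range(len(cost)), key=cost.__getitem__)
    path = [candidates[-1][j][0]]
    for t in range(len(backpointers) - 1, -1, -1):
        j = backpointers[t][j]
        path.append(candidates[t][j][0])
    return path[::-1]


# Hypothetical 1-D codebook: one articulatory parameter, one formant.
codebook = [((0.0,), (500.0,)), ((1.0,), (600.0,)), ((5.0,), (610.0,))]
print(invert_trajectory(codebook, [(500.0,), (607.0,)], k=2, smoothness=10.0))  # -> [(0.0,), (1.0,)]
```

With `smoothness=0.0` the second frame would instead take `(5.0,)`, the acoustically closest entry; the penalty is what makes the recovered trajectory favour small articulatory movements, which is one way of handling the many-to-one nature of inversion.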