titre: Apprentissage des langues
auteurs: Valérie Hazan
mots clés: langues, apprentissage
abstract: The goal of this paper is to briefly review the major impact of speech technology in the area of second language learning and to present recent developments in the area of phonetic acquisition. Second language learners are ‘deaf’ to many sound distinctions that do not occur in their first language. Key issues are whether it is possible to improve perception and production via training and whether the use of speech technology is successful in promoting acquisition. Results of recent studies on the role of speech enhancement and of visual cues in increasing the effectiveness of language training will also be presented.

titre: Experiments on cross-language acoustic modeling
auteurs: T. Schultz, A. Waibel
mots clés:
abstract: With speech products being distributed all over the world, portability to new target languages becomes a practical concern. As a consequence, our research focuses on the rapid transfer of LVCSR systems to other languages. In former studies we evaluated the performance obtained when limited adaptation data is available. For very time-constrained tasks and minority languages in particular, it is quite possible that no training data is available at all. In this paper we examine what performance can be expected in this scenario. All experiments are run in the framework of the GlobalPhone project, which investigates LVCSR systems in 15 languages.

titre: Optimisation d'arbres de décision pour la conversion graphèmes-phonèmes
auteurs: H. Crépy, C. Amato-Beaujard, J.C. Marcadet, C. Waast-Richard
mots clés: synthèse, graphème-phonème
abstract: Extensive experiments on a data-driven decision-tree technique for French grapheme-to-phoneme conversion study the effects of various tree-growing parameters as well as of feature and question selection. Generated phonetic transcriptions of unknown words are used for speech recognition and synthesis. We report surprisingly good results, with recognition error rates better than with rule-generated transcriptions and only slightly worse than with reference man-made transcriptions, and transcription phonetic error rates as low as 1.56%, thanks in part to the introduction of POS tags into the context features.

titre: Traitement des incises en français : capture automatique et modèle prosodique
auteurs: Philippe Boula de Mareüil, Estelle Maillebuau
mots clés: prosodie
abstract: Parentheticals in French are investigated in order to assign them a specific prosody in text-to-speech synthesis. On the basis of lexical-syntactic and punctuational criteria, we show that it is possible to detect them automatically with a regular grammar, with an f-measure of more than 92%, on a corpus of more than 3000 newspaper sentences containing about 200 parentheticals. A 20% reduction of pitch range and of average pitch, as well as a 2 dB reduction of energy (respectively, a 10% reduction of pitch range) may then be applied to non-final (respectively final) parentheticals, which reflects observations made on a professional female speaker.

titre: Introduction de l'énergie dans un modèle de reconnaissance automatique de la parole
auteurs: Abdellah Yousfi, Abdelouafi Meziane
mots clés: acoustique, Hidden Markov Model (HMM), Energy, Two Level Hidden Semi Markov Centisecond Model (TLHSMCM).
abstract: A major deficiency of standard Hidden Markov Models (HMM) is that the spectral and the prosodic features are processed uniformly. To combine the prosodic cues with the acoustic ones more efficiently, a segmental two-level Hidden Markov Model was recently studied by Suaudeau [Suaudeau 94]. In this paper, we present an adapted version of this model in which the segmental processing is replaced by the classical centisecond processing. This new model is called the Two Level Hidden Semi Markov Centisecond Model (TLHSMCM). Our approach retains the traditional hierarchical structure of an HMM and facilitates the introduction of other prosodic parameters (in particular the energy) at the phonetic level. Experiments on a French database composed of 20 numbers show that this model reduces the recognition error rates.

titre: Compalex : un outil d'analyse dialectométrique pour une comparaison phono-lexicale synchronique des parlers d'une zone géographique
auteurs: Ndamba Josué
mots clés: apprentissage Langue
abstract: This paper presents the software "Compalex", which processes lexical data from two or more languages (or dialects) of a geographical area in order to determine the degree of intelligibility that exists between them. Existing software computes the percentage of common roots between languages; its results therefore mostly reflect historical relations between the dialects or languages. Compalex processes both the percentage of common roots between languages and the sounds that these common roots share. Its results thus give a more reliable indication of how well speakers of these different languages understand one another. The software runs under Windows 95 or later.

titre: Modélisation d'un système de reconnaissance pour l'apprentissage automatique de stratégies de dialogue optimales
auteurs: Olivier Pietquin, Thierry Dutoit
mots clés: reco, reconnaissance de parole,Dialogue
abstract: Over the last decade, the field of spoken dialogue systems has developed quickly. However, the rapid design of dialogue strategies remains difficult. Automatic strategy learning has been investigated, and the use of Reinforcement Learning algorithms introduced by Levin and Pieraccini is now part of the state of the art in this area. The worth of the learned strategy depends on the definition of the optimization criterion used by the learning agent and on the accuracy of the environment model. In this paper, we propose to introduce a model of an ASR system into the simulated environment in order to enhance the learned strategy. To do so, we incorporated recognition error rates and confidence levels produced by ASR systems into the optimization criterion.

titre: La syllabe comme unité de perception de la parole : un état de la question
auteurs: Alain Content, Uli H. Frauenfelder
mots clés: phonétique-phonologie
abstract: One highly influential finding that suggests that syllabic units are instrumental in speech perception is the crossover interaction between target type and word type observed in the sequence detection task. In this paper we review our recent studies with French speakers using the same task. Overall the findings fail to replicate the "syllable effect" and indicate that the observed effects are primarily due to the time course of the arrival of phonetic information in the carrier stimuli. These data argue against an early syllabic classification mechanism in speech perception, but other results that we have obtained suggest an important role of syllable structure and more specifically onsets in speech segmentation.


titre: L'effet syllabique dans les mots et les pseudo-mots en français
auteurs: Grégory Leclercq, Alain Content, Uli H. Frauenfelder
mots clés: phonétique-phonologie
abstract: Two syllable detection experiments were conducted to compare word and pseudoword carriers. The syllabic effect was found neither with words nor with pseudowords. Regression analyses were run to examine the influence of phonetic throughput on detection times. A contribution of the syllabic structure of the carriers was only found for the CV targets whereas contributions of the temporal localisations of the first vowel and of the pivotal consonant were found for the CVC targets. The results support the view that for both words and pseudowords, the pattern of results stems from the combination of two distinct effects, and does not reflect the use of a perceptual syllabic code.

titre: Etude comparative de vocalisations de bébés humains et de bébés robots
auteurs: J. Serkhane, J.L. Schwartz, L.J. Boë, B. Davis, P. Bessière, E. Mazer
mots clés: phonétique-phonologie, acoustique
abstract: In order to assess infants' motor skills during speech development, we used a statistical model of the vocal tract that integrates growth of the effector system. This model allowed us to infer, from real vocalizations, the likeliest explored acoustic regions, articulatory degrees of freedom and vocal tract shapes, and to test MacNeilage and Davis's co-occurrence hypothesis. Our results will feed the building of a virtual robot modelling speech development.

titre: Gabarits des tons vietnamiens
auteurs: Pham Thi Ngoc Yen, Eric Castelli, Nguyen Quoc Cuong
mots clés: phonétique-phonologie, prosodie
abstract: A 135-word corpus uttered by 16 different speakers was built in order to study the shape of the 6 Vietnamese tones. The wavelet method is used to extract the pitch (F0) from the speech signal corpus. General shapes are extracted for each speaker, which will be useful for automatic recognition or for synthesis, and comparisons between men and women show no important difference between them. However, speakers from the North have to be separated from speakers from the Centre/South.

titre: Interface syntaxe-prosodie dans un système de synthèse de la parole à partir du texte en arabe
auteurs: S. Baloul, M. Alissali, M. Baudry, P. Boula de Mareüil
mots clés: prosodie, synthèse
abstract: This paper presents a syntactico-prosodic model and its implementation in a diphone Arabic text-to-speech (TTS) system. This model, based on rewrite rules, first calculates the syntactic markers of the input text. Then, a phrasing operation segments it into chunks. The syntax-prosody interface then enables the allocation of pauses and the generation of prosodic parameters: the melodic contour depends on the sentence modality, on the word position within chunks and on the chunk position within the sentence. The implemented modules are currently being evaluated within a global evaluation of a multilingual TTS system.

titre: Tu pourrais enregistrer un corpus pour moi ?
auteurs: Alexis Michaud
mots clés: Enregistrement de corpus
abstract: The time-consuming task of archiving and disseminating data is not a priority for most phoneticians. As a result, finding a suitable ready-made corpus is no easy task; researchers often rely on corpora of questionable value. Looking back at a century of speech recording, the legacy is not as extensive, and nowhere near as tidy, as the layman would think. This paper calls for a "Corpus quality standard". The argument (based on detailed examples) is that small-scale programs adhering to simple standards can actually help build the databases we need. A quality standard would make data publication easier (thus fostering research) and allow for a smoother transition onto the shelves of libraries, fulfilling the phonetician's key role in documenting the languages of the world. ...

titre: Le e d'appui parisien : statut actuel et progression
auteurs: CANDEA, Maria
mots clés: phonétique-phonologie
abstract: This paper examines the hypothesis that an oral phenomenon typical of spontaneous French spoken in the Greater Paris area has progressed during the last decade: the epithetical "e" (e.g. bonjour-e, inserted in final position, with falling intonation). Our study is based on a comparison between the characteristics of a recently acquired French corpus and the results of two previous studies. It aims to describe the evolution of this phenomenon in real time (1989 vs. 1997/8) as well as in apparent time (adults 1997/8 vs. teenagers 1997). We show that the indicators studied here are clearly progressing in real and apparent time, which suggests that the phenomenon is still expanding.

titre: Extraction de caractéristiques par codage neuro-prédictif
auteurs: M. Chetouani, B. Gas, J.L. Zarader, C. Chavy
mots clés: analyse, extractions de caractéristiques, codage neuro-prédictif
abstract: In this paper, we present a predictive neural network called Neural Predictive Coding (NPC). This model is used for nonlinear discriminant feature extraction (DFE) applied to phoneme recognition. We also present an extension of the NPC model: NPC-3. In order to evaluate the performance of the NPC-3 model, we carried out a study of DARPA-TIMIT phoneme recognition (in particular the /b/, /d/, /g/ and /p/, /t/, /q/ phonemes). Comparisons with traditional coding methods (LPC, MFCC and PLP) are presented: they show a clear improvement in classification.

titre: Stratégies perceptives en identification des langues
auteurs: Ioana VASILESCU
mots clés: acquisitionLangue, Identification des langues
abstract: This paper deals with perceptual strategies in language identification. The study of the strategies employed by humans to identify foreign languages is currently considered a point of comparison for evaluating the performance of automatic systems. We present a survey of the domain and suggest a methodology aiming to control the factors responsible for the identification scores, i.e. experimental design, corpus and listeners' linguistic background. Two experimental designs are used (language discrimination vs. evaluation of the similarity) to determine the strategies developed by 4 populations to identify Romance languages (French, Spanish, Italian, Portuguese, Romanian). A case study highlights the main identification strategies (vocalic complexity vs. previous exposure to the languages).

titre: Traitement des mots mal reconnus en compréhension de la parole
auteurs: Caroline Bousquet-Vernhettes
mots clés: reco, applications, compréhension de la parole - robustesse
abstract: The aim of this paper is to propose an extension of stochastic conceptual modeling that increases the robustness of the understanding process when faced with misrecognitions and unknown words. Corpus analysis shows that some misrecognised words are more difficult to interpret than others, so we defined a word ambiguity rate. We performed a series of trials on a train schedule inquiry application to evaluate the understanding rate when faced with misrecognised words, in particular when these words are city names.

titre: Evaluation psycholinguistique de l'effet du vieillissement sur la production des noms propres
auteurs: EVRARD Muriel
mots clés: production, psycholinguistique, accès lexical, noms propres, noms communs, vieillissement, tâche de fluence verbale
abstract: The impact of age on the ability to produce proper names and common nouns was investigated using a verbal fluency task in 87 healthy adults from four age groups (“young”, “middle-aged”, “fairly-old”, “very-old”). Participants had to generate in one minute as many words as possible belonging to each of three semantic categories: celebrities (generation of names of people), countries (names of places) and fruits (common nouns). Word access ability, as measured by the number of successful retrievals, declined with age more for names of people than for other words. This result supports a disproportionate difficulty with age in retrieving the names of people and is interpreted with reference to the cognitive model of Burke et al.

titre: Identification des consonnes du français en syllabe isolée après laryngectomie partielle supracricoïdienne
auteurs: Lise Crevier-Buchman, Stéphane Hans, Jacqueline Vaissière, Shinji Maeda, Daniel Brasnu
mots clés: perception, pathologies, consonnes, matrices de confusions, laryngectomie partielle, voix de substitution
abstract: This study aimed to determine what patterns of perceptual confusions characterise the voice of patients after supracricoid partial laryngectomy (SCPL), using identification tests of French consonants. After SCPL, voice is produced by a neoglottis located approximately 3 cm above the removed vocal folds, thus shortening the vocal-tract length. We first evaluated the voicing distinction, as the patients' vibrator is profoundly modified, and second the manner and place of articulation features, as their vocal tract is shortened by about 3 cm. Ten male patients were recorded 18 months after SCPL producing 16 French consonants in a syllabic context (CV). Consonant articulation appears to impose certain constraints on the voicing ability of SCPL patients, since voiced consonants are predominantly perceived as voiceless consonants.

titre: Fusion de Paramètres Rythmiques et Segmentaux Pour l'Identification Automatique des Langues
auteurs: Jean-Luc ROUAS, Jérôme FARINAS, François PELLEGRINO, Régine ANDRE-OBRECHT
mots clés: reconnaissance de langue, prosodie, identification de la langue
abstract: This paper deals with an approach to Automatic Language Identification based on rhythmic modeling and vowel system modeling. Experiments are performed on read speech for 5 European languages. They show that rhythm and stress may be automatically extracted and are relevant in language identification: using cross-validation, 78% correct identification is reached with 21-second utterances. The Vowel System Modeling, tested in the same conditions (cross-validation), is efficient and results in 70% correct identification for the 21-second utterances. Lastly, merging the output scores from the two models improves the results: with test excerpts of only 11 seconds, the correct identification rate is over 80%.

titre: Tentative de formalisation algorithmique de la démarche du phonologue Un outil d'aide à la formulation d'hypothèses phonologiques
auteurs: Michel Jacobson
mots clés: phonétique-phonologie
abstract: We present a formal computerized model of a particular linguistic theory, functional phonology, a theory which is often criticized precisely for its lack of formalization. This theory proposes on the one hand a general framework for the expression of phonological phenomena and on the other a model for a discovery procedure for phonological units. In formalizing this theory explicitly, we have arrived at (1) a formalism for the expression of data and hypotheses and (2) a computer program emulating the functionalist methods of phonological analysis. In the paper, we present the principal data structures used and the procedures which we have designed to process them. Methodological obstacles which we have faced in implementing the model are discussed.

titre: Identification des locuteurs par regroupement hiérarchique ascendant et modèles d'ancrage
auteurs: Yassine Mami, Delphine Charlet
mots clés: Reconnaissance du locuteur
abstract: The process of speaker recognition is generally based on modeling the characteristics of each speaker. An interesting modeling method consists in representing a new speaker not in an absolute manner, but relative to a set of well-trained speakers. Each speaker is represented by its location in an optimal space of eigen or virtual voices. We hope that the relative position of a speaker in this space of virtual speakers is invariant whatever the recording conditions and the content of the sentences. This paper describes a representation space built by clustering speakers and how we can locate a speaker by using anchor models. The paper also presents experimental results and compares them with GMM. We show that clustering gives an optimal space. We also show that our system performs better when only a small amount of training data is available.

titre: Contrôle de l'anticipation vocalique d'arrondissement en Langage Parlé Complété
auteurs: Virginie Attina, Marie-Agnès Cathiard, Denis Beautemps
mots clés: production, modèlesLangage, phonétique-phonologie
abstract: "Langage Parlé Complété (LPC)" is the French manual system - corresponding to Cued Speech - used to complement lip reading and thus to enhance speech perception for hearing-impaired people. In an anticipatory rounding context, a French speaker was audiovisually recorded pronouncing and coding [i#yi] sequences with two different pause durations. The relative timing of the hand and lip movements and of the corresponding acoustic signal was quantified. The results showed that: (i) the manual cue follows the temporal organization of visible speech; (ii) the manual target position is always ahead of the corresponding lip target.

titre: Organisation spatio-temporelle main - lèvres - son de séquences CV en langage parlé complété
auteurs: V. Attina, D. Beautemps, M.-A. Cathiard
mots clés: production, modèlesLangage, phonétique-phonologie
abstract: This study was designed to investigate the coordinations in space and time between manual and oro-facial gestures involved in “Langage Parlé Complété”, an efficient method of communication by hearing-impaired people. Cued CV syllabic sequences were analysed. Results showed (i) five distinct positions for vowels and (ii) manual anticipation with respect to lip movements and sound, manual information being delivered at the beginning of a CV syllable.

titre: Sexe, mensonges et F0
auteurs: Estelle Campione, Jean Véronis
mots clés: phonétique-phonologie
abstract: Many contradictory results have been published on male and female voice characteristics, and the debate has sometimes been tinged by sexist stereotypes. In a detailed study, Tielen [Tie92] seemed to partly conclude the debate by showing that there is no difference in F0 range between sexes. We show in this paper that her conclusion was misled by the measure she and many other researchers use (the 90 range), which precisely erases the differences to be observed. We show on a large multilingual corpus involving 60 different speakers that there are indeed strong differences in the shape of the F0 distribution between sexes. Female voices show high values of skewness and kurtosis, characteristic of long tails in the distribution, whereas no such tendency can be observed for men.
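
The contrast drawn in this abstract between a percentile-based range and the shape of a distribution can be sketched as follows. This is only an illustration, not the authors' procedure: it assumes that the range measure is the span between the 5th and 95th percentiles, and the function names `range_90`, `skewness` and `excess_kurtosis` are hypothetical.

```python
import math


def percentile(sorted_vals, p):
    """Linear-interpolation percentile of an already sorted list, p in [0, 100]."""
    k = (len(sorted_vals) - 1) * p / 100.0
    lo, hi = math.floor(k), math.ceil(k)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (k - lo)


def range_90(values):
    """Span between the 5th and 95th percentiles (assumed definition)."""
    s = sorted(values)
    return percentile(s, 95) - percentile(s, 5)


def central_moment(values, order):
    """Population central moment of the given order."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** order for v in values) / len(values)


def skewness(values):
    """Third standardised moment; positive for a long upper tail."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5


def excess_kurtosis(values):
    """Fourth standardised moment minus 3 (0 for a normal distribution)."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3.0
```

A percentile range discards whatever lies beyond its cut-off points, whereas skewness and kurtosis are driven by those tails, which is why the two kinds of measure can disagree on the same F0 samples.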

titre: Évolution des structures de l'oral en formation de formateurs de FLE
auteurs: Véronique Delplancq, Bernard Harmegnies
mots clés: phonétique-phonologie, applications, apprentissageLangue, psycholinguistique, production
abstract: The paper focuses on the evolution of second language mastery in Portuguese students enrolled in a 4-year course for future teachers of French. During the whole learning period, they were regularly recorded, and acoustical analyses were performed on their utterances of the French /i/, /y/ and /u/ vowels. The language acquisition profiles are related to the students' involvement in actual communicative activities in the target language prior to their enrolment.

titre: Contraintes de contrôle articulatoire intrasyllabique dans la mémoire de travail verbale
auteurs: Sato M., Schwartz J.L., Cathiard M.A., Abry C., Loevenbruck H.
mots clés: psycholinguistique, production, perception, phonétique-phonologie, Mémoire de travail - boucle phonologique
abstract: The verbal transformation effect - an auditory imagery task equivalent to Necker's cube in visual imagery - recruits a specific working memory, the so-called articulatory or phonological loop. Is this mechanism sensitive to articulatory control constraints, i.e. phase relationships between vowel and consonant gestures? In our experiment, 56 French students repeatedly pronounced aloud nonsense syllables - all combinations of [e ] with [p] and [s] - and were asked to stop as soon as they heard a possible syllable transformation. In agreement with our in-phase predictions, the winner is syllable [pse ], where all gestures can be launched in synchrony. This experiment demonstrates that verbal working memory - a primary candidate as input memory for word learning - is sensitive to articulatory control of syllable phasing.

titre: Propriétés acoustiques et articulatoires des voyelles nasales du français
auteurs: Véronique DELVAUX, Thierry METENS, Alain SOQUET
mots clés: phonétique-phonologie, production, acoustique
abstract: This paper presents data about the articulatory and acoustic properties of French nasal vowels. The data show that many covarying articulations support the phonological contrast between nasal and oral vowels, in addition to the lowering of the velum. The majority of the articulatory adjustments occurring in the oral cavity lead to a lowering of F2. We relate the F2 lowering to the effects of nasal coupling, i.e. the changes in spectral balance due to the loss of energy at higher frequencies.

titre: Relations entre la perception catégorielle de la parole et l'apprentissage de la lecture
auteurs: Bogliotti Caroline, Messaoud-Galusi Souhila, Serniclaes Willy
mots clés: perception, acquisitionLangue, psycholinguistique, pathologies
abstract: This study aimed at evaluating the effects of age and reading level on the emergence and consistency of categorical perception (CP). Five- and ten-year-old children were tested on their identification and discrimination functions on a /do/-/to/ continuum. Five-year-old children had more difficulty categorizing phonemes than ten-year-olds. In addition, the ten-year-old poor readers were less categorical than good readers of the same age. This CP deficit is characterized by a weaker discrimination of stimuli belonging to separate categories, and an increased discrimination of acoustic variants of the same phoneme. We tentatively suggest that this deficit stems from a weaker deactivation of perceptual predispositions irrelevant for discriminating words in their native language.


titre: Origine du déficit de perception catégorielle des dyslexiques
auteurs: Souhila MESSAOUD-GALUSI, René CARRE, Caroline BOGLIOTTI, Willy SERNICLAES
mots clés: perception, acquisitionLangue, psycholinguistique, pathologies
abstract: The goal of the present experiment was to determine whether the perceptual deficit of dyslexic children is speech-specific or auditory-general. We tested categorical perception (discrimination task) using sine-wave analogues of a continuum ranging from [ba] to [da]. These stimuli were first presented as noises, and then described as the corresponding syllables. As a control of speech perception, we also tested the categorical perception of a more natural-sounding [ba]-[da] continuum. We administered these tests to 10-year-old dyslexic children and to normal readers of the same age, and also to normal-reading adults as an age control. We found that in the unnatural hearing condition, categorical perception of speech is less consistent in 10-year-old normal readers than in adults. Moreover, the dyslexic children show a less categorical pattern of perception than controls, in the speech condition only.

titre: Sur l'évaluation du second formant F'2 par une technique d'estimation spectrale basée sur une modélisation du filtrage auditif
auteurs: Kaïs Ouni, Noureddine Ellouze
mots clés: analyse, perception, gammachirp, F'2, estimation spectrale
abstract: In this paper, we propose a spectral estimation technique based on a gammachirp filterbank which is designed to provide a spectrum reflecting the spectral properties of the cochlea. The characteristic shift of the spectral peak of the gammachirp is then used to estimate the perceptual formant F'2 of the 18 cardinal vowels used by Bladon and Fant. We then compare the standard deviation of these results with those obtained by three traditional techniques: the first suggested by Bladon and Fant, the second by Paliwal et al., and the third by Hermansky. The results show that the gammachirp spectral estimation gives a better estimate of F'2 than the second and third techniques. It is slightly less accurate than the first one.

titre: Synthèse vocale par sélection d'unité : une méthode pour la redéfinition de la courbe intonative
auteurs: Baris Bozkurt, Thierry Dutoit, Vincent Pagel
mots clés: synthèse, prosodie
abstract: In this work, we propose a new algorithm for defining intonation curves from selected units in a non-uniform units-based text-to-speech synthesis system. Since the main trend in a non-uniform units-based system is to select the best and modify the least to achieve highly natural synthetic speech, the target intonation imposed on units is of great importance. We propose a 'shift-only' algorithm to re-define target intonation from selected units, which does not modify the general prosodic characteristics (micro-prosody, melodic movements) of units, while efficiently reducing F0 discontinuities at concatenation points. For the operation, a cost function is defined as a summation of discontinuities and shifts scaled by durations of the units. Minimizing this function for the shift variable, we optimize minimum shift and minimum discontinuity constraints.
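
The idea of a cost that sums F0 discontinuities and duration-scaled shifts can be sketched as follows. This is an illustration under stated assumptions, not the authors' algorithm: the quadratic form of the cost, the `weight` parameter, the coordinate-descent solver and the names `cost` and `optimise_shifts` are all hypothetical. Each unit is given as a tuple (F0 at start, F0 at end, duration), and each unit receives one constant F0 shift so that its internal contour is left untouched.

```python
def cost(units, shifts, weight=1.0):
    """Squared F0 jumps at the joins plus duration-weighted squared shifts."""
    jumps = sum(
        ((units[i][1] + shifts[i]) - (units[i + 1][0] + shifts[i + 1])) ** 2
        for i in range(len(units) - 1)
    )
    penalty = sum(weight * u[2] * s ** 2 for u, s in zip(units, shifts))
    return jumps + penalty


def optimise_shifts(units, weight=1.0, sweeps=200):
    """Minimise the assumed quadratic cost by coordinate descent.

    Each update sets one shift to its closed-form optimum while the
    neighbouring shifts are held fixed.
    """
    n = len(units)
    shifts = [0.0] * n
    for _ in range(sweeps):
        for i in range(n):
            num, den = 0.0, weight * units[i][2]
            if i > 0:
                # Join with the previous unit: its shifted end vs. this start.
                num += units[i - 1][1] + shifts[i - 1] - units[i][0]
                den += 1.0
            if i < n - 1:
                # Join with the next unit: its shifted start vs. this end.
                num += units[i + 1][0] + shifts[i + 1] - units[i][1]
                den += 1.0
            shifts[i] = num / den if den > 0 else 0.0
    return shifts
```

With `units = [(100.0, 110.0, 0.2), (130.0, 125.0, 0.1), (118.0, 120.0, 0.3)]`, calling `optimise_shifts(units)` returns one shift per unit; longer units are penalised more for moving, so most of the correction falls on the short middle unit.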

titre: Analyse syntaxique du français. Pondération par trigrammes lissés et classes d'ambiguïté lexicales
auteurs: R. Beaufort, T. Dutoit, V. Pagel
mots clés:
abstract: In a Text-to-Speech framework, we have implemented an n-gram-based Part-of-Speech tagger, currently evaluated on French. Usually, such systems reduce the probability of a sentence to that of its syntactic tags, without taking the words into account, since the probability of the words is hard to estimate correctly from the data. Our system reintroduces the words in the probability, replacing each word by the ambiguity class it belongs to. We have tested different kinds of smoothing by interpolation, and the influence of the classes on the results of our Part-of-Speech tagger.


titre: Implémentation d'un système de tatouage pour la transmission de données
auteurs: Alejandro LoboGuerrero, Joël Liénard, Patrick Bas
mots clés: acoustique, transmission de données
abstract: Audio watermarking is a method that allows the insertion of an imperceptible mark in an audio data set. Although watermarking is often used to protect copyright, it can also be used to increase the information transmitted in a communication context. In this paper, this idea is derived from a classical data transmission technique. The model is then modified by controlling the transmitted power and by adapting the spectral coefficients of the embedded codes according to the voice signal. This watermarking technique yields a system that is robust to several kinds of processing, especially MP3 compression.

titre: Nouveau système hybride GMM-SVM pour la vérification du locuteur
auteurs: Jamal Kharroubi, Gérard Chollet
mots clés: Vérification Automatique du locuteur, machines à support de vecteurs (SVM), Modèles de Mélange de Gaussiennes (GMM)
abstract: Support Vector Machines (SVM) are a new and very promising technique in statistical learning theory, proposed by V. Vapnik in 1995. In this article we address the use of the SVM technique for text-independent speaker verification by proposing a new feature representation based on GMM to construct the input vector of the SVM. The results obtained are compared to the classical Log-Likelihood Ratio (LLR) technique on the NIST2001 database, a part of the SWITCHBOARD database.

titre: Séparation en locuteurs de conversations via IP
auteurs: Daniel Moraru, Laurent Besacier
mots clés: reconnaissance du locuteur, recoloc, langue
abstract: In this paper we are interested in speaker segregation, that is, determining who speaks, and when, in an audio document containing speech from several people. We first present the theoretical background. The signals to be segmented by speaker contain two-speaker conversations over IP. No statistical speaker or speech model is available a priori. The algorithm used is based on the Bayesian Information Criterion (BIC). This paper mainly contributes to evaluation procedures in the new field of speaker segregation, especially when the speakers speak at the same time. For performance comparison, the VoIP database on which the experiments are done is made available by the authors.
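
A BIC-based test for a speaker change between two adjacent segments can be sketched as follows. This is a simplified illustration, not the system described in the abstract: it assumes one-dimensional features modelled by a single Gaussian per segment, whereas practical systems typically use multidimensional cepstral vectors, and the names `delta_bic` and `penalty_weight` are hypothetical.

```python
import math


def variance(xs):
    """Maximum-likelihood (population) variance."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / len(xs)


def delta_bic(left, right, penalty_weight=1.0):
    """BIC difference between 'two Gaussians' and 'one Gaussian'.

    A positive value favours modelling the two segments separately,
    i.e. a speaker change at the boundary. Both segments must have
    non-zero variance.
    """
    n1, n2 = len(left), len(right)
    n = n1 + n2
    both = list(left) + list(right)
    # Log-likelihood gain from fitting each segment with its own Gaussian.
    gain = 0.5 * (
        n * math.log(variance(both))
        - n1 * math.log(variance(left))
        - n2 * math.log(variance(right))
    )
    # The second Gaussian adds 2 free parameters (mean, variance) in 1-D.
    penalty = 0.5 * penalty_weight * 2 * math.log(n)
    return gain - penalty
```

Because the decision compares only the data against themselves, no speaker or speech model trained beforehand is needed, which matches the setting stated in the abstract.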

titre: Les variations rythmiques dans les dialectes arabes
auteurs: Rym HAMDI
mots clés: production, prosodie, analyse, acoustique, rythme
abstract: Speech rhythm in the different Arabic dialects investigated has been consistently described as stress-timed. At the same time, there is preliminary evidence from perceptual experiments that listeners use speech rhythm cues to distinguish speakers from North Africa from those of the Middle East. In an attempt to elucidate the apparent contradiction, an acoustic investigation of the proportion of vocalic intervals and the standard deviation of consonantal intervals in six dialects (Morocco, Algeria, Tunisia, Egypt, Syria and Jordan) was carried out using procedures put forth by Ramus (1999). The results show that complex syllables and reduced vowels in the Western dialects, and longer vowels in the Eastern dialects, seem to be the main factors responsible for differences in rhythmic structures. The paper also raises questions about the discrete or continuous nature of rhythm types.
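
The two measures named in this abstract, the proportion of vocalic intervals and the standard deviation of consonantal intervals, can be computed from a labelled segmentation as sketched below. The input format (a list of `("V" | "C", duration)` pairs), the function name `rhythm_measures` and the use of the population standard deviation are assumptions made for illustration.

```python
import statistics


def rhythm_measures(intervals):
    """Return (%V, delta-C) for a list of (label, duration) intervals.

    label is "V" for a vocalic interval and "C" for a consonantal one;
    durations are in seconds.
    """
    vocalic = [d for label, d in intervals if label == "V"]
    consonantal = [d for label, d in intervals if label == "C"]
    total = sum(vocalic) + sum(consonantal)
    # Share of the utterance duration occupied by vocalic intervals.
    percent_v = 100.0 * sum(vocalic) / total
    # Variability of consonantal interval durations.
    delta_c = statistics.pstdev(consonantal)
    return percent_v, delta_c
```

Reduced vowels lower the first measure and complex consonant clusters raise the second, which is how the factors mentioned for the Western dialects would show up in these two numbers.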

titre: Ralentisseur du signal de parole par autocorrélation
auteurs: Philippe Martin
mots clés: analyse, acoustique, synthèse, Traitement du signal, analyse/synthèse
abstract: Speech rate changes, whether slowing down or speeding up, have long had important applications in language teaching, linguistic corpus transcription, office dictation, etc. Very good quality modifications are obtained by the phase vocoder, but at the expense of a somewhat high computing cost. Temporal methods such as PSOLA are more efficient, but highly dependent on a good pitch tracking algorithm. A new approach is presented here, similar to PSOLA in that it works directly on the waveform, but relying on autocorrelation to align consecutive speech segments in the overlap-add process. It is therefore simpler to implement and more reliable, since it bypasses the period marking process used in PSOLA.
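The alignment step described in this abstract can be illustrated by a correlation search: before overlapping a new segment onto the previous one, look for the shift at which the two waveforms match best. The sketch below uses a normalized cross-correlation over a small search range; it is an assumption about how such an alignment could be done, not the author's algorithm, and `best_shift` and its parameters are hypothetical names.

```python
import math

def best_shift(reference, signal, start, max_shift):
    """Return the shift k in [0, max_shift] for which
    signal[start + k : start + k + len(reference)] correlates best
    with `reference` (normalized cross-correlation)."""
    n = len(reference)
    ref_energy = sum(r * r for r in reference)
    best_k, best_score = 0, -math.inf
    for k in range(max_shift + 1):
        candidate = signal[start + k:start + k + n]
        if len(candidate) < n:      # ran past the end of the signal
            break
        dot = sum(r * c for r, c in zip(reference, candidate))
        norm = math.sqrt(ref_energy * sum(c * c for c in candidate))
        score = dot / norm if norm > 0 else 0.0
        if score > best_score:
            best_k, best_score = k, score
    return best_k
```

Because the search needs no pitch marks, a voiced segment is simply re-attached at the nearest point where the waveform shape repeats, which is the property the abstract contrasts with PSOLA.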

titre: Dissociation de la protrusion et de l'arrondissement dans la production des consonnes labialisées de l'anglais
auteurs: TODA, Martine, MAEDA, Shinji, CARLEN, Andreas J., MEFTAHI, Lyes
mots clés: production, acoustique, applications, arrondissement, protrusion, consonnes, anglais
abstract: No formal distinction is usually made between lip rounding and protrusion in the articulatory description of English phonemes. Our study shows that, in spite of the poor contribution of the lips to phonological contrast, there are two lip rounding/protrusion patterns. These findings can be related to the acoustic mechanisms of the labialised consonants. The labial approximant /w/ has a low F2 (Helmholtz resonance) that requires both strong rounding and protrusion, as found in our data, while palatoalveolar fricatives (quarter-wavelength resonance) show a relatively wide lip aperture but a substantial protrusion that could contribute to lowering their global spectrum and so accentuate the contrast with other sibilants.

titre: Un Algorithme de Réduction de la Réverbération de Signaux Issus du Vocoder de Phase
auteurs: Joseph Di Martino, Yves Laprie
mots clés: analyse, synthèse
abstract: Time-scale modifications of speech signals based on frequency-domain techniques are hampered by an important artifact called phasiness. This artifact corresponds to the destruction of the shape of the original signal, i.e. the de-synchronisation between the phases of frequency components. This paper describes an algorithm that preserves the shape invariance of speech signals in the context of the phase vocoder. At ICASSP'2001 we presented a first version of this work where phases were corrected at the onsets of the voiced portions of the speech signals. In this study, we extend the previous work by allowing the algorithm to synchronize and correct the phases at regular intervals within the voiced segments of speech signals. Thanks to our algorithm, modified signals, even for large expansion factors, are of high quality and almost free of phasiness. A demonstration is available at the web page www.loria.fr/~jdm/PhaseVocoder/index.html, where several audio files can be downloaded.

titre: Le statut du schwa en berbère chleuh
auteurs: Rachid Ridouane
mots clés: phonétique-phonologie
abstract: This article deals with Chleuh Berber spoken in the southern part of Morocco. In this dialect, words may consist entirely of consonants without vowels and sometimes of only voiceless obstruents. In this study we have carried out acoustic and fiberscopic analyses to answer the following question: is schwa a segment at the level of phonetic representations in Chleuh? Fiberscopic films were made of one male native speaker producing a list of forms consisting entirely of voiceless obstruents. The same list was produced by 7 male native speakers of Chleuh for the needs of the acoustic analysis. This study shows the absence of schwa vowels in forms consisting of voiceless obstruents.

titre: Amélioration de la précision de la resynthèse avec TD-PSOLA
auteurs: Vincent Colotte, Yves Laprie
mots clés: analyse, synthèse, TD-PSOLA, fondamental, traitement du signal
abstract: The paper describes techniques to improve the precision of prosodic modifications with TD-PSOLA. TD-PSOLA relies on the decomposition of the signal into overlapping frames synchronised with pitch period. The main objective is thus to preserve the consistency of marks between neighbouring frames with respect to the temporal structure of pitch periods. First, we improve pitch marking by eliminating mismatch errors which appear during rapid formant transitions. This is achieved by pruning pitch mark candidates. From the synthesis point of view we exploit a fast re-sampling method which allows signal frames to be shifted finely. Together with the pitch marking improvement, this fast re-sampling method enables very high quality transformations characterised by the absence of noise between harmonics.

titre: Segmentation du bruit d'explosion des occlusives
auteurs: Yves Laprie, Anne Bonneau
mots clés: phonétique-phonologie, analyse
abstract: This paper investigates burst segmentation for the evaluation of acoustic cues used to identify unvoiced French stops. Unlike other works which utilize a fixed length window, our approach consists in segmenting bursts into transient and frication noise. The transient is found by minimizing the sum of spectral variances of transient and frication noise over the burst. The spectral variance criterion has the advantage of being sensitive both to energy deviations and spectral variations. Additional correction procedures augment the robustness of the segmentation against the presence of spurious noises during the closure and the determination of the voicing onset with delay. The relevance of our segmentation method has been evaluated by comparing the characteristics of the main spectral peak (energy prominence versus frequency) in the transient segmented by our method with those of the full burst. Our experiments showed that bursts segmented by our method allow a better discrimination between the three places of articulation.

titre: Principes et performances du décodeur parole continue Speeral
auteurs: Pascal Nocera, Georges Linares, Dominique Massonié
mots clés: reconnaissance de parole, algorithme A*
abstract: This paper presents the continuous speech recognition system Speeral developed at the LIA. Speeral uses a modified A* algorithm to find the best path in the search graph, taking into account acoustic and linguistic constraints. Rather than proceeding word by word, the A* used in Speeral is based on a previously generated phoneme lattice. To avoid backtracking problems, the system keeps for each frame the deepest nodes of the (partially explored) lexical tree starting at this frame. If a new hypothesis to explore ends with a word, and the lexical tree starting where this word ends has already been developed, then the next hypothesis will "jump" directly to the deepest nodes.

titre: Des formes phonétiques aux proto-formes de la langue originelle Analyse méthodologique et évaluation des limites
auteurs: Laurent Métoz, Nathalie Vallée, Isabelle Rousset, Louis-Jean Boë, Pierre Bessière
mots clés: phonétique-phonologie, langues
abstract: Our study builds on the work led by Merritt Ruhlen on the classification of the world's languages (classification presented in Ruh94). Showing that the methodology used by Ruhlen, as well as the large amount of data used in Greenberg-style comparative method, may have acted as a constraining factor would not be possible without a probabilistic estimate. Applying this estimate to Ruhlen's data allows us to highlight the fact that the demonstration given by Ruhlen in this book is probabilistically invalidated. We show that drawing the phonetic forms of the lexico-semantic referents at random gives the same results as his.

titre: Vers une organisation syllabique des lexiques : tendances, dépendances et cooccurrences segmentales
auteurs: I. Rousset, N. Vallée
mots clés: phonétique-phonologie, langues
abstract: This paper deals with the organisation of the syllable in natural languages. As a first attempt to shed light on selection and restriction constraints in syllable structure, we present our results based on 14 languages. First, we present some implicational laws connected to the frequency of different syllable types and based on the complexity of onset and coda. We then examine the relations between segments that appear in the same syllable, or in the onsets of two consecutive syllables. If we hypothesize that the most frequent syllable types are the most functional ones, searching for a syllabic "architecture" based on C-V cooccurrences could reveal the syllable-structure frames.


titre: Caractérisation statistique de la nature et des contextes d'apparition des mots Hors-Vocabulaire dans la parole spontanée
auteurs: Hichem HAMIMED, Géraldine DAMNATI
mots clés: reco, reconnaissance de parole, modèles Langage, évaluation Corpus, analyse, phonétique-phonologie
abstract: To improve our knowledge of the acoustic and linguistic characteristics of Out-Of-Vocabulary (OOV) words, we present in this article the results of a statistical study on the nature and the contexts of occurrence of OOV words in spontaneous speech. We examined the phonetic and syllabic structure of the OOV words and of other phenomena (false starts, badly pronounced words, truncated words). We also examined the type of utterances containing OOV words, their occurrence rates and their position in the utterances. We studied the benefit of defining different categories of OOV words in the language model. In what follows, we describe all these analyses and comment on the observations that we made.

titre: Nasalité en français spontané : Mesures aérodynamiques et fibroscopiques, études préliminaires
auteurs: Amelot Angélique, Basset Patricia, Crevier-Buchman Lise, Roubeau Bernard
mots clés: phonétique-phonologie, production
abstract: The purpose of this study is twofold: (1) to set up a methodology for gathering aerodynamic and fiberscopic data in spontaneous speech; (2) to present preliminary results: (i) there is a tendency for nasal flow to start during the phoneme preceding the nasal and a strong propensity to spread after the phoneme following the nasal; (ii) there are differences between speakers concerning velar lowering. We found a few cases of complete denasalization of nasals and some cases of nasalization of orals.

titre: Ouverture de la glotte, Fo, intensité et simulations émotionnelles : le cas de la joie, la colère, la surprise, la tristesse et la neutralité.
auteurs: Cédric Gendrot
mots clés: prosodie, phonétique-phonologie, émotions, électro-glottographie, physiologie
abstract: Voice open quotients have been measured and then compared to Fo and intensity in simulated emotional speech. We used electro-glottography on 3 male actors producing happiness, anger, sadness, surprise and neutrality from a specific corpus. Stimuli that were correctly identified in a perception test were selected and analysed. We found that measurements of open quotient could significantly distinguish between 4 classes: happiness, anger, sadness and neutrality (F=217.116 and p<0.0001 for the emotion factor). Open quotient significantly increased with intensity and Fo for anger, happiness and neutrality.

titre: Le projet MTM - Reconnaissance de la parole et du locuteur sur une plateforme embarquée
auteurs: Loic Lefort, Teva Merlin, Jean-Francois Bonastre, Pascal Nocera
mots clés: reco, recoloc-langue, applications, reconnaissance de parole, reconnaissance du locuteur
abstract: This paper presents the integration of speech technologies into an embedded platform. This work is part of the MTM project, funded by the European Community, which consists in designing a new Personal Digital Assistant offering UMTS connectivity and extended multimedia capabilities. Among the project goals is the ability for applications to feature speech recognition and speaker recognition as part of the human interface. Speech and speaker recognition systems have been developed, capable of functioning in both local (on the PDA) and remote (client/server) modes. Software interfaces have been developed to offer access to these technologies for easy integration into the PDA applications.

titre: La synchronisation des profils temporel et mélodique en français spontané
auteurs: François Poiré, Henrietta J. Cedergren
mots clés: prosodie, production, phonétique-phonologie, intonation, durée, domaine
abstract: This study is concerned with the timing of two properties of the Intonational Phrase (IP) in spontaneous speech: its durational profile (DP) and its tune. High or low continuation or finality, or high-low continuation, are distinguished in a corpus of 2395 IPs from 16 speakers of Montreal French. DP is derived from the normalized variation of syllable duration provided by a Z-score analysis. DP is characterized by the passage from negative normalized values to positive ones in the last two syllables of the IP. Results show that most of the time the tune does not influence the evolution of DP. Only two types of IP with high-low continuation show a different behaviour: when a discourse marker is introduced at the end of the IP and when hesitation occurs at the same position. In both cases, the DP will be aligned with the penult of the IP.
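The Z-score normalization of syllable durations mentioned in this abstract can be sketched as below. The reference population used for the mean and standard deviation is not specified in the abstract; normalizing within a single list of durations, and the name `z_scores`, are assumptions made for the example.

```python
from statistics import mean, pstdev

def z_scores(durations):
    """Normalize syllable durations (in seconds) to zero mean and unit
    standard deviation. Assumes the durations are not all identical."""
    mu = mean(durations)
    sigma = pstdev(durations)
    return [(d - mu) / sigma for d in durations]
```

On a hypothetical phrase such as `[0.1, 0.1, 0.1, 0.1, 0.2, 0.3]`, the first four values come out negative and the last two positive, which is the kind of sign change in the final syllables that the durational profile describes.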

titre: Evaluation de modèles d'extraction d'informations visuelles pour la reconnaissance automatique de parole audiovisuelle
auteurs: Philippe Daubias, Paul Deléglise
mots clés: reco, évaluationCorpus, reconnaissance de parole, parole audiovisuelle
abstract: In this article, we report on the progress of our research towards lipreading in close to “natural” conditions. More precisely, we describe our first audio-visual speech recognition experiments carried out using visual parameters extracted from “natural” images. Unlike many other experiments in the AV ASR field, these visual parameters are obtained without any hand-labelling phase and are naturally noisy, due to the extraction process. We evaluate our models through different ways of using them. These strategies include the use of the shape model combined with the appearance model, and the use of the appearance model followed by the shape model. For the fusion of audio and visual parameters, we used a basic DI architecture with a fixed weight and afterwards with an adaptive weighting scheme based on an energy criterion.

titre: Transformations a priori et a posteriori pour l'adaptation au locuteur
auteurs: Olivier Bellot, Driss Matrouf, Pascal Nocera
mots clés: acoustique, reconnaissance de la parole, adaptation locuteur
abstract: Speaker-dependent HMM-based recognizers give lower word error rates than the corresponding speaker-independent recognizers. The aim of speaker adaptation techniques is to enhance the speaker-independent acoustic models to bring their recognition accuracy as close as possible to the one obtained with speaker-dependent models. In this paper, we propose a method using test and training data for acoustic model adaptation. This method operates in two steps. The first one performs an a priori adaptation using the transcribed training data of the training speakers closest to the test speaker. This adaptation is done with the MAP procedure, allowing reduced variances in the acoustic models. The second one performs an a posteriori adaptation using the MLLR procedure on the test data, allowing the Gaussian means to be mapped to match the test speaker’s acoustic space. This adaptation strategy was evaluated in a large vocabulary speech recognition task. Our method leads to a relative gain of 15% with respect to the baseline system and 10% with respect to conventional MLLR adaptation.


titre: Etude aérodynamique et acoustique des occlusives emphatiques et non-emphatiques de l'arabe marocain
auteurs: Chakir ZEROUAL
mots clés: production, phonétique-phonologie
abstract: In this paper we describe the aerodynamic and acoustic properties of Moroccan Arabic emphatic [T D q] and non-emphatic [t k b d g] stops in the vocalic context -aCa-. Our acoustic data show that the VOT of [T q] is much shorter than that of [t k], although there is no significant difference between the maximum values for intra-oral pressure (IOP) during the occlusion of [t k T q]. IOP is higher in the voiceless stops than in the voiced ones. After the release of the occlusion, the airflow is greater and the duration of decay of IOP much longer in [k t] than in [q T b d D g]. [d] and [D] have similar maximum values for IOP. Based on these aerodynamic and acoustic data, we demonstrate that [T q] are voiceless unaspirated stops, whereas [t k] are voiceless aspirated stops.

titre: L'arabe marocain possède des consonnes épiglottales et non pharyngales
auteurs: C ZEROUAL, L Crevier-Buchman
mots clés: production, phonétique-phonologie
abstract: We provide evidence arguing that Moroccan Arabic has two epiglottal, not pharyngeal, consonants. Fiberscopic and X-ray observations, obtained from one speaker, show that these consonants are produced with a narrow constriction between the top of the epiglottis and the posterior pharyngeal wall (an epiglotto-pharyngeal constriction). Our fiberscopic investigations show also that during epiglottal consonants the base of the epiglottis and the top of the arytenoids are very close together (an aryepiglottal constriction). Examination of the X-ray data and acoustic measurements reveals that during epiglottal consonants there is greater coarticulation of the anterior part and dorsum of the tongue with adjacent vowels. Based on our articulatory and acoustic data, however, we cannot deduce whether the primary articulation of the epiglottal consonants of Moroccan Arabic is epiglotto-pharyngeal or aryepiglottal.

titre: Introduction de contraintes pour l'inversion acoustico-articulatoire utilisant une table hypercubique
auteurs: Slim Ouni, Yves Laprie
mots clés: production, analyse, phonétique-phonologie
abstract: Our acoustic to articulatory inversion method exploits an original codebook representing the articulatory space by hypercubes. The articulatory space is decomposed into regions where the articulatory-to-acoustic mapping is linear. Each region is represented by a hypercube. The inversion procedure retrieves articulatory vectors corresponding to an acoustic entry from the hypercube codebook. As the dimension of the articulatory space is greater than the dimension of the acoustic space, the corresponding null space is sampled by linear programming to retrieve all the possible solutions. A dynamic procedure is used to recover the best articulatory trajectory according to a minimum articulatory rate criterion. The addition of constraints allows the inversion process to be focused on realistic inverse articulatory trajectories.

titre: Modèles multi-flux pour la reconnaissance audio-visuelle : des chiffres au grand vocabulaire.
auteurs: Guillaume Gravier, Gerasimos Potamianos, Chalapathy Neti
mots clés: reconnaissance audiovisuelle de la parole, audio-visuel, modèles multi-stream
abstract: We investigate the use of multi-stream HMMs for audiovisual speech recognition. Multi-stream HMMs allow the modeling of asynchrony between the audio and visual state sequences at a variety of levels (phone, syllable, word, etc) and can be seen as a product HMM. In this paper, we use such models to investigate the impact of allowing a controlled level of asynchrony. Furthermore, we investigate joint training of the product HMM parameters, compared to composing the model from separately trained audio- and visual-only HMMs. Experiments are carried out on a simple digit recognition task as well as on a more complex dictation task. Results show that in both cases, joint training outperforms independent training. We also show that asynchrony helps a lot on the digit recognition task while surprisingly, it does not yield any improvement on the dictation task.


titre: L'opposition [e]-[E] en syllabes ouvertes de fin de mot en français parisien : étude acoustique préliminaire
auteurs: Zsuzsanna Fagyal, Samira Hassa, Fallou Ngom
mots clés: phonétique-phonologie, production, sociolinguistique variationniste
abstract: This paper presents preliminary acoustic evidence for the merger of [e] and [E] in word-final open syllables in minimal pairs recorded in Labovian-type sociolinguistic interviews from three native speakers of French living in Paris. Although the [e]-[E] distinction in Île-de-France is one of the most studied vowel contrasts in French, variations were thought to affect only inflectional morphemes and function words. This study shows that the merger is well advanced in native Parisians' vernacular and formal speech styles. Possible implications for the front vowel inventory of French, and the acoustic correlates of near mergers are mentioned.

titre: Spécialisation automatique de modèles acoustiques
auteurs: LINARES Georges, GUEYE Serigne, LEFORT Loic, MICHELON Philippe, NOCERA Pascal
mots clés: acoustique, reco, reconnaissance de la parole
abstract: In this paper, we present a method for automatic generation of acoustic models from simple generic models. This method uses the internal structure of noncontextual acoustic models in order to build new specialized states which are supposed to model specific patterns of a phoneme. The proposed technique uses temporal information for state splitting. This method is compared to a maximum likelihood based approach. Our experiments show that this last criterion leads to better performance. Nevertheless, unsupervised model splitting seems to be less efficient than model specialization based on a priori knowledge.


titre: Un Modèle Prédictif de la Durée Segmentale pour la Synthèse de la Parole Arabe à Partir du Texte.
auteurs: A. Zaki, A. Rajouani, M. Najim
mots clés: synthèse, acoustique
abstract: This paper deals with a neural-network based model of segmental duration for an Arabic TTS system. Given a set of factors influencing phoneme duration, a Multi-Layer Perceptron (MLP) is used to predict phoneme duration. Different linguistic features are extracted automatically from the text and coded for networks with binary and analog input nodes. The correlation coefficient measured on the generalization test database is 0.882. This coefficient corresponds to a mean absolute prediction error of 14.3 ms for segmental duration.
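The two evaluation figures quoted in this abstract, a correlation coefficient and a mean absolute prediction error, can be computed as in the sketch below. It assumes the coefficient is Pearson's r between predicted and observed durations, which the abstract does not state explicitly; the function names are illustrative.

```python
import math

def pearson_r(predicted, observed):
    """Pearson correlation between two equal-length lists of durations."""
    n = len(predicted)
    mp, mo = sum(predicted) / n, sum(observed) / n
    cov = sum((p - mp) * (o - mo) for p, o in zip(predicted, observed))
    sp = math.sqrt(sum((p - mp) ** 2 for p in predicted))
    so = math.sqrt(sum((o - mo) ** 2 for o in observed))
    return cov / (sp * so)

def mean_absolute_error(predicted, observed):
    """Average of |predicted - observed|, in the unit of the inputs (e.g. ms)."""
    return sum(abs(p - o) for p, o in zip(predicted, observed)) / len(predicted)
```

The two measures are complementary: the correlation is insensitive to a constant offset in the predictions, whereas the mean absolute error reports the deviation directly in milliseconds.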

titre: Génération automatique de la prosodie dans le système de synthèse vocale Kali : de la modélisation phonologique à l'implémentation des paramètres acoustiques
auteurs: Anne LACHERET, Michel MOREL
mots clés: prosodie, acoustique, phonétique-phonologie, synthèse
abstract: Kali, a French-speaking text-to-speech synthesis software package created for visually handicapped people, is the result of a collaboration between the university and the private sector. The input text goes through a succession of 5 modules (preprocessing, syntactic analysis, prosodic generation, phonemisation, acoustico-phonetic processing) and is then pronounced. Its best feature is intelligibility at rapid delivery. In this paper, prosodic processing is presented from the phonological representation of intonation to the acoustic processing.

titre: Détection de la stabilité de timbre des voyelles : vers une automatisation des tâches
auteurs: Christelle Dodane, Christian Guilleminot
mots clés: phonétique-phonologie
abstract: Differences in rhythmic patterns between French and English generate differences in the degree of articulatory tension of vowels and, therefore, differences in the stability of phonetic vowel quality. In order to study interferences between the two languages during the learning of English by French children, we have developed a method to delimit the period of vowel quality stability, using the tracking of the first three formant frequencies. This method has been generalized and automated in order to process large quantities of data. However, before complete automation can be reached, the problem of formant tracking remains to be solved in order to avoid detection errors. The extension of this method to the detection of diphthongs and triphthongs is in progress and shows promising results so far.

titre: L'articulation labiale des voyelles nasales postérieures du français : comparaison entre locuteurs français et anglo-américains
auteurs: Julie MONTAGU
mots clés: phonétique-phonologie
abstract: This paper presents an analysis of data on the relationships between the lip shapes and the acoustic cues of French oral, nasal and nasalized vowels. The comparison of the productions of French and American English speakers demonstrates the necessity of a labial adjustment to distinguish the French oral and nasal vowels. The required labial adjustment is specified by two distinct gestures: rounding and protrusion. The acoustic consequences of these gestures as cues are predicted from acoustic theory and are observed in the vowel spectra.

titre: Particularités articulatoires associées à la dyslexie développementale phonologique : une évaluation perceptive
auteurs: Muriel Lalain, Noël Nguyen, Michel Habib
mots clés: phonétique-phonologie, apprentissageLangue
abstract: This study investigates to what extent subtle articulatory aspects associated with phonological dyslexia contribute to identifying the disorder. Productions of voiced and voiceless bilabial stops by ten dyslexic children and 2 groups of ten control children were evaluated by six trained phoneticians. The judges were presented with VCV sequences in a consonant identification task and a goodness rating task. The results revealed specific infra-phonemic error patterns for the dyslexic children compared to the control groups. These perceived deviations from a standard pronunciation may characterize developmental phonological dyslexia. These results provide new evidence for the articulatory hypothesis. Moreover, we examine the potential implications for diagnosis and remediation.

titre: Caractéristiques de la dynamique d'un pneumotachographe pour l'étude de la production de la parole : aspects acoustique et aérodynamique
auteurs: Alain Ghio
mots clés: production
abstract: The measurement of aerodynamic parameters in the study of the articulatory mechanisms of speech production poses many problems, some of which remain unsolved today. To measure oral and nasal airflow, a certain number of conditions must be met. To this end, we designed and built a pneumotachograph with particular care to optimise its response time, linearity and acoustical response. This flow meter is based on the grid flow meter principle, with a small dead volume and specific linearisation for the inhaled and exhaled airflow. A soft silicone rubber mask, pressed against the speaker's face, prevents air leakage without hindering articulatory movements. The acoustical distortions of the speech sound through the device are corrected by signal processing adapted from its transfer function.

titre: A la recherche d'indices de frontière lexicale dans la resyllabation
auteurs: Fougeron Cécile, Bagou Odile, Stefanuto Muriel, Frauenfelder Uli
mots clés: production, acoustique, phonétique-phonologie, enchaînement, resyllabation
abstract: In this paper, we address the question of whether word boundaries can surface in so-called "resyllabification" in French. Durational and formantic properties of vowels and consonants are compared in 3 boundary conditions: (A) enchainement (V1C#V2), (B) word initial consonant (V1#CV2), (C) syllable onset consonant (V1.CV2). Results show that the sequences with enchainement are acoustically distinct from the others. This suggests that "resyllabification" is not complete in French, and that the surface form of these sequences is marked by their underlying lexical/syllabic structure. Moreover, the data show that lexical boundaries may be differentiated by cues on the pre-consonantal vowels rather than on the initial consonant.

titre: Etude acoustique de deux variantes de [j] en français : la variante vocalique et la variante fricative
auteurs: Chafcouloff Michel
mots clés: acoustique, phonétique-phonologie
abstract: As little information concerning the production of a fricative allophone of the [j] sound has been gathered for French, an acoustic study has been undertaken to investigate which segmental or suprasegmental factors have a dominant influence on this allophone in the idiolect of a native speaker from Southern France. Results show that among these factors, syllabic position prevails, as a vocalic variant is found in initial, intervocalic and preconsonantal position, whereas a fricative occurs in final and postconsonantal position. However, the contextual environment (nature of the adjacent vowel(s), manner and voicing of the consonant in clusters) and emphatic stress should also be taken into account to explain the variability of [j] as a vocalic or a fricative variant in French.

titre: Functional modeling of the face during speech production
auteurs: Shinji MAEDA, Martine TODA, Andreas J. CARLEN, Lyes MAFTAHI
mots clés: production
abstract: We describe a functional modeling of face movements during speech. The data consist of face marker positions in 3D coordinates measured while a speaker read a corpus. An arbitrary orthogonal factor analysis followed by a principal component analysis on the data resulted in a set of five interpretable factors that explain 87% of the variance. The first factor, which accounts for the vertical jaw motion, dominates the open/close movement of the lips. Two principal factors describe, in our interpretation, the two intrinsic lip gestures: one specifies the horizontal dimension, spread vs. round, and the other the vertical dimension, open with rotation vs. close. Both the (horizontal) rounding and the (vertical) opening contribute to lip protrusion, which appears plausible from a biomechanical point of view.

titre: Appariement de locuteurs entre des documents sonores préalablement segmentés en utilisant la classification hiérarchique
auteurs: Sylvain Meignier, Jean-Francois Bonastre, Ivan Magrin-Chagnolleau
mots clés: reco, indexation, reconnaissance de parole
abstract: Speaker indexing of an audio database consists in organizing the audio data according to the speakers present in the database. It is composed of three steps: (1) segmentation by speakers of each audio document; (2) speaker tying among the various segmented portions of the audio documents; and (3) generation of a speaker-based index. This paper focuses on the second step, the speaker tying task. The result of this task is a classification of the segmented acoustic data into clusters; each cluster should represent one speaker. This paper investigates hierarchical classification approaches for speaker tying, and proposes two discriminant dissimilarity measures using the information provided by the segmentation. The experiments are conducted on a subset of the Switchboard database, a conversational telephone database, and show that the proposed method allows satisfactory speaker tying among various audio documents.
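The speaker tying step described in this abstract can be illustrated with a generic bottom-up hierarchical classification. The sketch below assumes a precomputed symmetric dissimilarity matrix between segments, average linkage, and a stopping threshold; none of these choices is taken from the paper, whose own dissimilarity measures are not reproduced here, and `agglomerative` is a hypothetical name.

```python
def agglomerative(dist, threshold):
    """Merge segments bottom-up until the closest pair of clusters is
    farther apart than `threshold`.

    `dist[i][j]` is an assumed dissimilarity between segments i and j.
    Returns a list of clusters, each a list of segment indices.
    """
    clusters = [[i] for i in range(len(dist))]

    def linkage(a, b):
        # Average linkage: mean dissimilarity over all cross-cluster pairs.
        return sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        d, x, y = min(
            (linkage(clusters[x], clusters[y]), x, y)
            for x in range(len(clusters))
            for y in range(x + 1, len(clusters))
        )
        if d > threshold:
            break
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return clusters
```

Each returned cluster is meant to gather the segments attributed to one speaker; the threshold controls how many speakers are hypothesized.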

titre: La quantité vocalique en twi. Quelques considérations phonologiques et analyses acoustiques préliminaires
auteurs: ADU MANYHA Kofi, SOCK Rudolph
mots clés: phonétique-phonologie, acoustique
abstract: The present study, which deals with vowel quantity in Twi, is part of a research programme on various phonetic and phonological aspects of this tone language, spoken in Ghana. Acoustic durations are obtained from two speakers producing a series of Twi minimal pairs, embedded in a carrier sentence. Absolute duration measures indicate the relevance of both vowel and consonant durations in distinguishing the phonological classes. Relative values further confirm the robustness of the feature in this language.

titre: La perception auditive de gestes vocaliques anticipatoires
auteurs: VAXELAIRE Béatrice, FERBACH-HECKER Véronique, SOCK Rudolph
mots clés: perception, production, cinéradiographie, anticipation, sensori-moteur
abstract: This research, based on X-ray data, examines the relationship between anticipatory labial and lingual gestures and the auditory perception of an upcoming rounded vowel in French Vowel-Consonant-Vowel sequences (V1CV2). V1 is always the vowel [a] and V2 the vowel [u]; C is either [t] or [k]. The contribution of anticipatory coarticulation to the perception of the rounded element is examined on both the motor (articulatory) and acoustic levels. The robustness of the temporal extent of the perceptual effects is also evaluated under increased speaking rate. The paradigm consists in having representative speakers produce speech samples; segments are then "gated out" and listeners are asked to judge what the truncated segments were.

titre: Adaptation spectrale par quantification vectorielle : exemple de la RAP à fréquences d'échantillonnage multiples
auteurs: Richard Lamy, Laurent Besacier
mots clés: acoustique, reconnaissance de la parole
abstract: This paper presents a nonlinear approach to spectral adaptation based on Vector Quantization. The idea is to transform feature vectors extracted from signals of one quality into feature vectors of another quality. Our method is applied to the particular case of speech recognition at multiple sampling rates. Such a method, which can be applied to other adaptation problems, yields a very acceptable correspondence between the two feature spaces considered. Thus, a generic ASR system trained on 16 kHz signals is able to recognize lower sampling rate signals without any adaptation of its acoustic models.

titre: L'acquisition de l'allongement vocalique en arabe marocain : productions de jeunes enfants marocains en âge préscolaire
auteurs: Mohamed Embarki
mots clés: phonétique-phonologie, apprentissageLangue
abstract: This work tackles the problem of vowel quantity in Moroccan Arabic (henceforth MA) from an acquisition perspective. The results of two studies are presented; they deal with the expected lengthening of the vowel [a] in CV syllables by Moroccan children of pre-school age. The first study deals with the ordinary production of a list of words in MA by three young children (4-6 years); the second deals with the hyperspeech production of the same list of words by the same children.

titre: Influence des caractéristiques des contours mélodiques sur la durée des mots en anglais britannique contemporain
auteurs: Caroline Bouzon
mots clés: phonétique-phonologie, acoustique
abstract: Because pitch glides greatly influence the duration of words, we test here the influence of some characteristics of the pitch glide in final position of intonation units on the duration of words in British English. These characteristics are the direction, complexity and width of the glide. We observed the influence of each individual parameter before a minor and major boundary, and then the interaction between them. The effect of the parameters is different before a minor or a major boundary, hence the need to maintain this distinction in a model of duration. The effect also varies depending on the type of interaction, the influence of some factors being increased or decreased according to the other parameter it interacts with.

titre: L'acquisition d'un marqueur socio-stylistique : l'exemple de la liaison facultative
auteurs: CHABANAL Damien, EMBARKI Mohamed
mots clés: phonétique-phonologie, apprentissageLangue
abstract: The aim of our study is to see whether 9-year-old children of French origin and from different backgrounds are able to produce more optional liaisons in more formal situations. The results show a real linguistic awareness among all the speakers on the one hand, and a sociolectal influence on the other hand. We present empirical data from an experiment conducted with 24 children in order to understand how a regular process of variation can be acquired.

titre: Comparaison de SMLLR et de SMAP pour une adaptation au locuteur en utilisant des modèles acoustiques markoviens
auteurs: Fabrice LAURI, Irina ILLINA, Dominique FOHR
mots clés: reco, reconnaissance de la parole, adaptation, modèles acoustiques, SMLLR, SMAP
abstract: In this paper, two adaptation schemes are presented: SMAP and SMLLR. Both methods update the parameters of the acoustic models of a speaker-independent system in order to improve its performance for a new speaker. We applied SMAP and SMLLR to the HMMs of the ESPERE engine in batch mode and in unsupervised incremental mode. The HMMs were trained on the Resource Management (RM) corpus. Results of the batch adaptation show that SMAP is more effective. For the unsupervised incremental adaptation, SMLLR is more powerful than SMAP under the incremental scheme we chose.

titre: Développement d'une technologie générique pour la reconnaissance de la parole indépendante de la tâche
auteurs: Fabrice Lefevre, Jean-Luc Gauvain, Lori Lamel
mots clés: reconnaissance de la parole
abstract: This work addresses issues in speech recognition portability via the development of generic core speech recognition technology. First, the genericity of large domain reference models (designed for a task covering a large number of acoustic and linguistic events) is assessed through their performance on various independent tasks. Then, new techniques based on multi-source training are presented, aiming at enhancing the level of genericity of the large domain models. Finally, methods for transparent adaptation of generic models to a particular task are studied.

titre: Analyse comparative de corpus oraux et écrits français : mots, lemmes et classes morpho-syntaxiques
auteurs: V. Gendner, M. Adda-Decker
mots clés: analyse de corpus, étiquetage morphosyntaxique, écrit, oral, reconnaissance, modèle de langage
abstract: Corpora of oral and written French have been automatically tagged with lemma and morpho-syntactic information: radio/TV broadcast transcripts and Le Monde newspaper. For both corpora of 40M words each, we measure the corpus vocabulary in terms of lexical forms and lemmas. Morpho-syntactic information has then been examined using the most common Parts Of Speech (POS): noun, verb, adjective, adverb, pronoun, conjunction, determiner and preposition. Distributions of word occurrences and of vocabulary items have been computed as a function of POS. A comparison between oral and written French is carried out. Beyond a quantitative description of oral and written corpora, this study aims at using more linguistic knowledge in speech recognition systems.

titre: Mise à jour automatique du modèle de langage d'un système de transcription
auteurs: Alexandre Allauzen, Jean-Luc Gauvain
mots clés: modèlesLangage, reconnaissance de parole
abstract: This paper investigates the problem of automatic adaptation of the vocabulary and the language models (LM) of a broadcast news speech transcription system. We propose to make use of written Internet news sources which are available on a daily basis to model the thematic changes typical of the news domain. For each news source a specific normalization is needed. The lexicon is updated daily and an up-to-date LM is estimated using only recent data. Adaptation is performed by interpolating the up-to-date LM with a standard (and fixed) LM. Each day the data collected from one of the sites is reserved as an evaluation corpus. Experiments carried out during the month of January 2002 show a relative reduction in out-of-vocabulary rate of 32% and a 13% reduction in perplexity compared to using a fixed language model.
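As an illustration of the interpolation step mentioned above, the following minimal sketch assumes a simple linear interpolation of per-word probabilities; the abstract does not specify the interpolation scheme or its weight. The names `interpolate`, `perplexity`, `relative_reduction`, the weight `lam` and all numeric values are hypothetical and serve only to show how perplexity and a relative reduction are computed.

```python
import math


def interpolate(p_recent, p_fixed, lam):
    """Linearly interpolate two LM probabilities for the same word and context.

    lam is the weight given to the up-to-date LM (assumed value, to be tuned).
    """
    return lam * p_recent + (1.0 - lam) * p_fixed


def perplexity(word_probs):
    """Perplexity of a text given the probability assigned to each word."""
    log_sum = sum(math.log(p) for p in word_probs)
    return math.exp(-log_sum / len(word_probs))


def relative_reduction(baseline, adapted):
    """Relative reduction of a measure (perplexity, OOV rate) w.r.t. a baseline."""
    return (baseline - adapted) / baseline


# Toy per-word probabilities from the two models (made-up numbers).
recent_probs = [0.20, 0.05, 0.10]
fixed_probs = [0.10, 0.05, 0.02]
mixed_probs = [interpolate(r, f, lam=0.5)
               for r, f in zip(recent_probs, fixed_probs)]

ppl_fixed = perplexity(fixed_probs)
ppl_mixed = perplexity(mixed_probs)
gain = relative_reduction(ppl_fixed, ppl_mixed)
```

In this toy case the interpolated model never assigns a word a lower probability than the fixed model, so its perplexity is lower; with real data the weight would normally be tuned on held-out text.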


titre: Apprentissage d'un module stochastique de compréhension de la parole
auteurs: Hélène Maynard, Fabrice Lefevre
mots clés: modèlesLangage, applications, evaluationCorpus
abstract: The need for human expertise in the development of a speech understanding system can be greatly reduced by the use of stochastic techniques. However, corpus-based techniques require the annotation of large amounts of training data. Manual semantic annotation of such corpora is tedious, expensive, and subject to inconsistencies. In order to decrease the development cost, this work investigates the performance of the understanding module as a function of two parameters: the size of the training corpus and the use of automatically annotated data.

titre: Détection de séquences par sélection de l'historique : application à la reconnaissance automatique de la parole
auteurs: Langlois David, Smaïli Kamel, Haton Jean-Paul
mots clés: modèlesLangage, reco, "modèles de séquences", "sélection par l'historique", reconnaissance de parole
abstract: This paper focuses on statistical language modelling for automatic speech recognition. We present a method which aims at finding linguistic units in a corpus. This method, called the Selected History Principle, consists in finding strong distant relationships between words. The new units are phrases made up of basic units of our vocabulary linked by these distant relationships. We adapt the multigram principle to large vocabularies in order to introduce an optimal subset of these sequences into a bigram model. The bigram model using these sequences outperforms the basic bigram model by 21% in terms of perplexity, and increases the recognition rate of the large vocabulary system Sirocco by 8.7%. The word error rate is decreased by 12.7%.

titre: Effets de masquage rétroactif dans la perception de la parole chez l'enfant dyslexique
auteurs: Noël Nguyen, Ludovic Jankowski, Muriel Lalain, Barbara Joly-Pottuz, Aurélie Leynaud, Mélina Mercier, Michel Habib
mots clés: perception, pathologies, phonétique-phonologie, acquisitionLangue, psycholinguistique, dyslexie, acquisition de la parole
abstract: Previous research has revealed that dyslexic children may be more sensitive to backward-masking effects in auditory perception than control children. In this study, we asked whether a CV transition masks a preceding VC transition to a greater extent in dyslexic children than in controls. The results suggest that dyslexic children are severely impaired on the discrimination of VC sequences, regardless of whether these sequences are followed or not by a CV sequence. These results provide further evidence that dyslexia is associated with a deficit in the perception of speech.

titre: Sirocco, un système ouvert de reconnaissance de la parole
auteurs: Guillaume Gravier, François Yvon, Bruno Jacob, Frédéric Bimbot
mots clés: reco, reconnaissance de la parole, modèlesLangage
abstract: The Sirocco project aims at developing and distributing, under a free license, a speech recognition software toolkit according to open source standards. We present in this paper the main objectives of the project and describe the solutions implemented. In particular, Sirocco enables the use of contextual constraints on the pronunciation variants in the first decoding pass. We present preliminary results on the use of contextual transcription rules on a read speech transcription task.

titre: Variabilité inter-langue et inter-individuelle en production et en perception : étude préliminaire en arabe dialectal et en français
auteurs: Jalal-eddin AL-TAMIMI, Marion GIRARD, Egidio MARSICO
mots clés: phonétique-phonologie, perception
abstract: This paper presents a preliminary study of intra-speaker and inter-speaker variability in speech production and perception with an inter-language investigation of acoustic vocalic space according to different systems. This work aims at providing an analytic study based on individual data that might account for individual strategies. We have studied variability in vowel production and perception between six speakers of two languages: French and Arabic. The results of the first part of our work show larger vocalic spaces for perception than for production, both for French speakers and for Jordanian Arabic speakers. Moreover, inter-language differences in vowel dispersion seem to emerge from these results.


titre: Charpente osseuse et conduit vocal : variabilité et relations structurelles. Premiers résultats
auteurs: Louis-Jean Boë, Denis Beautemps, Roger Lichtenberg, Jean-Louis Heim, Fleur Letellier-Willemin, Martine Lichtenberg
mots clés: production, analyse, Anthropologie - Emergence du langage
abstract: In the field of speech research, the vocal tract is primarily described in terms of soft tissue: glottis, pharyngeal wall, velum, tongue and lips, without any reference to bony structures. Only the incisors are used as fixed landmarks. Nevertheless, the cranium is the base to which the hyoid bone and the larynx are attached, and on these, in turn, the vocal organs are inserted. U. Goldstein's (1980) dissertation clearly showed the benefit of data gathered on the growth of bony structures in modeling vocal tract growth. Exploiting a radiographic database including 22 subjects (15 men and 7 women), we start to analyze the structural relationships among landmarks widely used in physical anthropology and we evaluate the range of variability related to sex and subject. This work is the first step of a larger project aimed at reconstructing the vocal tract from a cranium, modern or fossilized. It is also to be considered in the framework of language emergence. A multidisciplinary research team participated in this project, including radiologists, Egyptologists, physical anthropologists and speech specialists.


titre: Des implémentations parallèles pour une application de la RAP
auteurs: Yahya Ould MOHAMED EL HADJ
mots clés: reco, reconnaissance de la parole, algorithmique parallèle, machines parallèles
abstract: We show through this work that harnessing the power of parallel machines can greatly increase the speed and storage capacity of certain recognizers. The improvements obtained can be exploited to expand the vocabulary of existing real-time tasks, or to increase modeling precision where recognition accuracy matters more than real-time operation.

titre: Entraînement de la conscience phonologique d'enfants déficients visuels: quel support temporo-phonologique?
auteurs: V. Prost, R. Espesser, C. Sabater, K. Thomas-Bartalucci, V. Rey
mots clés: pathologies, perception
abstract: Visually impaired children aged 6 to 8 appear to exhibit perceptual confusions: they have difficulty distinguishing phonemes by place of articulation and performing explicit phoneme manipulation. The present study first aims to demonstrate the specific difficulties of these children. We then test the hypothesis of a remediation based on acoustically modified speech, using meaningful units (words) and non-meaningful units (non-words). The results favor the use of non-words in acoustically modified speech.


titre: L'ambissyllabicité des consonnes géminées : le cas du berbère (tachelhit)
auteurs: Naïma Louali
mots clés: phonétique-phonologie
abstract: Berber, like Arabic and Italian, exhibits geminate consonants. The consonant system opposes a series of single consonants to geminates. The vocabulary, and more specifically the morphology, exploits this contrast. Phonetic studies dealing with gemination mention length as the main parameter distinguishing geminate consonants from single ones. Berber scholars disagree as regards their representation and their phonological behaviour. The analysis of this kind of consonant varies depending on the phonological theory. In the present study, we examine the geminate consonants through an experimental approach (inversion and partial word repetitions), which is based on the categorization of these consonants by four Berber subjects. A list of 30 words in their French or Arabic translation was first agreed upon, with 10 words in each category. The aim of this study is to show how Berber speakers perceive these consonants. Through this experimental research, we intend to present data relevant to a discussion of the relation between phonological representation and phonetic data.

titre: Réseaux bayésiens dynamiques pour la reconnaissance multi-bandes de la parole
auteurs: Khalid Daoudi, Dominique Fohr, Christophe Antoine
mots clés: reconnaissance de la parole, réseaux bayésiens, multibande
abstract: This paper presents a new approach to multi-band automatic speech recognition which has the advantage of overcoming many limitations of classical multi-band systems. The principle of this new approach is to build a speech model in the time-frequency domain using the formalism of Bayesian networks. In contrast to classical multi-band modeling, this formalism leads to a probabilistic speech model which allows communication between the different sub-bands and, consequently, no recombination step is required in recognition. We develop efficient learning and decoding algorithms and present illustrative experiments on a connected digit recognition task. The experiments show that the Bayesian network approach is very promising in the field of noisy speech recognition.

titre: Apprentissage de structures de réseaux bayésiens dynamiques pour la reconnaissance de la parole
auteurs: Murat Deviren, Khalid Daoudi
mots clés: reconnaissance de la parole, réseaux bayésiens
abstract: We present a speech modeling methodology where no a priori assumption is made on the dependencies between the observed and the hidden speech processes. Rather, dependencies are learned from data. This methodology guarantees improvement in modeling fidelity as compared to HMMs. In addition, it gives the user control over the trade-off between modeling accuracy and model complexity. We evaluate the performance of the proposed methodology in a connected digit recognition task.

titre: La prosodie de la focalisation en français : faits perceptifs et morphogénétiques
auteurs: Brichet C, Aubergé V
mots clés: prosodie
abstract: Our purpose is to study how the focalisation function, more precisely the deixis function applied on the word domain, is implemented by the prosodic parameters, and what the role of the syllable vs. the word contour is. To this end, corpora of isolated sentences were recorded under two instructions: to point to the word vs. to point to the syllable. Perception experiments were conducted and then an acoustic analysis was applied. The results confirm the role of the first syllable traditionally observed in the literature, but point to a contour globally shared over the word domain (a carried contour), without any significant influence on the carrying contour of the whole utterance, which confirms the hypothesis given as principle 4 in the ICP model of prosody.


titre: Un logiciel de codage de la parole basé sur le FS1016
auteurs: M. Djamah, M. Boudraa, B. Boudraa, M. Bouzid
mots clés: codage
abstract: This paper describes speech coding software based on the Federal Standard FS1016 coder. The objective of this work is to have in our laboratory a basic speech coder that serves as a baseline for our research work. Modifications and extensions of the basic coder (to improve the quality) can be made easily thanks to object-oriented programming.

titre: Mesure d'intelligibilité de segments de parole à l'envers en français
auteurs: Fanny Meunier, Tristan Cenier, Melissa Barkat, Ivan Magrin-Chagnolleau
mots clés: perception, phonétique-phonologie
abstract: We ran an experiment focusing on cognitive implication of reversed speech segments. Nine durations of reversed segments plus a non-distorted control condition have been considered (varying between 20 ms and 180 ms) in order to test the pattern of intelligibility degradation in French. We observed an overall strong negative correlation between the degree of intelligibility and the size of reversed-speech windows. These results appear to be very comparable to those obtained in English by Greenberg & Arai [Gre01], at least on the slope of intelligibility performance decrease. However, intelligibility loss in French is delayed by twenty milliseconds. Apart from confirming the cognitive ability to restore reversed speech up to a certain point, our study revealed differences that could be interpreted as ‘language specific’.

titre: Reconnaissance de la parole pour des locuteurs non natifs en présence de bruit
auteurs: Dominique Fohr, Odile Mella, Irina Illina, Fabrice Lauri, Christophe Cerisara, Christophe Antoine
mots clés: reconnaissance, robustesse
abstract: In real-world applications, speech recognition is confronted with two main difficulties: non-native speakers and background noise. The aim of this paper is to compare, on the same noisy database, different methods for increasing the robustness of our HMM-based automatic speech recognition system. To deal with non-native speakers, we have tested two solutions: multi-models and adaptation techniques. For noisy speech, we have evaluated two types of methods: the first one (PMC and MLLR) adapts the initial models, trained on clean speech, with a few noisy sentences. The second one (RATZ and MCR) tries to remove the noise from the signal without modifying the HMM models.

titre: Développement morpho-phonologique de deux enfants en train d'acquérir le français après un implant cochléaire
auteurs: Géraldine Hilaire, Valérie Régol, Harriet Jisa
mots clés: pathologies, perception, acquisitionLangue
abstract: Two explanations have been offered to account for omissions of syllables in early language production: the "rhythmic production" [All78] [All80] [Ger91] [Ger94] [Ger96] and the "perceptual account" [Ech92] [Ech93]. The longitudinal data used for our analysis cover 26 months of post-implant development. Our sample begins at 10 months post implant, when the majority of determiners are omitted in production, and ends at 36 months post implant, when the majority of determiners are produced. All common nouns were extracted from the corpus and examined for: 1) rate of omission errors; 2) the stability of the children's filler syllables; and 3) the context in which the form was produced, i.e., monosyllabic or multisyllabic word. The results of our study argue for a "rhythmic production account" of determiner omissions.

titre: Séparation de sources audio-visuelles : formalisation et expérimentation
auteurs: D. Sodoyer, L. Girin, C. Jutten, J.L. Schwartz
mots clés:
abstract: In this paper, we present a new approach to the source separation problem in the case of multiple speech signals. The method is based on the use of automatic lipreading: the objective is to extract an acoustic speech input from other acoustic signals by exploiting its coherence with the speaker's lip movements. We consider the case of an additive stationary mixture. First, we present a theoretical framework showing that it is indeed possible to separate a source when some of its spectral characteristics are provided to the system. Then we address the case of audio-visual sources. We show how, if a statistical model of the joint probability of visual and spectral audio input is learnt to quantify the audio-visual coherence, separation can be achieved by maximising this probability.