titre: Apprentissage des langues
auteurs: Valérie Hazan
mots clés: langues, apprentissage
abstract: The goal of this paper is to briefly review the major impact
of speech technology in the area of second language learning and to present
recent developments in the area of phonetic acquisition. Second language learners
are ‘deaf’ to many sound distinctions that do not occur in their first language.
Key issues are whether it is possible to improve perception and production
via training and whether the use of speech technology is successful in promoting
acquisition. Results of recent studies on the role of speech enhancement and
of visual cues in increasing the effectiveness of language training will also
be presented.
titre: Experiments on cross-language
acoustic modeling
auteurs: T. Schultz, A. Waibel
mots clés:
abstract: With the distribution of speech products all over the world,
the portability to new target languages becomes a practical concern. As a
consequence our research focuses on rapid transfer of LVCSR systems to other
languages. In earlier studies we evaluated the performance obtained when only limited adaptation
data is available. Particularly for very time-constrained tasks and minority
languages, it is even plausible that no training data is available at all.
In this paper we examine what performance can be expected in this scenario.
All experiments are run in the framework of the GlobalPhone project which
investigates LVCSR systems in 15 languages.
titre: Optimisation d'arbres de décision
pour la conversion graphèmes-phonèmes
auteurs: H. Crépy, C. Amato-Beaujard, J.C. Marcadet, C. Waast-Richard
mots clés: synthèse, graphème-phonème
abstract: Extensive experiments on a data-driven decision-tree technique
for French grapheme-to-phoneme conversion are dedicated to studying the effects
of various tree-growing parameters as well as feature and question selection.
Generated phonetic transcriptions of unknown words are used for speech recognition
and synthesis. We report surprisingly good results, with recognition error
rates better than with rule-generated transcriptions, and only slightly worse
than with reference man-made transcriptions, and transcription phonetic error
rates measuring as low as 1.56%, thanks in part to the introduction of POS
tags into the context features.
titre: Traitement des incises en français
: capture automatique et modèle prosodique
auteurs: Philippe Boula de Mareüil, Estelle Maillebuau
mots clés: prosodie
abstract: Parentheticals in French are investigated, in order to assign
them a specific prosody in text-to-speech synthesis. On the basis of lexical-syntactic
and punctuational criteria, we show that it is possible to detect them automatically
by a regular grammar, with an f-measure of more than 92%, on a corpus of more
than 3000 newspaper sentences, containing about 200 parentheticals. A 20%
reduction of pitch range and of average pitch, as well as a 2dB reduction
of energy (respectively, a 10% reduction of pitch range) may then be applied
to non-final (respectively final) parentheticals, which reflects observations
made on a professional female speaker.
titre: Introduction de l'énergie dans un
modèle de reconnaissance automatique de la parole
auteurs: Abdellah Yousfi, Abdelouafi Meziane
mots clés: acoustique, Hidden Markov Model (HMM), Energy, Two Level
Hidden Semi Markov Centisecond Model (TLHSMCM).
abstract: A major deficiency of standard Hidden Markov Models (HMM)
is that the spectral and the prosodic features are processed uniformly.
To combine the prosodic cues with the acoustic ones more efficiently, a segmental
two-level Hidden Markov Model was recently studied by Suaudeau [Suaudeau
94]. In this paper, we present an adapted version of this model in which the
segmental processing is replaced by the classical centisecond processing.
This new model is called the Two Level Hidden Semi Markov Centisecond Model (TLHSMCM).
Our approach retains the traditional hierarchical structure of an HMM and
facilitates the introduction of other prosodic parameters (in particular
energy) at the phonetic level. Experiments on a French database composed of
20 numbers show that this model reduces the recognition error rate.
titre: Compalex : un outil d'analyse dialectométrique
pour une comparaison phono-lexicale synchronique des parlers d'une zone géographique
auteurs: Ndamba Josué
mots clés: apprentissage Langue
abstract: This paper presents the software "Compalex", which processes
lexical data from two or more languages (or dialects) of a geographical area
in order to determine the degree of intelligibility that exists between them.
Existing software calculates the percentage of common roots between
languages, so its results mainly reflect the historical relations between the
dialects or languages. Compalex processes both the percentage of common roots between
languages and the sounds that these common roots share. Its results therefore give
a more reliable indication of how well speakers of these different languages
understand one another. The software runs under Windows 95 or later.
titre: Modélisation d'un système de reconnaissance
pour l'apprentissage automatique de stratégies de dialogue optimales
auteurs: Olivier Pietquin, Thierry Dutoit
mots clés: reco, reconnaissance de parole, Dialogue
abstract: Over the last decade, the field of spoken dialogue systems has
developed quickly. However, the rapid design of dialogue strategies remains difficult.
Automatic strategy learning has been investigated and the use of Reinforcement
Learning algorithms introduced by Levin and Pieraccini is now part of the
state of the art in this area. Obviously, the learned strategy's worth depends
on the definition of the optimization criterion used by the learning agent
and on the exactness of the environment model. In this paper, we propose to
introduce a model of an ASR system in the simulated environment in order to
enhance the learned strategy. To do so, we introduced the recognition error rates
and confidence levels produced by ASR systems into the optimization criterion.
titre: La syllabe comme unité de perception
de la parole : un état de la question
auteurs: Alain Content, Uli H. Frauenfelder
mots clés: phonétique-phonologie
abstract: One highly influential finding that suggests that syllabic
units are instrumental in speech perception is the crossover interaction between
target type and word type observed in the sequence detection task. In this
paper we review our recent studies with French speakers using the same task.
Overall the findings fail to replicate the "syllable effect" and indicate
that the observed effects are primarily due to the time course of the arrival
of phonetic information in the carrier stimuli. These data argue against an
early syllabic classification mechanism in speech perception, but other results
that we have obtained suggest an important role of syllable structure and
more specifically onsets in speech segmentation.
titre: L'effet syllabique dans les mots et
les pseudo-mots en français
auteurs: Grégory Leclercq, Alain Content, Uli H. Frauenfelder
mots clés: phonétique-phonologie
abstract: Two syllable detection experiments were conducted to compare
word and pseudoword carriers. The syllabic effect was found neither with words
nor with pseudowords. Regression analyses were run to examine the influence
of phonetic throughput on detection times. A contribution of the syllabic
structure of the carriers was only found for the CV targets whereas contributions
of the temporal localisations of the first vowel and of the pivotal consonant
were found for the CVC targets. The results support the view that for both
words and pseudowords, the pattern of results stems from the combination of
two distinct effects, and does not reflect the use of a perceptual syllabic
code.
titre: Etude comparative de vocalisations
de bébés humains et de bébés robots
auteurs: J. Serkhane, J.L. Schwartz, L.J. Boë, B. Davis, P. Bessière,
E. Mazer
mots clés: phonétique-phonologie, acoustique
abstract: In order to assess infants' motor skills during speech development,
we used a statistical model of the vocal tract that integrates growth of the
effector system. This model allowed us to infer, from real vocalizations,
the likeliest explored acoustic regions, articulatory degrees of freedom
and vocal tract shapes, and to test MacNeilage and Davis's co-occurrence hypothesis.
Our results will feed into the building of a virtual robot that models speech development.
titre: Gabarits des tons vietnamiens
auteurs: Pham Thi Ngoc Yen, Eric Castelli, Nguyen Quoc Cuong
mots clés: phonétique-phonologie, prosodie
abstract: A 135-word corpus uttered by 16 different speakers was built
in order to study the shape of the 6 Vietnamese tones. The wavelet method is used
to extract the pitch (F0) from the speech signal corpus. General shapes are extracted
for each speaker, which will be useful for automatic recognition or for synthesis.
Comparisons between men and women show no important
difference between them. However, Northern speakers have to be separated from
Centre/South speakers.
titre: Interface syntaxe-prosodie dans un
système de synthèse de la parole à partir du texte en arabe
auteurs: S. Baloul, M. Alissali, M. Baudry, P. Boula de Mareüil
mots clés: prosodie, synthèse
abstract: This paper presents a syntactico-prosodic model and its implementation
in a diphone Arabic text-to-speech (TTS) system. This model, based on rewrite
rules, first calculates the syntactic markers of the input text. Then, a phrasing
operation segments it into chunks. The syntax-prosody interface then enables
the allocation of pauses and the generation of prosodic parameters: the melodic
contour depends on the sentence modality, on the word position within chunks
and on the chunk position within the sentence. The implemented modules are
currently being evaluated within a global evaluation of a multilingual TTS
system.
titre: Tu pourrais enregistrer un corpus
pour moi ?
auteurs: Alexis Michaud
mots clés: Enregistrement de corpus
abstract: The time-consuming task of archiving and disseminating data
is not a priority with most phoneticians. As a result, finding a suitable
ready-made corpus is no easy task; researchers often rely on corpora of questionable
value. Looking back at a century of speech recording, the legacy is not as
extensive, and nowhere near as tidy, as the layman would think. This paper calls
for a "Corpus quality standard". The argument (based on detailed examples)
is that small-scale programs adhering to simple standards can actually go
a long way towards building the databases we need. A quality standard would make data publication
easier (thus fostering research) and allow for a smoother transition into
the shelves of libraries, fulfilling the phonetician's key role in documenting
the languages of the world. ...
titre: Le e d'appui parisien : statut actuel
et progression
auteurs: CANDEA, Maria
mots clés: phonétique-phonologie
abstract: This paper examines the hypothesis that an oral phenomenon typical of the spontaneous French
spoken in the Greater Paris area has progressed during
the last decade: the epithetic "e" (e.g. bonjour-e, inserted
in final position, with falling intonation). Our study is based on a comparison
between the characteristics of a recently acquired French corpus and the results
of two previous studies. It aims to describe the evolution of this phenomenon
in real time (1989 vs. 1997/8) as well as in apparent time (adults 1997/8
vs. teenagers 1997). We show that the indicators studied here are clearly
progressing in real and apparent time, which allows us to hypothesise that the
phenomenon is still expanding.
titre: Extraction de caractéristiques
par codage neuro-prédictif
auteurs: M. Chetouani, B. Gas, J.L. Zarader, C. Chavy
mots clés: analyse, extractions de caractéristiques, codage neuro-prédictif
abstract: In this paper, we present a predictive neural network called
Neural Predictive Coding (NPC). This model is used for non-linear discriminant
feature extraction (DFE) applied to phoneme recognition. We also present
an extension of the NPC model: NPC-3. In order to evaluate the performance
of the NPC-3 model, we carried out a study of the recognition of Darpa-Timit phonemes (in particular
the /b/, /d/, /g/ and /p/, /t/, /q/ phonemes). Comparisons with traditional
coding methods (LPC, MFCC and PLP) are presented: they reveal
an improvement in classification.
titre: Stratégies perceptives en identification
des langues
auteurs: Ioana VASILESCU
mots clés: acquisitionLangue, Identification des langues
abstract: This paper deals with perceptual strategies in language identification.
The study of the strategies employed by humans to identify foreign languages is
currently considered a comparative approach for evaluating automatic performance.
We present a survey of the domain and suggest a methodology aiming to control
the factors responsible for the identification scores, i.e. experimental design,
corpus and listeners' linguistic background. Two experimental designs are
used (language discrimination vs. evaluation of the similarity) to determine
the strategies developed by 4 populations to identify Romance languages (French,
Spanish, Italian, Portuguese, Romanian). A case study highlights the main
identification strategies (vocalic complexity vs. previous exposure to the
languages).
titre: Traitement des mots mal reconnus en
compréhension de la parole
auteurs: Caroline Bousquet-Vernhettes
mots clés: reco, applications, compréhension de la parole - robustesse
abstract: The aim of this paper is to propose an extension of the stochastic
conceptual modeling to increase the robustness of the understanding process
faced with misrecognitions and unknown words. Corpus analysis shows that some
misrecognised words are more difficult to interpret than others, so we defined
a word ambiguity rate. We performed a series of trials on a train schedule inquiry
application to evaluate the understanding rate when faced with misrecognised
words and in particular, when these words are city names.
titre: Evaluation psycholinguistique de l'effet
du vieillissement sur la production des noms propres
auteurs: EVRARD Muriel
mots clés: production, psycholinguistique, accès lexical, noms propres,
noms communs, vieillissement, tâche de fluence verbale
abstract: The impact of age on proper names and common nouns production
ability was investigated using a task of verbal fluency in 87 healthy adults
from four age groups (“young”, “middle-aged”, “fairly-old”, “very-old”). Participants
had to generate in one minute as many words as possible belonging to each
of three semantic categories: celebrities (generation of names of people),
countries (names of places) and fruits (common nouns). Word access ability,
as measured by number of successful retrievals, declined with age more for
names of people than for other words. This result supports a disproportionate
difficulty with age in retrieving the names of people and is interpreted
with reference to the cognitive model of Burke et al.
titre: Identification des consonnes du français
en syllabe isolée après laryngectomie partielle supracricoïdienne
auteurs: Lise Crevier-Buchman, Stéphane Hans, Jacqueline Vaissière,
Shinji Maeda, Daniel Brasnu
mots clés: perception, pathologies, consonnes, matrices de confusions,
laryngectomie partielle, voix de substitution
abstract: This study aimed to determine what patterns of perceptual
confusions characterise the voice of patients after supracricoid partial laryngectomy
(SCPL), by means of identification tests of French consonants. After SCPL, voice
is produced by a neoglottis located at approximately 3 cm above the removed
vocal folds, thus shortening the vocal-tract length. We first evaluated the
voicing distinction, as their vibrator is profoundly modified, and second
manner and place of articulation features as their vocal tract is shortened
by about 3 cm. Ten male patients were recorded 18 months after SCPL producing
16 French consonants in a syllabic context (CV). Consonant articulation appears
to impose certain constraints on voicing ability of SCPL patients, since voiced
consonants are predominantly perceived as voiceless consonants.
titre: Fusion de Paramètres Rythmiques et
Segmentaux Pour l'Identification Automatique des Langues
auteurs: Jean-Luc ROUAS, Jérôme FARINAS, François PELLEGRINO, Régine
ANDRE-OBRECHT
mots clés: reconnaissance de langue, prosodie, identification de la
langue
abstract: This paper deals with an approach to Automatic Language Identification
based on rhythmic modeling and vowel system modeling. Experiments are performed
on read speech for 5 European languages. They show that rhythm and stress
may be automatically extracted and are relevant in language identification:
using cross-validation, 78% correct identification is reached with 21-second
utterances. The Vowel System Modeling, tested in the same conditions (cross-validation),
is efficient and yields 70% correct identification for the 21-second
utterances. Finally, merging the output scores from the two models improves the
results: with test excerpts of only 11 seconds, the correct identification rate
is over 80%.
titre: Tentative de formalisation algorithmique
de la démarche du phonologue Un outil d'aide à la formulation d'hypothèses
phonologiques
auteurs: Michel Jacobson
mots clés: phonétique-phonologie
abstract: We present a formal computerized model of a particular linguistic
theory, functional phonology, a theory which is often criticized precisely
for its lack of formalization. This theory proposes on the one hand a general
framework for the expression of phonological phenomena and on the other a
model for a discovery procedure for phonological units. In formalizing this
theory explicitly, we have arrived at (1) a formalism for the expression of
data and hypotheses and (2) a computer program emulating the functionalist
methods of phonological analysis. In the paper, we present the principal data
structures used and the procedures which we have designed to process them.
Methodological obstacles which we have faced in implementing the model are
discussed.
titre: Identification des locuteurs par regroupement
hiérarchique ascendant et modèles d'ancrage
auteurs: Yassine Mami, Delphine Charlet
mots clés: Reconnaissance du locuteur
abstract: The process of speaker recognition is generally based on
modeling the characteristics of each speaker. An interesting method for modeling
consists in representing a new speaker, not in an absolute manner, but relative
to a set of well-trained speakers. Each speaker is represented by their location
in an optimal space of eigen or virtual voices. We hope that the relative
position of a speaker in this space of virtual speakers is invariant whatever
the conditions of sound recording and the content of sentences are. This paper
describes a representation space built by clustering speakers and how we can
locate a speaker by using anchor models. The paper also presents experimental
results and compares them with GMM. We show that clustering gives an optimal space.
We also show that our system performs better when only a small amount of training data
is available.
titre: Contrôle de l'anticipation vocalique
d'arrondissement en Langage Parlé Complété
auteurs: Virginie Attina, Marie-Agnès Cathiard, Denis Beautemps
mots clés: production, modèlesLangage, phonétique-phonologie
abstract: "Langage Parlé Complété (LPC)" is the French manual system
- corresponding to Cued Speech - used to complement lip reading and thus to
enhance speech perception for hearing-impaired people. In an anticipatory
rounding context, a French speaker was audiovisually recorded pronouncing
and coding [i#yi] sequences with two different pause durations. The relative
timing of the hand and lip movements and of the corresponding acoustic signal
was quantified. The results showed that: (i) the manual cue follows the temporal
organization of visible speech; (ii) the manual target position is always
ahead of the corresponding lip target.
titre: Organisation spatio-temporelle main
- lèvres - son de séquences CV en langage parlé complété
auteurs: V. Attina, D. Beautemps, M.-A. Cathiard
mots clés: production, modèlesLangage, phonétique-phonologie
abstract: This study was designed to investigate the coordination
in space and time between the manual and oro-facial gestures involved in “Langage
Parlé Complété”, an efficient method of communication for hearing-impaired
people. Cued CV syllabic sequences were analysed. Results showed (i) five
distinct positions for vowels and (ii) manual anticipation with respect to
lip movements and sound, manual information being delivered at the beginning
of a CV syllable.
titre: Sexe, mensonges et F0
auteurs: Estelle Campione, Jean Véronis
mots clés: phonétique-phonologie
abstract: Many contradictory results have been published on male and
female voice characteristics, and the debate has sometimes been tinged with sexist
stereotypes. In a detailed study, Tielen [Tie92] seemed to partly conclude
the debate by showing that there is no difference in F0 range between the sexes.
We show in this paper that her conclusion was misled by the measure she and
many other researchers use (the 90% range), which erases precisely the differences
to be observed. We show, on a large multilingual corpus involving 60 different
speakers, that there are indeed strong differences in the shape of the F0 distribution
between the sexes. Female voices show high values of skewness and kurtosis, characteristic
of long tails in the distribution, whereas no such tendency can be observed
for men.
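The contrast drawn in this abstract between a range measure and distribution-shape statistics can be illustrated with a minimal sketch. It assumes that the range measure is the span between the 5th and 95th percentiles and that skewness and kurtosis are the usual moment-based estimates (excess kurtosis); the function names and F0 values are hypothetical and are not taken from the paper.

```python
def percentile(sorted_vals, p):
    """Percentile p (0-100) of an already sorted list, by linear interpolation."""
    k = (len(sorted_vals) - 1) * p / 100.0
    lo = int(k)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (k - lo)


def f0_shape(f0_hz):
    """Return a range measure plus moment-based skewness and excess kurtosis.

    Assumption: the range is taken between the 5th and 95th percentiles.
    """
    vals = sorted(f0_hz)
    n = len(vals)
    mean = sum(vals) / n
    m2 = sum((v - mean) ** 2 for v in vals) / n  # variance
    if m2 == 0:
        raise ValueError("F0 values are constant; shape statistics are undefined")
    m3 = sum((v - mean) ** 3 for v in vals) / n  # third central moment
    m4 = sum((v - mean) ** 4 for v in vals) / n  # fourth central moment
    return {
        "range90": percentile(vals, 95) - percentile(vals, 5),
        "skewness": m3 / m2 ** 1.5,
        "kurtosis": m4 / m2 ** 2 - 3.0,
    }


# Hypothetical F0 samples (Hz): one symmetric, one with a long upper tail.
symmetric = f0_shape([100, 110, 120, 130, 140])
long_tail = f0_shape([100] * 9 + [200])
```

Because the range discards everything outside the central band, a long upper tail mostly shows up in the skewness and kurtosis values rather than in the range.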
titre: Évolution des structures de l'oral
en formation de formateurs de FLE
auteurs: Véronique Delplancq, Bernard Harmegnies
mots clés: phonétique-phonologie, applications, apprentissageLangue,
psycholinguistique, production
abstract: The paper is focused on the evolution of the second language
mastery in Portuguese students enrolled in a 4-year course for future teachers
of French. During the whole learning period, they have been regularly recorded,
and acoustical analyses have been performed on their utterances of the French
/i/, /y/ and /u/ vowels. The language acquisition profiles are related to
the students' involvement in actual communicative activities in the target
language prior to their enrolment.
titre: Contraintes de contrôle articulatoire
intrasyllabique dans la mémoire de travail verbale
auteurs: Sato M., Schwartz J.L., Cathiard M.A., Abry C., Loevenbruck
H.
mots clés: psycholinguistique, production, perception, phonétique-phonologie,
Mémoire de travail - boucle phonologique
abstract: The verbal transformation effect - an auditory imagery task
equivalent to Necker's cube in visual imagery - recruits a specific working
memory, the so-called articulatory or phonological loop. Is this mechanism
sensitive to articulatory control constraints, i.e. phase relationships between
vowel and consonant gestures? In our experiment, 56 French students repeatedly
pronounced aloud nonsense syllables - all combinations of [e ] with [p] and
[s] - and were asked to stop as soon as they heard a possible syllable transformation.
In agreement with our in-phase predictions, the winner is syllable [pse ],
where all gestures can be launched in synchrony. This experiment demonstrates
that verbal working memory - a primary candidate as input memory for word
learning - is sensitive to articulatory control of syllable phasing.
titre: Propriétés acoustiques et articulatoires
des voyelles nasales du français
auteurs: Véronique DELVAUX, Thierry METENS, Alain SOQUET
mots clés: phonétique-phonologie, production, acoustique
abstract: This paper presents data about the articulatory and acoustic
properties of French nasal vowels. Data show that many covarying articulations
support the phonological contrast between nasal and oral vowels, in addition
to the lowering of the velum. The majority of the articulatory adjustments
occurring in the oral cavity lead to a lowering of F2. We relate the F2 lowering
to the effects of nasal coupling, i.e. the changes in spectral balance due
to the loss of energy at higher frequencies.
titre: Relations entre la perception catégorielle
de la parole et l'apprentissage de la lecture
auteurs: Bogliotti Caroline, Messaoud-Galusi Souhila, Serniclaes Willy
mots clés: perception, acquisitionLangue, psycholinguistique, pathologies
abstract: This study aimed at evaluating age and reading level effects
on the emergence and consistency of categorical perception (CP). 5- and 10-year-old
children were tested on their identification and discrimination functions
on a /do/-/to/ continuum. 5-year-old children had more difficulty categorizing
phonemes than 10-year-olds. In addition, the 10-year-old poor readers were
less categorical than good readers of the same age. This CP deficit is characterized
by a weaker discrimination of stimuli belonging to separate categories, and
an increased discrimination of acoustic variants of the same phoneme. We tentatively
suggest that this deficit stems from a weaker deactivation of perceptual
predispositions irrelevant for discriminating words in their native language.
titre: Origine du déficit de perception catégorielle
des dyslexiques
auteurs: Souhila MESSAOUD-GALUSI, René CARRE, Caroline BOGLIOTTI,
Willy SERNICLAES
mots clés: perception, acquisitionLangue, psycholinguistique, pathologies
abstract: The goal of the present experiment was to determine whether the
perceptual deficit of dyslexic children is speech-specific or general to audition.
We tested categorical perception (discrimination task) using sine-wave analogues
of a continuum ranging from [ba] to [da]. First, these stimuli were presented as
noises, and then they were described as the corresponding syllables. As
a control of speech perception, we also tested the categorical perception
of a more natural-sounding [ba]-[da] continuum. We administered these tests to
10-year-old dyslexics and to normal readers of the same age, and also to normal-reading
adults as an age control. We found that in the unnatural hearing condition
categorical perception of speech is less consistent in 10-year-old normal
readers than in adults. Moreover, the dyslexics show a less categorical pattern
of perception than controls, in the speech condition only.
titre: Sur l'évaluation du second formant
F'2 par une technique d'estimation spectrale basée sur une modélisation du
filtrage auditif
auteurs: Kaïs Ouni, Noureddine Ellouze
mots clés: analyse, perception, gammachirp, F'2, estimation spectrale
abstract: In this paper, we propose a spectral estimation technique
based on a gammachirp filterbank which is designed to provide a spectrum reflecting
the spectral properties of the cochlea. The characteristic shift of the spectral
peak of the gammachirp is then used to estimate the perceptual formant F'2 of
18 cardinal vowels used by Bladon and Fant. We then compare the standard deviation
of these results with those obtained by three traditional techniques: the
first suggested by Bladon and Fant, the second by Paliwal et al.,
and the third by Hermansky. The results show that the gammachirp spectral
estimation gives a better estimate of F'2 than the second and third techniques,
and is only slightly less accurate than the first.
titre: Synthèse vocale par sélection d'unité
: une méthode pour la redéfinition de la courbe intonative
auteurs: Baris Bozkurt, Thierry Dutoit, Vincent Pagel
mots clés: synthèse, prosodie
abstract: In this work, we propose a new algorithm for defining intonation
curves from selected units in a non-uniform units-based text-to-speech synthesis
system. Since the main trend in a non-uniform units-based system is to select
the best and modify the least to achieve highly natural synthetic speech,
the target intonation imposed on units is of great importance. We propose
a 'shift-only' algorithm to re-define target intonation from selected units,
which does not modify the general prosodic characteristics (micro-prosody,
melodic movements) of units, while efficiently reducing F0 discontinuities
at concatenation points. For the operation, a cost function is defined as
a summation of discontinuities and shifts scaled by durations of the units.
Minimizing this function with respect to the shift variable, we jointly address the minimum-shift
and minimum-discontinuity constraints.
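To make the idea of a shift-only optimisation concrete, here is a minimal sketch under stated assumptions. The abstract does not give the exact cost, so the sketch assumes a quadratic form: the squared F0 jump at each concatenation point plus each unit's squared shift weighted by its duration. The unit representation `(start_f0, end_f0, duration)`, the function names and the coordinate-wise solver are all hypothetical, not the authors' algorithm.

```python
def junction_cost(units, shifts):
    """Assumed cost: squared F0 jumps at joins + duration-weighted squared shifts.

    units  : list of (start_f0, end_f0, duration) tuples, durations > 0
    shifts : one constant F0 offset per unit (the shape of each unit is untouched)
    """
    total = 0.0
    for i, (start, end, dur) in enumerate(units):
        total += dur * shifts[i] ** 2
        if i + 1 < len(units):
            jump = (units[i + 1][0] + shifts[i + 1]) - (end + shifts[i])
            total += jump ** 2
    return total


def optimise_shifts(units, sweeps=200):
    """Minimise the assumed quadratic cost by repeated coordinate-wise updates.

    Each update sets one shift to the value that zeroes the cost derivative
    with the neighbouring shifts held fixed (a Gauss-Seidel sweep).
    """
    n = len(units)
    shifts = [0.0] * n
    for _ in range(sweeps):
        for i in range(n):
            start, end, dur = units[i]
            numerator, neighbours = 0.0, 0
            if i > 0:  # join with the previous unit
                numerator += shifts[i - 1] + units[i - 1][1] - start
                neighbours += 1
            if i + 1 < n:  # join with the next unit
                numerator += shifts[i + 1] + units[i + 1][0] - end
                neighbours += 1
            shifts[i] = numerator / (neighbours + dur)
    return shifts


# Hypothetical units: F0 in Hz at unit start and end, duration in seconds.
example_units = [(100.0, 110.0, 0.10), (130.0, 125.0, 0.08), (90.0, 95.0, 0.12)]
example_shifts = optimise_shifts(example_units)
```

Since every unit receives a single constant offset, micro-prosody and melodic movements inside a unit are preserved by construction; only the level of each unit changes.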
titre: Analyse syntaxique du français. Pondération
par trigrammes lissés et classes d'ambiguïté lexicales
auteurs: R. Beaufort, T. Dutoit, V. Pagel
mots clés:
abstract: In a Text-to-Speech framework, we have implemented an n-gram-based
Part-of-Speech tagger, currently evaluated on French. Usually, such systems
reduce the probability of a sentence to that of its syntactic tags, without
taking the words into account, the probability of which is hard to correctly
estimate from the data. Our system reintroduces the words in the probability,
replacing each word by the ambiguity class it belongs to. We have tested different
kinds of smoothing by interpolation, and the influence of the classes on the
results of our Part-of-Speech tagger.
titre: Implémentation d'un système de tatouage
pour la transmission de données
auteurs: Alejandro LoboGuerrero, Joël Liénard, Patrick Bas
mots clés: acoustique, transmission de données
abstract: Audio watermarking is a method that allows the insertion
of an imperceptible mark into an audio data set. Although watermarking is
often used to protect copyright, it can also be used to increase the information
transmitted in a communication context. In this paper, this idea is derived
from a classical data transmission technique. The model is then modified
by controlling the transmitted power and by adapting the spectral coefficients
of the embedded codes according to the voice signal. This watermarking technique
yields a system that is robust to several kinds of processing, especially
MP3 compression.
titre: Nouveau système hybride GMM-SVM pour
la vérification du locuteur
auteurs: Jamal Kharroubi, Gérard Chollet
mots clés: Vérification Automatique du locuteur, machines à support
de vecteurs (SVM), Modèles de Mélange de Gaussiennes (GMM)
abstract: Support Vector Machines (SVM) are a new and very promising
technique in statistical learning theory, proposed by V. Vapnik in 1995. In
this article we address the use of the SVM technique for text-independent
speaker verification by proposing a new feature representation
based on GMM to construct the input vector of the SVM. The results obtained
are compared to the classical Log-Likelihood Ratio (LLR) technique on the NIST2001
database, a part of the SWITCHBOARD database.
titre: Séparation en locuteurs de conversations
via IP
auteurs: Daniel Moraru, Laurent Besacier
mots clés: reconnaissance du locuteur, recoloc, langue
abstract: In this paper we are interested in speaker segregation, that is,
recognizing who speaks, and when, in an audio document containing
speech from several people. First, the theoretical ideas concerning
our subject are presented. The signals to be segregated by speaker
contain two-speaker conversations over IP. No statistical speaker or speech
model is available a priori. The algorithm used is based on the Bayesian Information
Criterion (BIC). This paper mainly contributes to evaluation procedures
in the new field of speaker segregation, especially when the speakers
speak at the same time. For performance comparison, the VoIP database on which
the experiments are run is made available by the authors.
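As a rough illustration of how a BIC-based test can flag a speaker change, the sketch below compares one Gaussian fitted to a whole window against two Gaussians fitted on either side of a candidate boundary. It is a generic one-dimensional version of the usual delta-BIC test, not the authors' system: real systems work on multidimensional acoustic features, and the names `delta_bic` and `penalty_weight` are illustrative.

```python
import math


def variance(xs):
    """Maximum-likelihood variance, floored to avoid log(0) on constant data."""
    mean = sum(xs) / len(xs)
    return max(sum((x - mean) ** 2 for x in xs) / len(xs), 1e-12)


def delta_bic(xs, split, penalty_weight=1.0):
    """Delta-BIC for a candidate change point at index `split` (1-D features).

    A positive value favours two separate Gaussians (a change),
    a negative value favours a single Gaussian (no change).
    """
    left, right = xs[:split], xs[split:]
    n = len(xs)
    gain = 0.5 * (
        n * math.log(variance(xs))
        - len(left) * math.log(variance(left))
        - len(right) * math.log(variance(right))
    )
    # The two-model hypothesis has 2 extra free parameters (a mean and a
    # variance), hence a penalty of 0.5 * 2 * log(n).
    penalty = penalty_weight * math.log(n)
    return gain - penalty


# Hypothetical 1-D feature streams.
with_change = [0.0, 1.0] * 10 + [10.0, 11.0] * 10
without_change = [0.0, 1.0] * 20
```

In this toy setting, `delta_bic(with_change, 20)` is positive and `delta_bic(without_change, 20)` is negative.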
titre: Les variations rythmiques dans les
dialectes arabes
auteurs: Rym HAMDI
mots clés: production, prosodie, analyse, acoustique, rythme
abstract: Speech rhythm in the different Arabic dialects investigated
has been consistently described as stress-timed. At the same time, there is
preliminary evidence from perceptual experiments that listeners use speech
rhythm cues to distinguish speakers from North Africa from those of the Middle
East. In an attempt to elucidate the apparent contradiction, an acoustic investigation
of the proportion of vocalic intervals and the standard deviation of consonantal
intervals in six dialects (Morocco, Algeria, Tunisia, Egypt, Syria and Jordan)
was carried out using procedures put forth by Ramus (1999). The results show
that complex syllables and reduced vowels in the Western dialects, and longer
vowels in the Eastern dialects, seem to be the main factors responsible for
differences in rhythmic structure. The paper also raises questions about
the discrete or continuous nature of rhythm types.
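The two measures named in this abstract, the proportion of vocalic intervals (%V) and the standard deviation of consonantal intervals (ΔC), can be sketched as follows. This is a minimal illustration, assuming the utterance has already been segmented into labelled intervals; the function name `rhythm_metrics`, the `(kind, duration)` input format and the choice of population standard deviation are assumptions, not details given in the abstract.

```python
import statistics

def rhythm_metrics(intervals):
    """intervals: list of (kind, duration_s), kind being 'V' (vocalic) or 'C' (consonantal).

    Returns (%V, deltaC): the share of total duration that is vocalic, in percent,
    and the standard deviation of consonantal interval durations, in seconds.
    """
    vocalic = [d for kind, d in intervals if kind == "V"]
    consonantal = [d for kind, d in intervals if kind == "C"]
    percent_v = 100.0 * sum(vocalic) / (sum(vocalic) + sum(consonantal))
    delta_c = statistics.pstdev(consonantal)  # population SD (assumed convention)
    return percent_v, delta_c
```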
titre: Ralentisseur du signal de parole par
autocorrélation
auteurs: Philippe Martin
mots clés: analyse, acoustique, synthèse, Traitement du signal, analyse/synthèse
abstract: Speech rate changes, whether slowing down or speeding up, have
long had important applications in language teaching, linguistic corpus
transcription, office dictation, etc. Very good quality modifications are
obtained by the phase vocoder, but at the expense of a somewhat high computing
cost. Temporal methods such as PSOLA are more efficient, but highly dependent
on a good pitch tracking algorithm. A new approach is presented here, similar
to PSOLA in that it works directly on the waveform, but relying on autocorrelation
to align consecutive speech segments in the overlap-add process. It
is therefore simpler to implement and more reliable, as it bypasses the period
marking process used in PSOLA.
titre: Dissociation de la protrusion et de
l'arrondissement dans la production des consonnes labialisées de l'anglais
auteurs: TODA, Martine, MAEDA, Shinji, CARLEN, Andreas J., MEFTAHI,
Lyes
mots clés: production, acoustique, applications, arrondissement, protrusion,
consonnes, anglais
abstract: No formal distinction is usually made between lip rounding
and protrusion in the articulatory description of English phonemes. Our study
shows that, in spite of the small contribution of the lips to phonological contrast,
there are two lip rounding/protrusion patterns. These findings can be related
to the acoustic mechanisms of the labialised consonants. The labial approximant
/w/ has a low F2 (Helmholtz resonance) that requires both strong rounding
and protrusion, as found in our data, while palatoalveolar fricatives
(quarter-wavelength resonance) show a relatively wide lip aperture but a
substantial protrusion that could help lower their global spectrum
and so accentuate the contrast with other sibilants.
titre: Un Algorithme de Réduction de la Réverbération
de Signaux Issus du Vocoder de Phase
auteurs: Joseph Di Martino, Yves Laprie
mots clés: analyse, synthèse
abstract: Time-scale modifications of speech signals based on frequency-domain
techniques are hampered by an important artifact called phasiness. This artifact
corresponds to the destruction of the shape of the original signal, i.e. the
de-synchronisation between the phases of frequency components. This paper
describes an algorithm that preserves the shape invariance of speech signals
in the context of the phase vocoder. At ICASSP'2001 we presented a first version
of this work where phases were corrected at the onsets of the voiced portions
of the speech signals. In this study, we extend the previous work by allowing
the algorithm to synchronize and correct the phases at regular intervals of
the voiced segments of speech signals. Thanks to our algorithm, modified signals,
even for large expansion factors, are of high quality and almost free of
phasiness. A demonstration is available at the web page: www.loria.fr/~jdm/PhaseVocoder/index.html
where several audio files can be downloaded.
titre: Le statut du schwa en berbère
chleuh
auteurs: Rachid Ridouane
mots clés: phonétique-phonologie
abstract: This article deals with Chleuh Berber spoken in the southern
part of Morocco. In this dialect, words may consist entirely of consonants
without vowels and sometimes of only voiceless obstruents. In this study we
have carried out acoustic and fiberscopic analyses to answer the following
question: is schwa a segment at the level of phonetic representations in
Chleuh? Fiberscopic films were made of one male native speaker producing
a list of forms consisting entirely of voiceless obstruents. The same list
was produced by 7 male native speakers of Chleuh for the needs of the acoustic
analysis. This study shows the absence of schwa vowels in forms consisting
of voiceless obstruents.
titre: Amélioration de la précision de la
resynthèse avec TD-PSOLA
auteurs: Vincent Colotte, Yves Laprie
mots clés: analyse, synthèse, TD-PSOLA, fondamental, traitement du
signal
abstract: The paper describes techniques to improve the precision of
prosodic modifications with TD-PSOLA. TD-PSOLA relies on the decomposition
of the signal into overlapping frames synchronised with pitch period. The
main objective is thus to preserve the consistency of marks between neighbouring
frames with respect to the temporal structure of pitch periods. First, we
improve pitch marking by eliminating mismatch errors which appear during rapid
formant transitions. This is achieved by pruning pitch mark candidates. From
the synthesis point of view we exploit a fast re-sampling method which allows
signal frames to be shifted finely. Together with the pitch marking improvement,
this fast re-sampling method enables very high quality transformations characterised
by the absence of noise between harmonics.
titre: Segmentation du bruit d'explosion
des occlusives
auteurs: Yves Laprie, Anne Bonneau
mots clés: phonétique-phonologie, analyse
abstract: This paper investigates burst segmentation for the evaluation
of acoustic cues used to identify unvoiced French stops. Unlike other works
which utilize a fixed length window, our approach consists in segmenting bursts
into transient and frication noise. The transient is found by minimizing the
sum of spectral variances of transient and frication noise over the burst.
The spectral variance criterion has the advantage of being sensitive both
to energy deviations and spectral variations. Additional correction procedures
augment the robustness of the segmentation against the presence of spurious
noises during the closure and the determination of the voicing onset with
delay. The relevance of our segmentation method has been evaluated by comparing
the characteristics of the main spectral peak (energy prominence versus frequency)
in the transient segmented by our method with those of the full burst. Our
experiments showed that bursts segmented by our method allow a better discrimination
between the three places of articulation.
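The boundary criterion described in this abstract, choosing the split that minimises the summed spectral variance of the two parts of the burst, can be sketched as below. This is an illustrative simplification, not the authors' procedure: it assumes the burst is given as a list of spectral frames (lists of floats), defines a segment's variance as the per-bin variance across frames summed over bins, and omits the correction procedures. The names `segment_variance` and `best_boundary` are hypothetical.

```python
def segment_variance(frames):
    """Sum over spectral bins of the variance of that bin across the frames."""
    n = len(frames)
    total = 0.0
    for j in range(len(frames[0])):
        mean = sum(frame[j] for frame in frames) / n
        total += sum((frame[j] - mean) ** 2 for frame in frames) / n
    return total

def best_boundary(frames):
    """Index b (1 <= b < len(frames)) minimising
    segment_variance(frames[:b]) + segment_variance(frames[b:])."""
    return min(
        range(1, len(frames)),
        key=lambda b: segment_variance(frames[:b]) + segment_variance(frames[b:]),
    )
```

Frames before the returned index would be treated as the transient and the remaining frames as frication noise.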
titre: Principes et performances du décodeur
parole continue Speeral
auteurs: Pascal Nocera, Georges Linares, Dominique Massonié
mots clés: reconnaissance de parole, algorithme A*
abstract: This paper presents the continuous speech recognition system
Speeral, developed at the LIA. Speeral uses a modified A* algorithm to find
the best path in the search graph, taking into account acoustic and linguistic
constraints. Rather than proceeding word by word, the A* used in Speeral is based on
a previously generated phoneme lattice. To avoid backtracking problems,
the system keeps for each frame the deepest nodes of the (partially
explored) lexical tree starting at this frame. If a new hypothesis to explore ends
with a word and the lexicon starting where this word finishes has already been
developed, then the next hypothesis will "jump" directly to the deepest
nodes.
titre: Des formes phonétiques aux proto-formes
de la langue originelle Analyse méthodologique et évaluation des limites
auteurs: Laurent Métoz, Nathalie Vallée, Isabelle Rousset, Louis-Jean
Boë, Pierre Bessière
mots clés: phonétique-phonologie, langues
abstract: Our study builds on the work led
by Merritt Ruhlen on the classification of the world's languages (classification presented
in Ruh94). Showing that the methodology used by Merritt Ruhlen, together with
the large amount of data used, as in Greenberg's comparative method,
might have acted as a constraining factor is not possible without
a probabilistic estimate. Applying such an estimate to Ruhlen's data allows
us to show that the demonstration given by Ruhlen in this
book is probabilistically invalidated. We show that drawing the
phonetic forms of the lexico-semantic referents at random gives the same results as
his.
titre: Vers une organisation syllabique
des lexiques : tendances, dépendances et cooccurrences segmentales
auteurs: I. Rousset, N. Vallée
mots clés: phonétique-phonologie, langues
abstract: This paper deals with the organisation of the syllable in
natural languages. As a first attempt to shed light on selection and restriction
constraints in syllable structure, we present our results based on 14 languages.
First, we present some implicational laws connected to the frequency of different
syllable types and based on the complexity of onset and coda. We then
examine the relations between segments that appear in the same syllable,
or in the onsets of two consecutive syllables. If we hypothesize that the most
frequent syllable types are the most functional ones, searching for a syllabic
"architecture" based on C-V cooccurrences could reveal the syllable-structure
frames.
titre: Caractérisation statistique de la
nature et des contextes d'apparition des mots Hors-Vocabulaire dans la parole
spontanée
auteurs: Hichem HAMIMED, Géraldine DAMNATI
mots clés: reco, reconnaissance de parole, modèles Langage, évaluation
Corpus, analyse, phonétique-phonologie
abstract: To improve our knowledge of the acoustic and linguistic characteristics
of Out Of Vocabulary (OOV) words, we present in this article the results
of a statistical study on the nature and the contexts of occurrence of OOV
words in spontaneous speech. We examined the phonetic and syllabic structure
of the OOV words and the other phenomena (false starts, badly pronounced words,
truncated words). We also examined the type of utterances containing OOV words,
their occurrence rates and their location in the utterances. We studied
the value of defining different categories for OOV words in the language
model. In what follows, we describe all these analyses and comment on the
observations that we made.
titre: Nasalité en français spontané : Mesures
aérodynamiques et fibroscopiques, études préliminaires
auteurs: Amelot Angélique, Basset Patricia, Crevier-Buchman Lise,
Roubeau Bernard
mots clés: phonétique-phonologie, production
abstract: The purpose of this study is twofold: (1) to set up a methodology
for gathering aerodynamic and fiberscopic data in spontaneous speech; (2)
to present preliminary results: (i) there is a tendency for nasal flow to
start during the phoneme preceding the nasal and a strong propensity to spread
after the phoneme following the nasal; (ii) differences between speakers concerning
velar lowering. We found a few cases of complete denasalization of nasals
and some cases of nasalization of orals.
titre: Ouverture de la glotte, Fo, intensité
et simulations émotionnelles : le cas de la joie, la colère, la surprise,
la tristesse et la neutralité.
auteurs: Cédric Gendrot
mots clés: prosodie, phonétique-phonologie, émotions, électro-glottographie,
physiologie
abstract: Voice open quotients have been measured and then compared
to Fo and intensity in simulated emotional speech. We used electro-glottography
on 3 male actors producing happiness, anger, sadness, surprise and neutrality
from a specific corpus. Stimuli that were correctly identified in a perception
test were selected and analysed. We found that measurements of open quotient
could significantly distinguish between 4 classes: happiness, anger, sadness
and neutrality (F=217.116 and p<0.0001 for the emotion factor). Open quotient
significantly increased with intensity and Fo for anger, happiness and neutrality.
titre: Le projet MTM - Reconnaissance de
la parole et du locuteur sur une plateforme embarquée
auteurs: Loic Lefort, Teva Merlin, Jean-Francois Bonastre, Pascal
Nocera
mots clés: reco, recoloc-langue, applications, reconnaissance de parole,
reconnaissance du locuteur
abstract: This paper presents the integration of speech technologies into
an embedded platform. This work is part of the MTM project, funded by the
European Community, which consists in designing a new Personal Digital Assistant
offering UMTS connectivity and extended multimedia capabilities. Among the
project goals is the ability for the applications to feature speech recognition
and speaker recognition as part of the human interface. Speech and speaker
recognition systems have been developed, capable of functioning in both local
(on the PDA) and remote (client/server) modes. Software interfaces have been
developed to offer access to these technologies for easy integration into
the PDA applications.
titre: La synchronisation des profils temporel
et mélodique en français spontané
auteurs: François Poiré, Henrietta J. Cedergren
mots clés: prosodie, production, phonétique-phonologie, intonation,
durée, domaine
abstract: This study is concerned with the timing of two properties
of the Intonational Phrase (IP) in spontaneous speech: its durational profile
(DP) and its tune. High or low continuation or finality, or high-low continuation
are distinguished in a corpus of 2395 IPs from 16 speakers of Montreal French.
DP is derived from normalized variation of syllable duration provided by a
Z-score analysis. DP is characterized by the passage from negative normalized
values to positive ones in the last two syllables of the IP. Results show
that most of the time the tune doesn’t influence the evolution of DP. Only
two types of IP with high-low continuation show a different behaviour: when
a discourse marker is introduced at the end of the IP and when hesitation
occurs at the same position. In both cases, the DP will be aligned with the
penult of the IP.
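The Z-score normalisation of syllable duration mentioned in this abstract can be illustrated with a minimal sketch. The abstract does not say which reference distribution is used, so the code below assumes the mean and standard deviation are taken either from the durations themselves or from an optional reference sample; the function name `z_scores` and its `reference` parameter are hypothetical.

```python
import statistics

def z_scores(durations, reference=None):
    """Normalise syllable durations (in seconds) as (d - mean) / sd.

    `reference` is an optional sample used to estimate mean and sd
    (for instance all syllables of one speaker); by default the
    durations themselves are used.
    """
    ref = durations if reference is None else reference
    mean = statistics.mean(ref)
    sd = statistics.pstdev(ref)
    if sd == 0:
        raise ValueError("reference durations have zero variance")
    return [(d - mean) / sd for d in durations]
```

Under this convention, a durational profile that moves from negative to positive values at the end of a phrase indicates final syllables longer than the reference mean.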
titre: Evaluation de modèles d'extraction
d'informations visuelles pour la reconnaissance automatique de parole audiovisuelle
auteurs: Philippe Daubias, Paul Deléglise
mots clés: reco, évaluationCorpus, reconnaissance de parole,
parole audiovisuelle
abstract: In this article, we report on the progress of our research
towards lipreading in close to “natural” conditions. More precisely, we describe
first audio-visual speech recognition experiments carried out using visual parameters
extracted from “natural” images. Unlike many other experiments in the AV ASR
field, these visual parameters are obtained without any hand-labelling phase
and are naturally noisy, due to the extraction process. We evaluate our models
through different ways of using them. These strategies include the use of
the shape model combined with the appearance model, and the use of the appearance model
followed by the shape model. For the fusion of audio and visual parameters,
we used a basic DI architecture with a fixed weight and afterwards with an
adaptive weighting scheme based on an energy criterion.
titre: Transformations a priori et a posteriori
pour l'adaptation au locuteur
auteurs: Olivier Bellot, Driss Matrouf, Pascal Nocera
mots clés: acoustique, reconnaissance de la parole, adaptation locuteur
abstract: Speaker-dependent HMM-based recognizers give lower word
error rates than the corresponding speaker-independent recognizers.
The aim of speaker adaptation techniques is to enhance the speaker-independent
acoustic models to bring their recognition accuracy as close as possible to
the one obtained with speaker-dependent models. In this paper, we propose
a method using test and training data for acoustic model adaptation. This
method operates in two steps. The first one performs an a priori adaptation
using the transcribed training data of the closest training speakers to the
test speaker. This adaptation is done with the MAP procedure, allowing reduced
variances in the acoustic models. The second one performs an a posteriori
adaptation using the MLLR procedure on the test data, allowing the
Gaussian means to be mapped to match the test speaker’s acoustic space. This adaptation
strategy was evaluated in a large vocabulary speech recognition task. Our
method leads to a relative gain of 15% with respect to the baseline system
and 10% with respect to the conventional MLLR adaptation.
titre: Etude aérodynamique et acoustique
des occlusives emphatiques et non-emphatiques de l'arabe marocain
auteurs: Chakir ZEROUAL
mots clés: production, phonétique-phonologie
abstract: In this paper we describe the aerodynamic and acoustic properties
of Moroccan Arabic emphatic [T D q] and non-emphatic [t k b d g] stops in
the vocalic context -aCa-. Our acoustic data show that the VOT of [T q] is
much shorter than that of [t k], although there is no significant difference
between the maximum values for intra-oral pressure (IOP) during the occlusion
of [t k T q]. IOP is higher in the voiceless stops than in the voiced ones.
After the release of the occlusion, the airflow is greater and the duration
of decay of IOP much longer in [k t] than in [q T b d D g]. [d] and [D] have
similar maximum values for IOP. Based on these aerodynamic and acoustic data,
we demonstrate that [T q] are voiceless unaspirated stops, whereas [t k] are
voiceless aspirated stops.
titre: L'arabe marocain possède des consonnes
épiglottales et non pharyngales
auteurs: C ZEROUAL, L Crevier-Buchman
mots clés: production, phonétique-phonologie
abstract: We provide evidence arguing that Moroccan Arabic has two
epiglottal, not pharyngeal, consonants. Fiberscopic and X-ray observations,
obtained from one speaker, show that these consonants are produced with a
narrow constriction between the top of the epiglottis and the posterior pharyngeal
wall (an epiglotto-pharyngeal constriction). Our fiberscopic investigations
also show that during epiglottal consonants the base of the epiglottis and
the top of the arytenoids are very close together (an aryepiglottal constriction).
Examination of the X-ray data and acoustic measurements reveals that during
epiglottal consonants there is greater coarticulation of the anterior part
and dorsum of the tongue with adjacent vowels. Based on our articulatory and
acoustic data, however, we cannot deduce whether the primary articulation
of the epiglottal consonants of Moroccan Arabic is epiglotto-pharyngeal or
aryepiglottal.
titre: Introduction de contraintes pour l'inversion
acoustico-articulatoire utilisant une table hypercubique
auteurs: Slim Ouni, Yves Laprie
mots clés: production, analyse, phonétique-phonologie
abstract: Our acoustic to articulatory inversion method exploits an
original codebook representing the articulatory space by hypercubes. The articulatory
space is decomposed into regions where the articulatory-to-acoustic mapping
is linear. Each region is represented by a hypercube. The inversion procedure
retrieves articulatory vectors corresponding to an acoustic entry from the
hypercube codebook. As the dimension of the articulatory space is greater
than the dimension of the acoustic space, the corresponding null space is
sampled by linear programming to retrieve all the possible solutions. A dynamic
procedure is used to recover the best articulatory trajectory according to
a minimum articulatory rate criterion. The addition of constraints allows
the inversion process to be focused on realistic inverse articulatory trajectories.
titre: Modèles multi-flux pour la reconnaissance
audio-visuelle : des chiffres au grand vocabulaire.
auteurs: Guillaume Gravier, Gerasimos Potamianos, Chalapathy Neti
mots clés: reconnaissance audiovisuelle de la parole, audio-visuel,
modèles multi-stream
abstract: We investigate the use of multi-stream HMMs for audiovisual
speech recognition. Multi-stream HMMs allow the modeling of asynchrony between
the audio and visual state sequences at a variety of levels (phone, syllable,
word, etc) and can be seen as a product HMM. In this paper, we use such models
to investigate the impact of allowing a controlled level of asynchrony. Furthermore,
we investigate joint training of the product HMM parameters, compared to composing
the model from separately trained audio- and visual-only HMMs. Experiments
are carried out on a simple digit recognition task as well as on a more complex
dictation task. Results show that in both cases, joint training outperforms
independent training. We also show that asynchrony helps a lot on the digit
recognition task while surprisingly, it does not yield any improvement on
the dictation task.
titre: L'opposition [e]-[E] en syllabes ouvertes
de fin de mot en français parisien : étude acoustique préliminaire
auteurs: Zsuzsanna Fagyal, Samira Hassa, Fallou Ngom
mots clés: phonétique-phonologie, production, sociolinguistique variationniste
abstract: This paper presents preliminary acoustic evidence for the
merger of [e] and [E] in word-final open syllables in minimal pairs recorded
in Labovian-type sociolinguistic interviews from three native speakers of
French living in Paris. Although the [e]-[E] distinction in Île-de-France
is one of the most studied vowel contrasts in French, variations were thought
to affect only inflectional morphemes and function words. This study shows
that the merger is well advanced in native Parisians' vernacular and formal
speech styles. Possible implications for the front vowel inventory of French
and the acoustic correlates of near mergers are discussed.
titre: Spécialisation automatique de modèles
acoustiques
auteurs: LINARES Georges, GUEYE Serigne, LEFORT Loic, MICHELON Philippe,
NOCERA Pascal
mots clés: acoustique, reco, reconnaissance de la parole
abstract: In this paper, we present a method for automatic generation
of acoustic models from simple generic models. This method uses the internal
structure of noncontextual acoustic models in order to build new specialized
states which are intended to model specific patterns of a phoneme. The
proposed technique uses temporal information for state splitting. This method
is compared to a maximum likelihood based approach. Our experiments show that
this last criterion leads to better performance. Nevertheless, unsupervised
model splitting seems to be less efficient than model specialization based
on a priori knowledge.
titre: Un Modèle Prédictif de la Durée Segmentale
pour la Synthèse de la Parole Arabe à Partir du Texte.
auteurs: A. Zaki, A. Rajouani, M. Najim
mots clés: synthèse, acoustique
abstract: This paper deals with a neural-network-based model of segmental
duration for an Arabic TTS system. Given a set of factors influencing phoneme
duration, a Multi-Layer Perceptron (MLP) is used to predict phoneme duration.
Different linguistic features are extracted automatically from the text and
coded for networks with binary and analog input nodes. The correlation coefficient
measured on the generalization test database is 0.882. This coefficient corresponds
to a mean absolute prediction error of 14.3 ms for segmental duration.
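The two evaluation figures quoted in this abstract, a correlation coefficient and a mean absolute error, are computed independently from predicted and observed durations. A minimal sketch follows, assuming Pearson's correlation is the coefficient meant; the function names and the sample values in the test are illustrative, not taken from the paper.

```python
import math

def pearson_r(predicted, observed):
    """Pearson correlation between predicted and observed durations."""
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_o = sum(observed) / n
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(predicted, observed))
    norm_p = math.sqrt(sum((p - mean_p) ** 2 for p in predicted))
    norm_o = math.sqrt(sum((o - mean_o) ** 2 for o in observed))
    return cov / (norm_p * norm_o)

def mean_absolute_error(predicted, observed):
    """Average of |predicted - observed|, in the same unit as the inputs (e.g. ms)."""
    return sum(abs(p - o) for p, o in zip(predicted, observed)) / len(predicted)
```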
titre: Génération automatique de la prosodie
dans le système de synthèse vocale Kali : de la modélisation phonologique
à l'implémentation des paramètres acoustiques
auteurs: Anne LACHERET, Michel MOREL
mots clés: prosodie, acoustique, phonétique-phonologie, synthèse
abstract: Kali, a French-speaking text-to-speech synthesis software
package created for visually handicapped people, is the result of a collaboration
between University and the private sector. The input text goes through a succession
of 5 modules (preprocessing, syntactic analysis, prosodic generation, phonemisation,
acoustico-phonetic processing) and is then pronounced. Its best feature is
intelligibility at rapid delivery. In this paper, prosodic processing is
presented from the phonological representation of intonation to the acoustic
processing.
titre: Détection de la stabilité de timbre
des voyelles : vers une automatisation des tâches
auteurs: Christelle Dodane, Christian Guilleminot
mots clés: phonétique-phonologie
abstract: Differences in rhythmic patterns between French and English generate
differences in the degree of articulatory tension of vowels and, therefore,
differences in the stability of phonetic vowel quality. In order to study interference
between the two languages during the learning of English by French children,
we have developed a method to delimit the period of vowel quality stability,
using the tracking of the first three formant frequencies. This method has
been generalized and automated in order to process a large quantity of data.
However, before complete automation can be reached, the problem
of formant tracking remains to be solved, in order to avoid detection errors. The extension
of this method to the detection of diphthongs and triphthongs is in progress
and shows promising results so far.
titre: L'articulation labiale des voyelles
nasales postérieures du français : comparaison entre locuteurs français et
anglo-américains
auteurs: Julie MONTAGU
mots clés: phonétique-phonologie
abstract: This paper presents an analysis of data on the relationships
between the lip shapes and the acoustic cues of French oral, nasal and nasalized
vowels. The comparison of the productions of French and American English speakers
shows the necessity of a labial adjustment to distinguish the French oral
and nasal vowels. The required labial adjustment is specified by two distinct
gestures: rounding and protrusion. The acoustic consequences of these
gestures as cues are predicted from acoustic theory and are observed in the
vowel spectra.
titre: Particularités articulatoires associées
à la dyslexie développementale phonologique : une évaluation perceptive
auteurs: Muriel Lalain, Noël Nguyen, Michel Habib
mots clés: phonétique-phonologie, apprentissageLangue
abstract: This study investigates to what extent subtle articulatory
aspects associated with phonological dyslexia contribute to identifying the disorder.
Productions of voiced and voiceless bilabial stops by ten dyslexic children
and 2 groups of ten control children were evaluated by six trained phoneticians.
The judges were presented with VCV sequences in a consonant identification
task and a goodness rating task. The results revealed specific infra-phonemic
error patterns for the dyslexic children compared to the control groups. These
perceived deviations from a standard pronunciation may characterize developmental
phonological dyslexia. These results provide new evidence for the articulatory
hypothesis. Moreover, we examine the potential implications for diagnosis
and remediation.
titre: Caractéristiques de la dynamique
d'un pneumotachographe pour l'étude de la production de la parole : aspects
acoustique et aérodynamique
auteurs: Alain Ghio
mots clés: production
abstract: The measurement of aerodynamic parameters in the study of the
articulatory mechanisms of speech production poses many problems, some of
which remain unsolved. To measure oral and nasal airflow, a certain number
of conditions must be met. To this end, we designed and built a pneumotachograph
with particular care to optimise its response time, linearity and acoustical
response. This flow meter is based on the grid flow meter principle with a
small dead volume and specific linearisation for the inhaled and exhaled airflow.
A soft silicone rubber mask, pressed against the speaker's face, prevents air
leakage without hindering articulatory movements. The acoustical distortions
of the speech sound through the device are corrected by signal processing
based on its transfer function.
titre: A la recherche d'indices de frontière
lexicale dans la resyllabation
auteurs: Fougeron Cécile, Bagou Odile, Stefanuto Muriel, Frauenfelder
Uli
mots clés: production, acoustique, phonétique-phonologie, enchaînement,
resyllabation
abstract: In this paper, we address the question of whether word boundaries
can surface in so-called "resyllabification" in French. Durational and formant
properties of vowels and consonants are compared in 3 boundary conditions:
(A) enchainement (V1C#V2), (B) word initial consonant (V1#CV2), (C) syllable
onset consonant (V1.CV2). Results show that the sequences with enchainement
are acoustically distinct from the others. This suggests that "resyllabification"
is not complete in French, and that the surface form of these sequences
is marked by their underlying lexical/syllabic structure. Moreover, the data
show that lexical boundaries may be differentiated by cues on the pre-consonantal
vowels rather than on the initial consonant.
titre: Etude acoustique de deux variantes
de [j] en français : la variante vocalique et la variante fricative
auteurs: Chafcouloff Michel
mots clés: acoustique, phonétique-phonologie
abstract: As little information concerning the production of a fricative
allophone of the [j] sound has been gathered for French, an acoustic
study has been undertaken to investigate which segmental or suprasegmental
factors have a dominant influence on this allophone in the idiolect of a native
speaker from Southern France. Results show that among these factors, syllabic
position prevails, as a vocalic variant is found in initial, intervocalic
and preconsonantal position, whereas a fricative occurs in final and postconsonantal
position. However, the contextual environment (nature of the adjacent vowel(s),
manner and voicing of the consonant in clusters) and the emphatic stress context
should also be taken into account to explain the variability of [j] as a vocalic
or a fricative variant in French.
titre: Functional modeling of the face during
speech production
auteurs: Shinji MAEDA, Martine TODA, Andreas J. CARLEN, Lyes MAFTAHI
mots clés: production
abstract: We describe a functional modeling of face movements during
speech. The data consist of face marker positions in 3D coordinates measured
while a speaker read a corpus. An arbitrary orthogonal factor analysis followed
by a principal component analysis on the data resulted in a set of five interpretable
factors that explain 87% of the variance. The first factor, which accounts for
vertical jaw motion, dominates the open/close movement of the lips. Two principal
factors describe, in our interpretation, the two intrinsic lip gestures: one
specifies the horizontal dimension, spread vs. round, and the other the vertical dimension,
open with rotation vs. close. Both the (horizontal) rounding and (vertical)
opening contribute to lip protrusion, which appears plausible from a biomechanical
point of view.
titre: Appariement de locuteurs entre des
documents sonores préalablement segmentés en utilisant la classification hiérarchique
auteurs: Sylvain Meignier, Jean-Francois Bonastre, Ivan Magrin-Chagnolleau
mots clés: reco, indexation, reconnaissance de parole
abstract: Speaker indexing of an audio database consists in organizing
the audio data according to the speakers present in the database. It is composed
of three steps: (1) segmentation by speaker of each audio document; (2)
speaker tying among the various segmented portions of the audio documents;
and (3) generation of a speaker-based index. This paper focuses on the second
step, the speaker tying task. The result of this task is a classification
of the segmented acoustic data into clusters; each cluster should represent
one speaker. This paper investigates hierarchical classification approaches
for speaker tying, and proposes two discriminant dissimilarity measures using
the information provided by the segmentation. The experiments are conducted
on a subset of the Switchboard database, a conversational telephone database,
and show that the proposed method achieves satisfactory speaker tying among
various audio documents.
titre: La quantité vocalique en twi. Quelques
considérations phonologiques et analyses acoustiques préliminaires
auteurs: ADU MANYHA Kofi, SOCK Rudolph
mots clés: phonétique-phonologie, acoustique
abstract: The present study, which deals with vowel quantity in Twi,
is part of a research programme on various phonetic and phonological aspects
of this tone language, spoken in Ghana. Acoustic durations are obtained from
two speakers producing a series of Twi minimal pairs, embedded in a carrier
sentence. Absolute duration measures indicate the relevance of both vowel
and consonant durations in distinguishing the phonological classes. Relative
values further confirm the robustness of the feature in this language.
titre: La perception auditive de gestes
vocaliques anticipatoires
auteurs: VAXELAIRE Béatrice, FERBACH-HECKER Véronique, SOCK Rudolph
mots clés: perception, production, cinéradiographie, anticipation, sensori-moteur
abstract: This research, based on X-ray data, examines the relationship
between anticipatory labial and lingual gestures and the auditory perception
of an upcoming rounded vowel in French Vowel-Consonant-Vowel sequences (V1CV2).
V1 is always vowel [a] and V2 vowel [u]; C is either [t] or [k]. The contribution
of anticipatory coarticulation to the perception of the rounded element is
examined on both the motor (articulatory) and acoustic levels. The robustness
of the temporal extent of the perceptual effects is also evaluated under increased
speaking rate. In the paradigm, speech samples are produced by representative
speakers, segments are then "gated out", and listeners are asked to judge
what the truncated segments were.
titre: Adaptation spectrale par quantification
vectorielle : exemple de la RAP à fréquences d'échantillonnage multiples
auteurs: Richard Lamy, Laurent Besacier
mots clés: acoustique, reconnaissance de la parole
abstract: This paper presents a nonlinear approach to spectral adaptation
based on vector quantization. The idea is to transform feature vectors extracted
from signals of one quality into feature vectors of another quality. Our method
is applied to the particular case of speech recognition at multiple sampling
rates. The method, which can be applied to other adaptation problems, yields
a very acceptable correspondence between the two feature spaces considered. Thus,
a generic ASR system trained on 16 kHz signals is able to recognize signals with lower
sampling rates without any adaptation of its acoustic models.
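One common way to map feature vectors between two qualities with vector quantization can be sketched as follows. This is only an assumed reading of the idea, not the authors' algorithm: a codebook is learnt on the source features with k-means, each cell stores the mean of the paired target features, and a new source vector is replaced by the target centroid of its nearest codeword. All function names and the synthetic data are hypothetical.

```python
import numpy as np


def nearest(x, codebook):
    """Index of the closest codeword (squared Euclidean distance) for each row of x."""
    d = ((x[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)


def train_vq_mapping(src, tgt, n_codewords, n_iter=20):
    """Learn a source codebook by k-means and the paired target centroid of each cell."""
    # Deterministic initialisation: evenly spaced training vectors.
    idx = np.linspace(0, len(src) - 1, n_codewords).astype(int)
    codebook = src[idx].astype(float)
    for _ in range(n_iter):
        labels = nearest(src, codebook)
        for k in range(n_codewords):
            if np.any(labels == k):
                codebook[k] = src[labels == k].mean(axis=0)
    labels = nearest(src, codebook)
    fallback = tgt.mean(axis=0)  # used only if a cell ends up empty
    mapped = np.stack([
        tgt[labels == k].mean(axis=0) if np.any(labels == k) else fallback
        for k in range(n_codewords)
    ])
    return codebook, mapped


def adapt(x, codebook, mapped):
    """Replace each source vector by the target centroid of its nearest codeword."""
    return mapped[nearest(x, codebook)]


# Synthetic paired data (an assumption): the target features are 2 * source + 1.
rng = np.random.default_rng(0)
src = np.concatenate([rng.normal(0.0, 0.1, (50, 2)), rng.normal(10.0, 0.1, (50, 2))])
tgt = 2.0 * src + 1.0
codebook, mapped = train_vq_mapping(src, tgt, n_codewords=2)
out = adapt(np.array([[0.2, -0.1], [9.7, 10.1]]), codebook, mapped)
print(out)
```

The mapping is piecewise constant, hence nonlinear; a larger codebook gives a finer correspondence at the cost of needing more paired training data.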
titre: L'acquisition de l'allongement vocalique
en arabe marocain : productions de jeunes enfants marocains en âge préscolaire
auteurs: Mohamed Embarki
mots clés: phonétique-phonologie, apprentissageLangue
abstract: This work tackles the problem of vowel quantity in
Moroccan Arabic (henceforth MA) from an acquisition perspective. The results of
two studies are presented; they deal with the expected lengthening of the
vowel [a] in CV syllables by Moroccan children of pre-school age. The first
study deals with the ordinary production of a list of words in MA by three
young children (4-6 years), the second deals with the hyperspeech production
of the same list of words by the same children.
titre: Influence des caractéristiques des
contours mélodiques sur la durée des mots en anglais britannique contemporain
auteurs: Caroline Bouzon
mots clés: phonétique-phonologie, acoustique
abstract: Because pitch glides greatly influence the duration of words,
we test here the influence of some characteristics of the pitch glide in final
position of intonation units on the duration of words in British English.
These characteristics are the direction, complexity and width of the glide.
We observed the influence of each individual parameter before a minor and
major boundary, and then the interaction between them. The effect of the parameters
is different before a minor or a major boundary, hence the need to maintain
this distinction in a model of duration. The effect also varies depending
on the type of interaction, the influence of some factors being increased
or decreased according to the other parameter it interacts with.
titre: L'acquisition d'un marqueur socio-stylistique
: l'exemple de la liaison facultative
auteurs: CHABANAL Damien, EMBARKI Mohamed
mots clés: phonétique-phonologie, apprentissageLangue
abstract: The aim of our study is to see whether 9-year-old children of
French origin and from different backgrounds are able to produce more optional
liaisons in more formal situations. The results show a real linguistic awareness
among all the speakers on the one hand, and a sociolectal influence on the
other hand. We present empirical data from an experiment conducted with 24 children
in order to understand how a regular process of variation can be acquired.
titre: Comparaison de SMLLR et de SMAP pour
une adaptation au locuteur en utilisant des modèles acoustiques markoviens
auteurs: Fabrice LAURI, Irina ILLINA, Dominique FOHR
mots clés: reco, reconnaissance de la parole, adaptation, modèles acoustiques,
SMLLR, SMAP
abstract: In this paper, two adaptation schemes are presented: SMAP
and SMLLR. Both methods update the parameters of the acoustic models of a
speaker-independent system in order to improve its performance for a new
speaker. We applied SMAP and SMLLR to the HMMs of the ESPERE engine in
batch mode and in unsupervised incremental mode. The HMMs were trained
on the Resource Management (RM) corpus. Results of the batch adaptation show
that SMAP is more effective. For the unsupervised incremental adaptation,
SMLLR is more powerful than SMAP with the incremental scheme we chose.
titre: Développement d'une technologie
générique pour la reconnaissance de la parole indépendante de
la tâche
auteurs: Fabrice Lefevre, Jean-Luc Gauvain, Lori Lamel
mots clés: reconnaissance de la parole
abstract: This work addresses issues in speech recognition portability
via the development of generic core speech recognition technology. First,
genericity of large domain reference models (designed for a task covering
a large number of acoustic and linguistic events) is assessed through their
performance on various independent tasks. Then, new techniques based on
multi-source training are presented, aimed at enhancing the level of genericity
of the large domain models. Finally, methods for transparent adaptation of
generic models to a particular task are studied.
titre: Analyse comparative de corpus oraux
et écrits français : mots, lemmes et classes morpho-syntaxiques
auteurs: V. Gendner, M. Adda-Decker
mots clés: analyse de corpus, étiquetage morphosyntaxique, écrit,
oral, reconnaissance, modèle de langage
abstract: Corpora of oral and written French have been automatically
tagged with lemma and morpho-syntactic information: radio/TV broadcast transcripts
and Le Monde newspaper. For both corpora of 40M words each, we measure
the corpus vocabulary in terms of lexical forms and lemmas. Morpho-syntactic
information has then been examined using the most common Parts Of Speech (POS):
noun, verb, adjective, adverb, pronoun, conjunction, determiner and preposition.
Distributions of word occurrences and of vocabulary items have been computed
as a function of POS. A comparison between oral and written French is carried
out. Beyond a quantitative description of oral and written corpora, this study
aims at using more linguistic knowledge in speech recognition systems.
titre: Mise à jour automatique du modèle
de langage d'un système de transcription
auteurs: Alexandre Allauzen, Jean-Luc Gauvain
mots clés: modèlesLangage, reconnaissance de parole
abstract: This paper investigates the problem of automatic adaptation
of the vocabulary and the language models (LM) of a broadcast news speech
transcription system. We propose to make use of written Internet news sources
which are available on a daily basis to model the thematic changes typical
of the news domain. For each news source a specific normalization is needed.
The lexicon is updated daily and an up-to-date LM is estimated using only
recent data. Adaptation is performed by interpolating the up-to-date LM with
a standard (and fixed) LM. Each day the data collected from one of the sites
is reserved as an evaluation corpus. Experiments carried out during the month
of January 2002 show a relative reduction in out-of-vocabulary rate of 32%
and a 13% reduction in perplexity compared to using a fixed language model.
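The interpolation step and the two reported metrics can be illustrated with a minimal sketch. It uses toy unigram distributions for brevity, whereas a transcription system would use higher-order n-gram models; the interpolation weight `lam`, the function names and all probabilities are assumptions made for illustration, not values from the paper.

```python
import math


def interpolate(p_recent, p_fixed, lam):
    """Linear interpolation: P(w) = lam * P_recent(w) + (1 - lam) * P_fixed(w)."""
    vocab = set(p_recent) | set(p_fixed)
    return {
        w: lam * p_recent.get(w, 0.0) + (1.0 - lam) * p_fixed.get(w, 0.0)
        for w in vocab
    }


def oov_rate(vocab, words):
    """Fraction of evaluation words that are missing from the vocabulary."""
    return sum(w not in vocab for w in words) / len(words)


def perplexity(model, words):
    """Perplexity of a unigram model on in-vocabulary evaluation words."""
    log_sum = sum(math.log(model[w]) for w in words)
    return math.exp(-log_sum / len(words))


# Toy models (hypothetical): the up-to-date model knows the new word "euro".
p_fixed = {"the": 0.5, "news": 0.3, "report": 0.2}
p_recent = {"the": 0.4, "news": 0.2, "euro": 0.4}
adapted = interpolate(p_recent, p_fixed, lam=0.5)

evaluation = ["the", "euro", "news"]
print(oov_rate(p_fixed, evaluation), oov_rate(adapted, evaluation))
print(perplexity(adapted, evaluation))
```

Because both inputs are probability distributions and the weights sum to one, the interpolated model is again a distribution over the merged vocabulary.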
titre: Apprentissage d'un module stochastique
de compréhension de la parole
auteurs: Hélène Maynard, Fabrice Lefevre
mots clés: modèlesLangage, applications, evaluationCorpus
abstract: The need for human expertise in the development of a speech understanding
system can be greatly reduced by the use of stochastic techniques. However,
corpus-based techniques require the annotation of large amounts of training
data. Manual semantic annotation of such corpora is tedious, expensive, and
subject to inconsistencies. In order to decrease the development cost, this
work investigates how the performance of the understanding module depends on two
factors: the training corpus size and the use of automatically annotated
data.
titre: Détection de séquences par sélection
de l'historique : application à la reconnaissance automatique de la parole
auteurs: Langlois David, Smaïli Kamel, Haton Jean-Paul
mots clés: modèlesLangage, reco, "modèles de séquences", "sélection
par l'historique", reconnaissance de parole
abstract: This paper focuses on statistical language modelling for
automatic speech recognition. We present a method which aims at finding linguistic
units in a corpus. This method, called the Selected History Principle, consists
in finding strong distant relationships between words. The new units are phrases
made up of basic units of our vocabulary linked by these distant relationships.
We adapt the multigram principle to large vocabularies in order to introduce
an optimal subset of these sequences into a bigram model. The bigram model
using these sequences outperforms the basic bigram model by 21% in terms of
perplexity, and increases the recognition rate of the large vocabulary system
Sirocco by 8.7%. The word error rate is decreased by 12.7%.
titre: Effets de masquage rétroactif dans
la perception de la parole chez l'enfant dyslexique
auteurs: Noël Nguyen, Ludovic Jankowski, Muriel Lalain, Barbara Joly-Pottuz,
Aurélie Leynaud, Mélina Mercier, Michel Habib
mots clés: perception, pathologies, phonétique-phonologie, acquisitionLangue,
psycholinguistique, dyslexie, acquisition de la parole
abstract: Previous research has revealed that dyslexic children may
be more sensitive to backward-masking effects in auditory perception than
control children. In this study, we asked whether a CV transition masks a
preceding VC transition to a greater extent in dyslexic children than in controls.
The results suggest that dyslexic children are severely impaired on the discrimination
of VC sequences, regardless of whether these sequences are followed or not
by a CV sequence. These results provide further evidence that dyslexia is
associated with a deficit in the perception of speech.
titre: Sirocco, un système ouvert de reconnaissance
de la parole
auteurs: Guillaume Gravier, François Yvon, Bruno Jacob, Frédéric Bimbot
mots clés: reco, reconnaissance de la parole, modèlesLangage
abstract: The Sirocco project aims at developing and distributing,
under a free license, a speech recognition software toolkit according to open
source standards. We present in this paper the main objectives of the project
and describe the solutions implemented. In particular, Sirocco enables the
use of contextual constraints on the pronunciation variants in the first decoding
pass. We present preliminary results on the use of contextual transcription
rules on a read speech transcription task.
titre: Variabilité inter-langue et inter-individuelle
en production et en perception : étude préliminaire en arabe dialectal et
en français
auteurs: Jalal-eddin AL-TAMIMI, Marion GIRARD, Egidio MARSICO
mots clés: phonétique-phonologie, perception
abstract: This paper presents a preliminary study of intra-speaker
and inter-speaker variability in speech production and perception with an
inter-language investigation of acoustic vocalic space according to different
systems. This work aims at providing an analytic study based on individual
data that might account for individual strategies. We have studied variability
in vowel production and perception between six speakers of two languages:
French and Arabic. The results of the first part of our work show that vowel
spaces are larger in perception than in production, both for French speakers and for
Jordanian Arabic speakers. Moreover, inter-language differences in vowel dispersion
seem to emerge from these results.
titre: Charpente osseuse et conduit vocal
: Variabilité et relations structurelles. Premiers résultats
auteurs: Louis-Jean Boë, Denis Beautemps, Roger Lichtenberg, Jean-Louis Heim,
Fleur Letellier-Willemin, Martine Lichtenberg
mots clés: production, analyse, Anthropologie - Emergence du langage
abstract: In the field of speech research, the vocal tract is primarily
described in terms of soft tissue (glottis, pharyngeal wall, velum, tongue
and lips), without any reference to bony structures. Only the incisors are
used as fixed landmarks. Nevertheless, the cranium appears to be the base
on which the hyoid bone and the larynx are located; the vocal organs are,
in turn, inserted on these. U. Goldstein's (1980) dissertation clearly showed the
benefit of data on the growth of bony structures in modeling vocal tract
growth. Exploiting a radiographic database of 22 subjects (15 men and
7 women), we begin to analyze the structural relationships among landmarks
widely used in physical anthropology, and we evaluate the range of variability
related to sex and subject. This work is the first step of a larger project
aimed at reconstructing the vocal tract from a cranium, modern or fossilized.
It should also be considered in the framework of language emergence. A multidisciplinary
research team participated in this project, including radiologists, Egyptologists,
physical anthropologists and speech specialists.
titre: Des implémentations parallèles
pour une application de la RAP
auteurs: Yahya Ould MOHAMED EL HADJ
mots clés: reco, reconnaissance de la parole, algorithmique parallèle,
machines parallèles
abstract: We show in this work that harnessing the power of parallel
machines can greatly increase the speed and storage capacity of certain recognizers.
The improvements obtained can be exploited to expand the vocabulary of existing
real-time tasks or to increase modeling precision where recognition
accuracy matters more than real-time operation.
titre: Entraînement de la conscience
phonologique d'enfants déficients visuels: quel support temporo-phonologique?
auteurs: V. Prost, R. Espesser, C. Sabater, K. Thomas-Bartalucci,
V. Rey
mots clés: pathologies, perception
abstract: Visually impaired children aged 6 to 8 years seem to exhibit
perceptual confusions: they have difficulty distinguishing phonemes
by place of articulation and manipulating them explicitly. The present study aims
to demonstrate the specific difficulties of these children. We then test the hypothesis
that remediation is possible using acoustically modified speech, with meaningful units
(words) and meaningless units (non-words). The results favor the use of
non-words in acoustically modified speech.
titre: L'ambisyllabicité des consonnes
géminées : le cas du berbère (tachelhit)
auteurs: Naïma Louali
mots clés: phonétique-phonologie
abstract: Berber, like Arabic and Italian, exhibits geminate consonants.
The consonant system opposes a series of single consonants to geminates. The
vocabulary, and more specifically the morphology, exploits this contrast. Phonetic
studies dealing with gemination mention length as the main parameter distinguishing
geminate consonants from single ones. Berber scholars disagree about their
representation and phonological behaviour. The analysis of this kind
of consonants varies depending on the phonological theory. In the present
study, we shall examine the geminate consonants through an experimental approach
(inversion and partial word repetitions), which is based on the categorization
of these consonants by four Berber subjects. A list of 30 words in their French
or Arabic translation was first agreed upon, with 10 words in each category.
The aim of this study is to show how Berber speakers perceive these consonants.
By this experimental research, we intend to present experimental data relevant
to a discussion of the relation between phonological representation and phonetic
data.
titre: Réseaux bayésiens dynamiques pour
la reconnaissance multi-bandes de la parole
auteurs: Khalid Daoudi, Dominique Fohr, Christophe Antoine
mots clés: reconnaissance de la parole, réseaux bayésiens,
multibande
abstract: This paper presents a new approach to multi-band automatic
speech recognition which has the advantage of overcoming many limitations of
classical multi-band systems. The principle of this new approach is to build
a speech model in the time-frequency domain using the formalism of Bayesian
networks. In contrast to classical multi-band modeling, this formalism leads
to a probabilistic speech model which allows communication between the different
sub-bands and, consequently, no recombination step is required in recognition.
We develop efficient learning and decoding algorithms and present illustrative
experiments on a connected digit recognition task. The experiments show that
the Bayesian network approach is very promising in the field of noisy speech
recognition.
titre: Apprentissage de structures de réseaux
bayésiens dynamiques pour la reconnaissance de la parole
auteurs: Murat Deviren, Khalid Daoudi
mots clés: reconnaissance de la parole, réseaux bayésiens
abstract: We present a speech modeling methodology where no a priori
assumption is made on the dependencies between the observed and the hidden
speech processes. Rather, dependencies are learned from data. This methodology
guarantees improvement in modeling fidelity as compared to HMMs. In addition,
it gives the user control over the trade-off between modeling accuracy and
model complexity. We evaluate the performance of the proposed methodology
in a connected digit recognition task.
titre: La prosodie de la focalisation en
français : faits perceptifs et morphogénétiques
auteurs: Brichet C, Aubergé V
mots clés: prosodie
abstract: Our purpose is to study how the focalisation function, more
precisely the deixis function applied to the word domain, is implemented by
the prosodic parameters, and what roles the syllable and the word
contour play. To this end, corpora of isolated sentences were recorded with
two instructions: to point to the word vs. to point to the syllable. Perceptual
experiments were conducted and an acoustic analysis was then carried out. The results
confirm the role of the first syllable traditionally observed in the literature,
but point towards a contour globally shared over the word domain (a carried
contour), without any significant influence on the carrying contour of the
whole utterance, which confirms the hypothesis given as principle 4 in the
ICP model of prosody.
titre: Un logiciel de codage de la parole
basé sur le FS1016
auteurs: M. Djamah, M. Boudraa, B. Boudraa, M. Bouzid
mots clés: codage
abstract: This paper describes speech coding software based on the
Federal Standard FS1016 coder. The objective of this work is to have in our
laboratory a basic speech coder to use as a foundation for our research.
Modifications and extensions of the basic coder (to improve quality)
can be made easily thanks to object-oriented programming.
titre: Mesure d'intelligibilité de segments
de parole à l'envers en français
auteurs: Fanny Meunier, Tristan Cenier, Melissa Barkat, Ivan Magrin-Chagnolleau
mots clés: perception, phonétique-phonologie
abstract: We ran an experiment focusing on the cognitive implications of
reversed speech segments. Nine durations of reversed segments (varying between
20 ms and 180 ms) plus a non-distorted control condition were considered
in order to test the pattern of intelligibility degradation in French. We
observed an overall strong negative correlation between the degree of intelligibility
and the size of reversed-speech windows. These results appear to be very comparable
to those obtained in English by Greenberg & Arai [Gre01], at least in terms of the
slope of the intelligibility decrease. However, intelligibility loss
in French is delayed by twenty milliseconds. Apart from confirming the cognitive
ability to restore reversed speech up to a certain point, our study revealed
differences that could be interpreted as ‘language specific’.
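Reversing speech within fixed-size windows can be sketched as follows. This is a minimal illustration rather than the authors' stimulus-generation procedure: the function name `reverse_locally` is hypothetical, and the even 20 ms spacing of the nine window durations is an assumption (the abstract only gives the 20 ms to 180 ms range).

```python
import numpy as np


def reverse_locally(signal, sample_rate, window_ms):
    """Time-reverse a signal inside consecutive windows of window_ms milliseconds.

    The last window may be shorter than the others; it is reversed as well.
    """
    win = int(round(sample_rate * window_ms / 1000.0))
    out = signal.copy()
    if win <= 1:
        return out  # nothing to reverse
    for start in range(0, len(signal), win):
        out[start:start + win] = signal[start:start + win][::-1]
    return out


# Assumed set of nine window durations, evenly spaced between 20 ms and 180 ms.
window_sizes_ms = list(range(20, 181, 20))

# Toy check on a ramp sampled at 1 kHz with a 4 ms window (4 samples).
ramp = np.arange(10)
reversed_ramp = reverse_locally(ramp, sample_rate=1000, window_ms=4)
print(reversed_ramp.tolist())
```

Applying the function twice with the same window size restores the original signal, which is a convenient sanity check when preparing stimuli.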
titre: Reconnaissance de la parole pour
des locuteurs non natifs en présence de bruit
auteurs: Dominique Fohr, Odile Mella, Irina Illina, Fabrice Lauri,
Christophe Cerisara, Christophe Antoine
mots clés: reconnaissance, robustesse
abstract: In real-world applications, speech recognition is confronted
with two main difficulties: non-native speakers and background noise.
The aim of this paper is to compare, on the same noisy database, different
methods for increasing the robustness of our HMM-based automatic speech
recognition system. To deal with non-native speakers, we have tested two
solutions: multi-models and adaptation techniques. For noisy speech, we have
evaluated two types of methods: the first (PMC and MLLR) adapts the initial
models, trained on clean speech, with a few noisy sentences. The second
(RATZ and MCR) tries to remove the noise from the signal without modifying
the HMMs.
titre: Développement morpho-phonologique
de deux enfants en train d'acquérir le français après un implant cochléaire
auteurs: Géraldine Hilaire, Valérie Régol, Harriet Jisa
mots clés: pathologies, perception, acquisitionLangue
abstract: Two explanations have been offered to account for omissions
of syllables in early language production: the "rhythmic production account" [All78]
[All80] [Ger91] [Ger94] [Ger96] and the "perceptual account" [Ech92] [Ech93].
The longitudinal data used for our analysis cover 26 months of post implant
development. Our sample begins at 10 months post implant, when the majority
of determiners are omitted in production, and ends at 36 months post implant,
when the majority of determiners are produced. All common nouns were extracted
from the corpus and examined for: 1) rate of omission errors; 2) the stability
of the children's filler syllables; and 3) the context in which the form was
produced, i.e., monosyllabic or multisyllabic word. The results of our study
argue for a "rhythmic production account" of determiner omissions.
titre: Séparation de sources audio-visuelles
: formalisation et expérimentation
auteurs: D. Sodoyer, L. Girin, C. Jutten, J.L. Shwartz
mots clés:
abstract: In this paper, we present a new approach to the source separation
problem in the case of multiple speech signals. The method is based on the
use of automatic lipreading: the objective is to extract an acoustic speech
input from other acoustic signals by exploiting its coherence with the speaker’s
lip movements. We consider the case of an additive stationary mixture. First,
we present a theoretical framework showing that it is indeed possible to separate
a source when some of its spectral characteristics are provided to the system.
Then we address the case of audio-visual sources. We show how, if a statistical
model of the joint probability of visual and spectral audio input is learnt
to quantify the audio-visual coherence, separation can be achieved by maximising
this probability.