Computational psycholinguistics and spoken word recognition in the bilingual and the monolingual

Abstract:
This doctoral thesis is in computational psycholinguistics, an interdisciplinary field that combines expertise from linguistics, psychology, and computer science. Its subject is the simulation of human word recognition: it aims to model, on the computer, the cognitive process by which we activate and access words, their forms, and their constituent components in our mental lexicon. It concerns spoken (rather than written) words and therefore deals with the simulation of auditory (as opposed to visual) word recognition. Specifically, the thesis presents two new models that we developed and built ourselves: FN5, which simulates spoken word recognition in monolinguals, and BIMOLA, which models spoken word recognition in bilinguals. The monolingual model FN5 deals with French. Its lexicon contains 17,668 French words (nouns, determiners, and prenominal adjectives), some of which have pronunciation variants, for a total of 20,523 pronunciations. FN5 processes isolated words as well as sequences of two connected words (determiner + noun, or prenominal adjective + noun). It implements a new approach to recognizing word sequences by optimizing the words' alignment positions and pronunciation variants. It also handles several phonological phenomena that can occur within a word or at word boundaries: schwa deletion, linking with and without liaison, and word contractions, including elision. To cover dialectal differences, it can be run in either of two versions, standard French or Swiss French. The bilingual model BIMOLA handles two languages, English and French, simultaneously. It includes an English-French bilingual lexicon of 8,696 words (all verbs, 4,348 per language) and operates in various language modes (i.e., global configurations of the bilingual's two languages).
BIMOLA can identify words from either language (single words only, for simplicity), including guest words, that is, code-switches and borrowings from one language into the other. The two models share a phonetic feature matrix that represents similarities and differences between phonemes, both within and across languages and dialects. The evaluations show that both models achieve high overall recognition performance and can simulate a large number of specific psycholinguistic effects.
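A shared phonetic feature matrix can express how similar two phonemes are, whether they belong to the same language or to different ones. The following is a minimal sketch of that general idea, not the matrix used in FN5 or BIMOLA. The feature names, the binary values, the `MATRIX` table, and the `similarity` function are all hypothetical. Similarity is computed simply as the fraction of features on which two phonemes agree.

```python
"""Illustrative sketch: phoneme similarity from a shared phonetic feature matrix.

ASSUMPTION: the features and values below are hypothetical and simplified;
they are NOT the feature matrix actually used in FN5 or BIMOLA.
"""
from typing import Dict, Tuple

Phoneme = Tuple[str, str]  # (language code, phoneme symbol), e.g. ("fr", "ʁ")

# Hypothetical binary features (1 = present, 0 = absent).
FEATURES = ("consonantal", "voiced", "nasal", "labial", "coronal", "dorsal", "continuant")

# Phonemes are keyed by language so that English and French segments can be
# compared both within a language and across the two languages.
MATRIX: Dict[Phoneme, Tuple[int, ...]] = {
    ("en", "b"): (1, 1, 0, 1, 0, 0, 0),
    ("en", "p"): (1, 0, 0, 1, 0, 0, 0),
    ("en", "r"): (1, 1, 0, 0, 1, 0, 1),  # coronal approximant (simplified)
    ("fr", "b"): (1, 1, 0, 1, 0, 0, 0),
    ("fr", "t"): (1, 0, 0, 0, 1, 0, 0),
    ("fr", "ʁ"): (1, 1, 0, 0, 0, 1, 1),  # dorsal (uvular) fricative (simplified)
}


def similarity(a: Phoneme, b: Phoneme) -> float:
    """Return the fraction of features on which two phonemes agree (1.0 = identical)."""
    va, vb = MATRIX[a], MATRIX[b]
    shared = sum(1 for x, y in zip(va, vb) if x == y)
    return shared / len(FEATURES)


if __name__ == "__main__":
    print(similarity(("en", "b"), ("fr", "b")))  # between languages, identical vectors
    print(similarity(("en", "b"), ("en", "p")))  # within English, differ only in voicing
    print(similarity(("en", "r"), ("fr", "ʁ")))  # between languages, differ in place
```

In this toy table, English /r/ and French /ʁ/ agree on 5 of 7 features. A recognition model could therefore let one partially activate words containing the other. Counting feature agreement is only one choice. A weighted distance would let some features, such as place of articulation, count more than others.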