Effects of Cognitive Load on Speech Production and Perception

Abstract:
The objective of this thesis is to study the effects of cognitive load on the production and perception of speech, and especially the prosodic characteristics of French speech produced under high levels of cognitive load. Cognitive load reflects the mental demand placed by a task on the person performing it, and is derived from the limited capacity of cognitive systems, such as working memory and attention. Speech production (conceptualisation of a message, formulation and articulation) and speech perception and comprehension are processes that engage cognitive resources, including working memory, to differing degrees. It is expected that situations of high cognitive load will cause detectable effects in the prosodic structuring of speech. The main hypothesis is that the effects of cognitive load on prosody will be detected in the temporal organisation of speech and the segmentation (phrasing) of utterances. It is hypothesised that limitations of working memory will affect speech planning, leading to a marked increase in mismatches between prosodic and syntactic boundaries. We expect to find an increase in the number of occurrences of major prosodic boundaries inside minor syntactic units (chunks), i.e. in positions where they are not normally expected. On this basis, with respect to temporal prosodic measures, we expect to find an increase in the frequency of long pauses, and a more variable speech rate. Four experimental studies were conducted. In Study 1, participants performed a Stroop naming task under dual-task conditions and time pressure, and a Reading Span task. Speech recordings and electroglottographic (EGG) data were collected, in order to analyse voice features. Study 1 is a replication of the CLSE English corpus (Yap, 2012) for French. In Study 2, monologue speech was recorded and subjects were asked to memorise information of increasing complexity and answer reading comprehension questions.
Study 3 focuses on simultaneous interpreting, a task that is cognitively demanding and involves both language comprehension and production. The prosodic characteristics of two versions of the same text were compared: the original output of a professional conference interpreter, and a rehearsed reading of it by the same person. In Study 4, cognitive load was induced through the Continuous Tracking and Reaction task in a driving simulator, while pairs of participants engaged in a series of tasks (memorisation and summarising, dialogue to exchange information, debate, repeating syntactically unpredictable sentences and a simple game). Results indicate that under increased cognitive load, speakers produce more numerous and longer silent pauses; there is also an increase in the variability of articulation rate (more accelerations and decelerations). We confirm our hypothesis that cognitive load incurs an increase in the number of occurrences of major prosodic boundaries inside minor syntactic units (chunks), and that silent and filled pauses are placed incongruently with the syntactic structure. However, while the filled pause ratio (the proportion of speech time spent on filled pauses) increases, some speakers do not produce more filled pauses but rather produce longer filled pauses or drawls (hesitation-related lengthening). There was no systematic change in the mean pitch of speakers under cognitive load; in some of the studies it was observed that high cognitive load tasks result in a decrease in mean pitch variance and in pitch movements. EGG results indicate that the Closed Quotient measure increases as cognitive load increases.
Further contributions of this thesis include the development of software tools for the automatic annotation of French spoken corpora (DisMo: morphosyntactic tagging, disfluency detection and annotation; Promise: syllabic prosodic prominence and prosodic boundary detection; a tool for extracting temporal measures from annotated dialogues), the development of a new tool for working with spoken corpora (Praaline), and the development of three corpora of French speech produced under cognitive load (approximately 50 hours in total).