Thesis defense

Decoding perceptual vowel epenthesis: Experiments & Modelling

Speaker
Adriana Guevara Rukoz (LSCP)

Practical information
19 October 2018, 2:45pm
Salle Celan, 45 rue d'Ulm

Why do people of different linguistic backgrounds sometimes perceive the
same acoustic signal differently? For instance, when hearing nonnative
speech that does not conform to the sound structures allowed in their
native language, listeners may report hearing vowels that are not
acoustically present. This phenomenon, known as perceptual vowel
epenthesis, has been attested in various languages, such as Japanese,
Brazilian Portuguese, Korean, and English. The quality of the epenthetic
vowel varies between languages, but also within a language, depending on
the phonemic environment.
How much of this process is guided by information directly accessible in
the acoustic signal? What is the contribution of the native phonology?
How are these two elements combined when computing the native percept?
Two main families of theories have been proposed as explanations:
two-step and one-step theories. Two-step theories advocate an initial
parsing into phonetic categories, followed by repair by an abstract
grammar (e.g., epenthesis), while one-step proposals posit that
acoustic, phonetic, and phonological factors are all integrated
simultaneously, in a probabilistic manner, in order to find the optimal
percept.
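
One way to make the one-step view concrete (a standard Bayesian
formalization; the dissertation's exact formulation may differ) is to
treat perception as maximum a posteriori inference over candidate
phonological parses s of the acoustic signal A:

    \hat{s} = \arg\max_{s} P(s \mid A) = \arg\max_{s} P(A \mid s)\, P(s)

Here P(A | s) quantifies the acoustic match and P(s) the phonotactic
prior. Under this view, epenthesis is not a separate repair operation: a
parse containing an extra vowel wins whenever its combined acoustic
likelihood and phonotactic prior outweigh those of the faithful parse.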
In this dissertation, we use a combination of experimental and modelling 
approaches in order to evaluate whether perceptual vowel epenthesis is a 
two-step or one-step process. In particular, we investigate this by 
assessing the role of acoustic details in modulations of epenthetic 
vowel quality.
In the first part, results from two behavioural experiments show that
these modulations are influenced by acoustic cues as well as by
phonology; however, the former explain most of the variation in
epenthetic vowel responses. Additionally, we present a one-step
exemplar-based model of perception that is able to reproduce
coarticulation effects observed in human data. Together, these results
constitute evidence for one-step models of nonnative speech perception.
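
The details of the exemplar model are beyond the scope of this abstract.
As a rough illustration of the general mechanism (a Python sketch; all
names are hypothetical, not the dissertation's actual implementation), a
novel token can be classified by similarity-weighted votes over stored
acoustic traces, so that fine phonetic detail such as coarticulation
directly shapes the response probabilities:

    import numpy as np

    def exemplar_responses(stimulus, exemplars, labels, c=1.0):
        """Similarity-weighted choice among stored exemplars (GCM-style).

        stimulus:  1-D acoustic feature vector for the novel token
        exemplars: (n, d) array of stored acoustic traces
        labels:    length-n list of percept labels, e.g. 'u', 'i', 'o'
        c:         sensitivity; larger values sharpen the gradient
        """
        dists = np.linalg.norm(exemplars - stimulus, axis=1)  # distance to traces
        sims = np.exp(-c * dists)                             # exponential similarity
        support = {}
        for lab, sim in zip(labels, sims):
            support[lab] = support.get(lab, 0.0) + sim        # pool votes per label
        total = sum(support.values())
        return {lab: s / total for lab, s in support.items()} # response probabilities

A stimulus whose coarticulatory detail resembles stored /u/-coloured
traces will thus attract /u/ responses, with no separate repair step.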
In the second part, we present an implementation of the one-step
proposal of Wilson et al. (2013), using HMM-GMM systems (hidden Markov
models with Gaussian mixture model emissions) from the field of
automatic speech recognition. These models comprise two separate
components that determine, respectively, the acoustic and the
phonotactic match between the speech signal and candidate
transcriptions. We can thus manipulate each component independently in
order to evaluate the relative influence of acoustic/phonetic and
phonological factors on perceptual vowel epenthesis. We also propose a
novel way of simulating, with these models, the forced-choice paradigm
used to probe vowel epenthesis in human participants, by using
constrained language models during the decoding process.
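
In outline, the simulation works as follows (a minimal Python sketch;
acoustic_loglik and lm_logprob are hypothetical stand-ins for the
HMM-GMM alignment score and the constrained language model score):

    def forced_choice(audio, option_a, option_b, acoustic_loglik, lm_logprob):
        """Simulate a two-alternative forced choice between transcriptions.

        The language model is constrained so that only the two response
        options receive probability mass, mirroring the two answers
        offered to human participants.
        """
        score_a = acoustic_loglik(audio, option_a) + lm_logprob(option_a)
        score_b = acoustic_loglik(audio, option_b) + lm_logprob(option_b)
        return option_a if score_a >= score_b else option_b

For a stimulus such as [ebzo], the options might be the transcriptions
"e b z o" and "e b u z o"; the decoder "reports" epenthesis whenever the
vowel-bearing option scores higher.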
In a first set of studies, we use this method to test whether ASR
systems using n-gram phonotactics as their language model approximate
human results better than a system with a null language model (i.e., no
phonotactics). Surprisingly, we find that the null model is the best
predictor of human performance.
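
Concretely, the difference between the two kinds of language model can
be sketched as follows (toy Python scorers, not the full decoders used
in the studies): the null model assigns every candidate the same score,
so the decision rests entirely on acoustic match, whereas an n-gram
model adds phonotactic log-probabilities:

    import math

    def null_lm(phones):
        # Null language model: no phonotactics; every candidate gets the
        # same (zero) log-probability, so acoustics alone decide.
        return 0.0

    def bigram_lm(phones, probs):
        # Phonotactic bigram score: sum of log transition probabilities,
        # where probs[('b', 'u')] estimates P(u | b) from a corpus.
        score = 0.0
        for prev, cur in zip(phones, phones[1:]):
            score += math.log(probs.get((prev, cur), 1e-10))
        return score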
In a second set of studies, we evaluate whether effects traditionally
attributed to phonology may be predictable from acoustic match alone. We
find that, while promising, our models are only able to partially
reproduce some of the effects observed in human experiments. Before
attributing these effects to phonology, it is therefore necessary to
test ASR systems with higher-performing acoustic models. We discuss
future avenues for using such enhanced models, and highlight the
advantages of a hybrid approach combining behavioural experiments and
computational modelling in order to elucidate the mechanisms underlying
nonnative speech perception.