Updated 25 June 2021

Recordings of young children's vocalisations: when big data benefits from citizens' collaboration

Recording young children's vocalisations with wearable recorders is a promising method for assessing language development, but annotating such large numbers of long recordings accurately and rapidly remains a challenge. In a study published on June 8 in the Journal of Speech, Language, and Hearing Research, a team of researchers from the Laboratoire de Sciences Cognitives et Psycholinguistique at ENS-PSL and Purdue University assessed the extent to which annotations of recordings of young children's vocalisations by citizen scientists align with those collected in the laboratory. The results show that measures derived from citizen volunteers' annotations closely match those derived from laboratory annotations. Citizen input can therefore help overcome this challenge.

Because early language delays can negatively impact children's literacy, behavior, social interactions, and educational outcomes into adulthood, early interventions have been described as a better societal investment than late interventions. But how can we quickly and accurately determine which children need these interventions?

Wearable recorder

Advances in wearable technology have opened up new avenues for studying the language development of infants and young children. Wearable recorders allow data to be collected in the child's natural environment over the course of several hours, which can be particularly useful for children who speak or make sounds infrequently. One disadvantage of collecting such long recordings is the sheer volume of data: in a given day, each child may be recorded for more than 10 hours. Listening to the recordings is not only time-consuming, but also raises many ethical and legal issues. The study presents important evidence that, with appropriate methods and safeguards, citizen scientists can help sift through this audio haystack and provide reliable clues to each child's vocal development.

In collaboration with colleagues at Purdue University, the team extracted more than 11,000 audio segments likely to contain children's vocalizations from recordings of 10 children diagnosed with Angelman syndrome, a rare neurogenetic syndrome characterized by severe language impairment, and 10 infants at low risk for developmental delays. They first annotated these clips in the lab as they had done in the past, which provided them with a benchmark.

They then posted their data on Zooniverse, a platform that connects non-expert volunteers motivated to help science with various research projects, ranging from papyrology to astronomy. For their project, they cut the audio segments into half-second clips, which makes it easy for people to make quick judgments while protecting participants’ privacy. In just a few months, more than 1,000 people took part, together contributing over 150,000 annotations!
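
The clip-cutting step described above can be illustrated with a minimal sketch. This is not the team's actual pipeline: the function name `split_into_clips`, the millisecond time stamps, and the choice to keep a shorter final clip when a segment does not divide evenly are all assumptions made for illustration.

```python
def split_into_clips(onset_ms, offset_ms, clip_ms=500):
    """Cut one audio segment into consecutive clips of at most clip_ms.

    onset_ms and offset_ms are the segment boundaries in milliseconds.
    Assumption: a shorter remainder at the end is kept as its own clip;
    the source does not say how remainders were handled.
    """
    clips = []
    start = onset_ms
    while start < offset_ms:
        end = min(start + clip_ms, offset_ms)
        clips.append((start, end))
        start = end
    return clips


# Hypothetical 1.2-second segment: two full clips plus a 200 ms remainder.
clips = split_into_clips(0, 1200)
```

Working with clip boundaries rather than audio samples keeps the sketch independent of any audio library; the boundaries could then be handed to whatever tool actually slices the recording.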

Annotators from the lab and Zooniverse sorted each clip into one of five categories that together characterize the maturity of a child's vocal abilities. These categories include "canonical" syllables (sounds that include consonant-vowel combinations, such as "bababa"), "non-canonical" syllables (vowel-only/consonant-only sounds, such as "aaaaaah" or "mmmm"), "crying," "laughing," and "junk" (any other sounds or background noise that may have crept into the recording, which were not included in analyses).

Using these annotations, the researchers derived two measures of each child's vocal development: the proportion of clips containing speech-like vocalizations (canonical or non-canonical) as opposed to non-speech ones (crying or laughing), and the canonical proportion (the proportion of speech-like clips that are of the more advanced "canonical" type). They found that lab and Zooniverse annotations led to very similar measures, with correlations above .8 for both.
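
As a rough illustration of how the two measures and the lab-versus-Zooniverse comparison could be computed, here is a minimal sketch. It assumes one label per clip, excludes "junk" as described above, and uses a Pearson correlation; the function names are hypothetical, and the article does not specify which correlation coefficient the study used.

```python
from collections import Counter
from math import sqrt

SPEECH_LIKE = ("canonical", "non-canonical")
NON_SPEECH = ("crying", "laughing")


def vocal_measures(labels):
    """Return (speech-like proportion, canonical proportion) for one child.

    labels holds one category name per clip; "junk" clips are ignored.
    Returns NaN for a measure whose denominator is zero.
    """
    counts = Counter(label for label in labels if label != "junk")
    speech = sum(counts[c] for c in SPEECH_LIKE)
    total = speech + sum(counts[c] for c in NON_SPEECH)
    speech_prop = speech / total if total else float("nan")
    canonical_prop = counts["canonical"] / speech if speech else float("nan")
    return speech_prop, canonical_prop


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers.

    Assumption: Pearson is used here only as an example coefficient.
    Raises ZeroDivisionError if either list has zero variance.
    """
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)
```

For a child with the invented labels canonical, non-canonical, non-canonical, crying, and junk, `vocal_measures` gives a speech-like proportion of 3/4 and a canonical proportion of 1/3. Computing each measure once from lab labels and once from Zooniverse labels for every child, then passing the two lists to `pearson`, yields the kind of agreement figure reported above.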

These results are particularly exciting at a time when data from wearables are becoming more and more common. Along with other tests, crowd-sourcing can pave the way for the creation of large, high-quality datasets that more accurately capture the full diversity of children's lives. In their current work, the researchers are generalizing this technique to describe not only children's vocalizations but also those of the people around them, in their new "Who’s talking how?" project.

"Who's talking how?" project

This upcoming study will help to describe language use in many more languages and cultures than has been possible using laboratory annotations.

Chiara Semenzin, Lisa Hamrick, Amanda Seidl, Bridgette L. Kelleher, and Alejandrina Cristia (2021). Describing Vocalizations in Young Children: A Big Data Approach Through Citizen Science Annotation. Journal of Speech, Language, and Hearing Research.