When babies first begin to talk, their vocabulary is very limited. Often one of the first sounds they generate is “da,” which may refer to dad, a dog, a dot, or nothing at all.
How does an adult listener make sense of this limited verbal repertoire? A new study from MIT and Harvard University researchers has found that adults’ understanding of conversational context and knowledge of mispronunciations that children commonly make are critical to the ability to understand children’s early linguistic efforts.
Using thousands of hours of transcribed audio recordings of children and adults interacting, the research team created computational models that let them start to reverse engineer how adults interpret what small children are saying. Models based on only the actual sounds children produced in their speech did a relatively poor job predicting what adults thought children said. The most successful models made their predictions based on large swaths of preceding conversations that provided context for what the children were saying. The models also performed better when they were retrained on large datasets of adults and children interacting.
The findings suggest that adults are highly skilled at making these context-based interpretations, which may provide crucial feedback that helps babies acquire language, the researchers say.
“An adult with lots of listening experience is bringing to bear extremely sophisticated mechanisms of language understanding, and that is clearly what underlies the ability to understand what young children say,” says Roger Levy, a professor of brain and cognitive sciences at MIT. “At this point, we don’t have direct evidence that those mechanisms are directly facilitating the bootstrapping of language acquisition in young children, but I think it’s plausible to hypothesize that they are making the bootstrapping more effective and smoothing the path to successful language acquisition by children.”
Levy and Elika Bergelson, an associate professor of psychology at Harvard, are the senior authors of the study, which appears today in Nature Human Behavior. MIT postdoc Stephan Meylan is the lead author of the paper.
Adult listening skills are critical
While many studies have investigated how children learn to speak, in this project, the researchers wanted to flip the question and study how adults interpret what children say.
“While people have looked historically at a number of features of the learner, and what is it about the child that allows them to learn things from the world, very little has been done to look at how they are understood and how that might influence the process of language acquisition,” Meylan says.
Previous research has shown that when adults speak to each other, they use their beliefs about how other people are likely to talk, and what they’re likely to talk about, to help them understand what their conversational partner is saying. This strategy, known as “noisy channel listening,” makes it easier for adults to handle the complex task of deciphering the acoustic sounds they’re hearing, especially in environments where voices are muffled or there is a lot of background noise, or when speakers have different accents.
In this study, the researchers explored whether adults can also apply this technique to parsing the often seemingly nonsensical utterances produced by children who are learning to talk.
“This problem of interpreting what we hear is even harder for child language than ordinary adult language understanding, which is actually not that easy either, even though we’re very good at it,” Levy says.
For this study, the researchers made use of datasets originally generated at Brown University in the early 2000s, which contain hundreds of hours of transcribed conversations between children ages 1 to 3 and their caregivers. The data include both phonetic transcriptions of the sounds produced by the children and the text of what the transcriber believed the child was trying to say.
The researchers used other datasets of child language (which included about 18 million spoken words) to train computational language models to predict what words the children were saying in the original dataset, based on the phonetic transcription. Using neural networks, they created many different models, which varied in the sophistication of their knowledge of conversational topics, grammar, and children’s mispronunciations. They also manipulated how much of the conversational context each model was allowed to analyze before making its predictions of what the children said. Some models took into account just one or two words spoken before the target word, while others were allowed to analyze up to 20 previous utterances in the exchange.
The researchers found that using the acoustics of what the child said alone did not lead to models that were particularly accurate at predicting what adults thought children said. The models that did best used very rich representations of conversational topics, grammar, and beliefs about what words children are likely to say (ball, dog or baby, rather than mortgage, for example). And much like humans, the models’ predictions improved as they were allowed to consider larger chunks of previous exchanges for context.
A feedback system
The findings suggest that when listening to children, adults base their interpretation of what a child is saying on previous exchanges that they have had. For example, if a dog had been mentioned earlier in the conversation, “da” was more likely to be interpreted by an adult listener as “dog.”
This is an example of a strategy that humans often use in listening to other adults, which is to base their interpretation on “priors,” or expectations based on prior experience. The findings also suggest that when listening to children, adult listeners incorporate expectations of how children commonly mispronounce words, such as “weed” for “read.”
The researchers now plan to explore how adults’ listening skills, and their subsequent responses to children, may help to facilitate children’s ability to learn language.
“Most people prefer to talk to others, and I think babies are no exception to this, especially if there are things that they might want, either in a tangible way, like milk or to be picked up, but also in an intangible way in terms of just the spotlight of social attention,” Bergelson says. “It’s a feedback system that might push the kid, with their burgeoning social skills and cognitive skills and everything else, to continue down this path of trying to interact and communicate.”
One way the researchers hope to study this interplay between child and adult is by combining computational models of how children learn language with the new model of how adults respond to what children say.
“We now have this model of an adult listener that we can plug into models of child learners, and then those learners can leverage the feedback provided by the adult model,” Meylan says. “The next frontier is trying to understand how kids are taking the feedback that they get from these adults and build a model of what these children expect that an adult would understand.”
The research was funded by the National Science Foundation, the National Institutes of Health, and a CONVO grant to MIT’s Department of Brain and Cognitive Sciences from the Simons Center for the Social Brain.