Stanford News - February 22nd, 2017 - by Alex Shashkevich
Children’s ability to understand basic grammar early in language development has long puzzled scientists, creating a debate over whether that skill is innate or learned with time and practice.
A new Stanford study, recently published in Psychological Science, helps build evidence for the latter. Analyzing toddlers’ early language with a novel statistical approach, associate professor of psychology Michael Frank found that rule-based grammatical knowledge emerges gradually, with a significant increase around the age of 24 months.
The new study, titled “The Emergence of an Abstract Grammatical Category in Children’s Early Speech,” also points out the need to gather more data that track children’s speech over time, which would help make future research more precise.
“The ability of humans to acquire and use language is a big difference between us and other species, and it’s also one of the biggest scientific puzzles out there,” said Frank, who co-authored the study. “Studying language acquisition in children is one way for us to try to find out what makes us human.”
Imitating or understanding?
Previous research has shown that children use articles, such as “a” and “the,” early and overwhelmingly correctly. But, Frank said, it is difficult to sort out whether children are just imitating adults or whether they actually understand that articles should be used before nouns like “dog” or “ball” – and can use them appropriately with nouns that are new to them.
To address that difficulty, the team created a new statistical model to measure changes in a child’s grammar over time. The model relies on Bayesian inference, a method that helps estimate the level of certainty in results. In addition, it takes into account the relationship between what the child says and what the child has heard from adults, separating imitation from generalization.
Researchers applied this model to data sets available for 27 toddlers and found that rule-based grammatical knowledge in their speech was not constant: it was more evident in older children.
Frank said the statistical model they used allowed them to not only analyze children’s language but also stay away from overly confident interpretations when there was too little data for a particular child.
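To illustrate why a Bayesian approach guards against overconfidence when data are sparse, here is a minimal sketch in Python. It is not the study's model: it estimates a single proportion under a uniform prior, and the function name and the counts (4 of 5 versus 400 of 500) are hypothetical, chosen only for illustration.

```python
import math


def credible_interval(successes, trials, mass=0.95, grid_size=2001):
    """Grid-approximate a central credible interval for a proportion.

    Assumes a uniform prior and a binomial likelihood. This is an
    illustrative stand-in, not the model used in the study.
    """
    failures = trials - successes
    # Evaluate the posterior at grid midpoints, which avoids log(0) at 0 and 1.
    grid = [(i + 0.5) / grid_size for i in range(grid_size)]
    log_post = [
        successes * math.log(p) + failures * math.log(1 - p) for p in grid
    ]
    peak = max(log_post)
    weights = [math.exp(lp - peak) for lp in log_post]
    total = sum(weights)

    tail = (1 - mass) / 2
    lower = upper = None
    cumulative = 0.0
    for p, w in zip(grid, weights):
        cumulative += w / total
        if lower is None and cumulative >= tail:
            lower = p
        if cumulative >= 1 - tail:
            upper = p
            break
    return lower, upper


# Hypothetical counts: both children show the same 80 percent rate,
# but one has 5 recorded utterances and the other has 500.
sparse = credible_interval(4, 5)
rich = credible_interval(400, 500)
print("5 utterances:  ", sparse)
print("500 utterances:", rich)
```

Both hypothetical children have the same observed rate, yet the interval for the child with only five utterances is far wider. That widening is how a Bayesian estimate signals that there is too little data to support a firm conclusion.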
Big data on babies
The study underscored the fact that data on language development in children under 2 years old is lacking. To characterize children’s initial level of grammar use, Frank said, it’s critical for scientists to have a sophisticated analytical model as well as consistent recordings that start from the time children begin talking.
According to Frank, the current lack of data and the analytical challenges it presents have led researchers on opposite sides of the grammar debate to draw contradictory conclusions. For instance, two studies published in peer-reviewed journals in 2013 used similar data sets, but one inferred that grammatical knowledge is innate while the other concluded that grammar is a learned skill.
“People have very strong feelings about the question of innateness versus learning,” Frank said. “We really didn’t know what to expect because there were these conflicting reports out there.”
The team hopes that its statistical model, together with new data sets, will help move the debate forward.
To help increase the pool of data, Frank and his colleagues are building an online database called Wordbank. The site aims to spur the gathering of data on children’s vocabulary and early language development and encourages researchers to share their data with different institutions and universities. Frank is also collaborating on a smartphone app for collecting early vocabulary data from parents.
“It’s going to take a tremendous amount of data to study this problem and build enough evidence for how children learn language,” Frank said. “We’re hoping that once we have those data, we can get a clearer picture of children’s early learning.”
Additional authors on this study were Stephan Meylan (lead author), a graduate student at the University of California, Berkeley; Brandon Roy, a former postdoctoral researcher at Stanford and the MIT Media Lab; and Roger Levy, an associate professor at MIT.
This work was supported by the National Science Foundation, the Alfred P. Sloan Research Fellowship and the Center for Advanced Study in the Behavioral Sciences.