What Four-Year-Olds Can Do That AI Can’t

This script corresponds to the episode first released in September 2019.


“Instead of trying to produce a programme to simulate the adult mind, why not rather try to produce one which simulates the child’s? If this were then subjected to an appropriate course of education one would obtain the adult brain.”

Alan Turing famously wrote this in his groundbreaking 1950 paper Computing Machinery and Intelligence, laying the groundwork for generations of machine learning scientists to follow. Yet, despite increasingly impressive specialised applications and breathless predictions, we’re still some distance from programmes that can simulate any mind, even one much less complex than a human’s.

Perhaps the key came in what Turing said next: “Our hope is that there is so little mechanism in the child brain that something like it can be easily programmed.” This seems, in hindsight, naive. Moravec’s paradox applies: things that seem like the height of human intellect, like a good stimulating game of chess, are easy for machines, while simple tasks can be extremely difficult.

If you think about it in a particular way, human intelligence and machine intelligence can often seem worlds apart. You or I would struggle to multiply together two six-digit numbers — calculators and computers can do this in an instant. Yet replicating most of the things that we all take for granted — moving about in our environments, navigating, finding the appropriate grip to pick up an object — requires machines many thousands of hours of training and a significant amount of computational power. It’s the equivalent of doing billions of those nasty multiplications, yet we never give it a second thought.

Similarly, you may well not know much about the laws of physics — although hopefully listening to this show is helping a little bit. I am fabulously inept at every single sport that exists, surprising no-one. But when Jos Buttler smashes a cricket ball over midwicket, or Serena Williams hits a perfect backhand winner down the line, neither of them is actually doing the explicit calculations of Newtonian physics, working out trajectories and integrating differential equations in their heads. In the same way, we don’t seem to explicitly crunch all of the numbers necessary to do everything that we can do — or, at least, we don’t have conscious access to that raw calculating power to actually… you know, perform calculations.

Estimates for the actual computational power of the human brain vary, and since it is a computation that’s “different in kind”, it probably makes more sense to ask the question the other way round: how much computing power would you need to create a computer that could do everything a brain can do? By some metrics, the human brain is still more powerful than our best supercomputers. And childhood has an awful lot to do with that.

In some ways, it’s an evolutionary gamble. Humans are vulnerable for an extraordinarily long time — it takes years and years before we can fend for ourselves and reproduce, longer than the lifespans of many animals. We require constant care for years, and cannot feed ourselves for many of them. During this time, a huge proportion of the food that other humans give us is devoted to developing that brain — somewhere between 60 and 80% of the energy goes into the brain in early childhood, leaving the body to develop much more slowly and leaving us vulnerable for a much longer time.

Yet it’s clear that this prolonged childhood is key to intelligence. Amongst dolphins, amongst humans and other primates, and even amongst the smartest birds — corvids, like rooks or crows — the species with the longest childhoods and infancies tend to be the most intelligent, because they have that extra time to develop.

Birds are a great example, in fact. Crows take one to two years to mature, and can use tools and solve abstract reasoning problems in experiments. Chickens are up and pecking at the ground within weeks — and that’s about the extent of it. They’re practically as intelligent after their heads are cut off as they were before. So clearly, prolonged childhood development allows for a greater final intelligence.

And it’s this evolutionary gamble — requiring us to go through a protracted phase of learning where we are essentially completely dependent on others — that has paid off and, well, made it all possible. Your mileage may vary on whether that’s a good thing.

But if children are our template for the simplest general, human-level intelligence we might programme, then surely it makes sense for AI researchers to study the many millions of existing examples.

This is precisely what Professor Alison Gopnik and her team at Berkeley do. They seek to answer: how sophisticated are children as learners? Where are children still outperforming the best algorithms, and how do they do it?

Some of the answers were outlined in a recent talk at the International Conference on Machine Learning. The first, and most obvious difference between four-year-olds and our best algorithms is that children are extremely good at generalising from a small set of examples. ML algorithms are the opposite: they can extract structure from huge datasets that no human could ever process, but generally large amounts of training data are needed for good performance.

This training data usually has to be labelled, although unsupervised learning approaches are also making progress. In other words, there is often a strong “supervisory signal”, coded into the algorithm and its dataset, consistently reinforcing the algorithm as it improves. Children can learn to perform generally on a wide variety of tasks with very little supervision, and they can generalise what they’ve learned to new situations they’ve never seen before.

Even in image recognition, where ML has made great strides, algorithms require a large set of images before they can confidently distinguish objects: children may only need one. How is this achieved?

Professor Gopnik and others argue that children have “abstract generative models” that explain how the world works. In other words, children have imagination: they can ask themselves abstract questions like “If I touch this sharp pin, what will happen?” — and then, from very small datasets and experiences, they can predict the answer.

In doing so, they are correctly inferring the relationship between cause and effect from experience. Children know that the reason that this object will prick them unless handled with care is because it’s pointy, and not because it’s silver or because they found it in the kitchen. This may sound like common sense, but being able to make this kind of causal inference from small datasets is still hard for algorithms to do — especially across such a wide range of situations.

Generative models are increasingly being employed by AI researchers — after all, the best way to show that you understand the structure and rules of a dataset is to produce examples that obey those rules. Such neural networks can compress hundreds of gigabytes of image data into hundreds of megabytes of statistical parameter weights, and learn to produce images that look like the dataset. In this way, they “learn” something of the statistics of how the world works. But to do what children can and generalise with generative models is computationally infeasible, according to Gopnik.
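The same idea can be shown at toy scale. The sketch below uses synthetic data and stands in for no particular network: a generative model is, at heart, a compressed statistical summary that you can sample from — real generative networks just do this with millions of parameter weights rather than two.

```python
# The world's smallest generative model: "train" by compressing a
# dataset into two parameters, then generate new samples from them.
# The data here is synthetic, standing in for real training data.

import random
import statistics

random.seed(0)
data = [random.gauss(170.0, 8.0) for _ in range(10_000)]  # e.g. heights, cm

# Training: 10,000 numbers compressed into two parameter "weights".
mu = statistics.mean(data)
sigma = statistics.stdev(data)

# Generation: new samples that look like (but aren't in) the dataset.
generated = [random.gauss(mu, sigma) for _ in range(5)]
print(round(mu, 1), round(sigma, 1), [round(g) for g in generated])
```

The compression is extreme — the whole dataset is discarded, and only the learned parameters remain — which is also why such a model can only reproduce the statistics it has captured, not the individual examples.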

This is far from the only trick that children have up their sleeve which machine learning hopes to copy. Experiments from Professor Gopnik’s lab show that children have well-developed Bayesian reasoning abilities. Bayes’ theorem is all about assimilating new information into your assessment of what is likely to be true, based on your prior knowledge. For example, finding an unfamiliar pair of underwear in your partner’s car might be a worrying sign — but if you know that they work in dry-cleaning and use the car to transport lost clothes, you might be less concerned.
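Bayes’ theorem itself is a one-line calculation. Here is the dry-cleaning example worked through as a sketch, with every probability invented purely for illustration:

```python
# Bayes' theorem on the dry-cleaning example. Every number here is
# invented for illustration.

def bayes_update(prior, p_e_given_h, p_e_given_not_h):
    """Posterior P(H|E) = P(E|H)P(H) / [P(E|H)P(H) + P(E|~H)P(~H)]."""
    numerator = p_e_given_h * prior
    evidence = numerator + p_e_given_not_h * (1 - prior)
    return numerator / evidence

# H: "something worrying is going on". E: "unfamiliar underwear in the car".
prior = 0.05             # assumed low prior belief in H

# If there's no innocent explanation, E is surprising unless H holds:
print(bayes_update(prior, 0.5, 0.01))   # posterior ~0.72: belief jumps

# If the partner works in dry-cleaning, E is unsurprising either way:
print(bayes_update(prior, 0.5, 0.4))    # posterior ~0.06: belief barely moves
```

The same evidence moves your belief a lot or hardly at all, depending entirely on how surprising it would be under the innocent explanation — which is exactly the role the dry-cleaning job plays in the story.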

Scientists at Berkeley present children with logical puzzles, such as machines that can be activated by placing different types of blocks or complicated toys that require a certain sequence of actions to light up and make music.

When they are given several examples (a small dataset of demonstrations of the toy, for example), children from the age of three or four can often infer the rules behind how the new system works. These are Bayesian problems: the children efficiently assimilate the new information to help them understand the rules behind the toys. When the system isn’t explained, the children’s inherent curiosity leads them to experiment with these systems — testing different combinations of actions and blocks — to quickly infer the rules behind how they work. This is one of the key aspects of how children learn: very actively, by getting into everything.
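That kind of rule inference can be caricatured in a few lines of code. This sketch invents its own details — the colours, the demonstrations, and the hypothesis space — but it shows how a handful of observations can whittle many candidate rules down to one:

```python
# Inferring a rule from a handful of demonstrations. The machine,
# colours, and demonstrations are all invented for illustration.

from itertools import combinations

colours = ["red", "blue", "green"]

# Hypothesis space: every non-empty set of colours that might
# activate the machine.
hypotheses = [set(c) for r in range(1, len(colours) + 1)
              for c in combinations(colours, r)]

# Demonstrations: (colour of block placed, did the machine light up?)
demos = [("red", True), ("blue", False), ("green", True)]

def consistent(hypothesis, demos):
    """A hypothesis survives if it predicts every demonstration."""
    return all((colour in hypothesis) == lit for colour, lit in demos)

surviving = [h for h in hypotheses if consistent(h, demos)]
print(surviving)   # only "red and green activate it" explains every demo
```

Three demonstrations are enough to eliminate six of the seven candidate rules — a crude version of the small-data inference that children seem to perform effortlessly.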

Indeed, it’s this curiosity that allows children to outperform adults in certain circumstances. When an incentive structure is introduced — i.e. “points” that can be gained and lost depending on your actions — adults tend to become conservative and risk-averse, avoiding any action that might result in punishment. Children are more concerned with understanding how the system works, and hence deploy riskier strategies. Curiosity may kill the cat, but in the right situation it can allow the children to win the game, identifying rules that the adults miss.

This research not only shows the innate intelligence of children, but also touches on classic problems in algorithm design. The explore-exploit trade-off is well known in machine learning. Put simply, if you only have a certain amount of resources — time, computational ability, and so on — are you better off searching for new strategies, or simply taking the path that seems most obviously to lead to gains?

Children favour exploration over exploitation. This is how they learn — through play and experimentation with their surroundings, through keen observation and through asking as many questions as they can. As we get older — kicking in around adolescence in Gopnik’s experiments — we switch to exploiting the strategies that we’ve already learned, rather than taking those risks.
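The trade-off has a standard textbook formalisation: the multi-armed bandit, where a parameter epsilon controls how often you explore at random versus exploit your best guess. In this minimal sketch (with invented payout rates), a “childlike” high epsilon keeps exploring at a cost in points — but exploration is also the only way the better option gets found at all:

```python
# An epsilon-greedy two-armed bandit: epsilon is the fraction of
# "childlike" random exploration. Payout rates are invented.

import random

def run_bandit(epsilon, pulls=5000, seed=0):
    rng = random.Random(seed)
    true_rates = [0.3, 0.7]          # hidden payout rate of each arm
    estimates = [0.0, 0.0]           # running estimate per arm
    counts = [0, 0]
    total = 0.0
    for _ in range(pulls):
        if rng.random() < epsilon:
            arm = rng.randrange(2)                           # explore
        else:
            arm = max(range(2), key=lambda a: estimates[a])  # exploit
        reward = 1.0 if rng.random() < true_rates[arm] else 0.0
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total += reward
    return total / pulls

# Mostly exploiting wins more points per pull than constant exploring.
print(run_bandit(epsilon=0.1), run_bandit(epsilon=0.9))
```

This mirrors the incentive experiments above: once points are at stake, the low-epsilon “adult” strategy scores better — but only because someone did the exploring first.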

Children are social learners: as well as interacting with their environment, they learn from others. Anyone who has ever had to deal with a toddler endlessly using that favourite word, “why?”, will recognise this as a feature of how children learn!

They are excellent at imitating other people — as you’ll know if you’ve ever played with a child or talked to them as they learn to speak. What’s more, children of a certain age develop a “theory of mind” — in other words, they start to understand that other people aren’t omniscient, but are other humans, limited by the same constraints that limit them. This has been demonstrated in many famous experiments over the years — for example, the experimenter hides an object from a third party without them seeing where it’s hidden, and then asks the child where that third person thinks the object is. At a certain age, children realise which knowledge is and isn’t possessed by the third party. This is also a form of reasoning, and of empathy — learning something by putting yourself in someone else’s shoes and considering what knowledge they had at the time.

One fascinating aspect that arose from Gopnik’s experiments is that you can influence how children approach a task based on how you behave. In the experiment with the toys, where different strategies could make the toy light up and play music, the kids were strongly influenced by how the experimenter behaved. If the experimenter behaved authoritatively, like a teacher, as if explaining precisely how the device worked, then the children were far more likely to copy one of the teacher’s strategies. But if the experimenter acted confused — as if they needed help, and surprised when they succeeded — then the child’s creativity came into play, and they would actually invent better strategies. This held even when the children were shown precisely the same information.

This makes sense, of course — if someone showing you something appears to be an expert, you go into receiving-information-and-copying mode, rather than inventive, exploratory, strategy-devising mode. But it does have some interesting implications for how we should educate children — so as not to suppress their innate creativity.

These concepts are already being imitated in machine-learning algorithms. One example is the idea of “temperature” for algorithms that look through possible solutions to a problem to find the best one. A high-temperature search is more likely to pick a random move that might initially take you further away from the reward. This means that the optimisation is less likely to get “stuck” on a particular solution that’s hard to improve upon, but may not be the best out there — but it’s also slower to find a solution. Meanwhile, searches with lower temperature take fewer “risky”, random moves and instead seek to refine what’s already been found.

In many ways, humans develop in the same way — from high-temperature toddlers who bounce around, playing with new ideas and new solutions, even when they seem strange — to low-temperature adults who take fewer risks, are more methodical, but also less creative. This is how we try to programme our machine learning algorithms to behave as well.

It’s nearly 70 years since Turing first suggested that we could create a general intelligence by simulating the mind of a child. The children he looked to for inspiration in 1950 are all knocking on the door of old age today. Yet, for all that machine learning and child psychology have developed over the years, there’s still a great deal that we don’t understand about how children can be such flexible, adaptive, and effective learners.

Understanding the learning process and the minds of children may help us to build better algorithms — but it will also help us to teach and nurture better and happier humans, and ultimately, isn’t that what technological progress is supposed to be about?