Arnon Lotem’s dog wants him to make the rain stop. In winter, when she signals that she wants to go out, he opens the door for her, but if it’s raining, she sits down and looks up at him, expectantly.
“It’s as though she’s waiting for me to stop the rain for her,” says Prof. Lotem, a behavioral ecologist, who teaches in Tel Aviv University’s zoology department and the Sagol School of Neurobiology. “And when you think about it, it’s not completely untenable. You know, there are so many things I do for her. If I’m the one who turns on and off the sprinklers she likes to romp in, why shouldn’t I also be able to stop the water that’s coming down from above?”
With regard to a dog, it’s clear why the animal learned to connect the ability to make things happen with its owner. Power resides with the master: He can be generous or punitive, he’s the one to turn to for approval or help. Charles Darwin thought that the ability to make that connection was the basis for man’s belief in his God; in other words, faith is an evolutionary development that grew out of that association. The fact that many findings show that faith has evolutionary advantages (for example, by creating social cohesiveness or improving the health of individuals) is consistent with that idea.
Lotem does not purport to trace the development of human faith. But as a behavioral ecologist who’s interested in points where ecology, behavior and evolution overlap, he’s trying to understand how animals have evolved the ability to make connections between various phenomena. Generally speaking, it can be said that he’s studying learning processes – meaning, the ability to collect different kinds of data, make associations between them, and from them infer a desirable mode of behavior.
“Most behaviors that animals require in order to cope with their habitat are not gene-encoded,” he notes. “They are learned, and their genetic component is reflected in the existence of genes that shape the mechanisms of learning.”
Lotem’s interest lies in those mechanisms and their evolution. His basic hypothesis is that “high” learning – such as the acquisition of social behaviors or language – sprang from the simple rules of basic learning, because that’s how evolution works. It takes simple mechanisms and gradually modifies them to create complex ones. Examples are the functioning of the heart and the physiology of the digestive system, as well as brain activity.
For example, scientists are familiar with the brain’s evolution in terms of its stages of development. Small nerve bundles that were scattered in the bodies of lower animals developed into large vesicles in the heads of more developed animals and later evolved into the brain stem, the little brain (cerebellum) and the large brain (cerebrum). Subsequently the cortex, which may be viewed as the center of cognition, also developed, reaching its height of sophistication in human beings.
This anatomical analysis, however, cannot account for the development of the mechanisms that transformed the brain into a learning machine. There is no comparison between an earthworm’s instinctive recoil from light – which ensures that this creature will always remain in the depths of the moist soil – and the increasingly effective attempts of a baby chimp to crack a coconut with a stone. And neither of those behaviors is similar to the ability of a human infant to learn its mother tongue.
“I want to understand how the shift occurs from associative learning, which creates a simple connection between stimulus and response, to complex learning, which involves past experience and enables a forecast of the future. It’s an ambitious task,” Lotem admits, “but with the years it seems to me that we are at least approaching an understanding of the process.”
Lotem’s collaboration with a cognitive psychologist and a computer scientist from Cornell University – Prof. Shimon Edelman and Prof. Joseph Halpern, respectively – and with a doctoral student, Oren Kolodnoy, has led to the development of a learning model that can be explored and studied by means of computer simulations. Interestingly, the model, which imbued the computer with the ability to learn to find food in a complex and structured environment, also imparted to it the ability to learn a language.
Lotem: “We propose, then, that with all due respect for man’s superiority over animals and our ability to acquire language, the sources and roots of that ability reside in skills that are far more ancient, which are shared by all animals.”
Before we get to human superiority, let’s talk about the “simple learning principles” on which your model is based. What does this mean, in practice?
Lotem: “In general, I am referring to associative learning according to the familiar behaviorist principles, such as operant conditioning, in which an animal draws a connection between two events and learns how to act vis-a-vis its surroundings in accordance with its history of reinforcements. Behavior that produces a positive reinforcement will repeat itself, while behavior that generates negative reinforcement will fade and disappear. A beetle that finds sweet nectar in a red flower will learn to draw a connection between the two data and will search for the flower, and a dog that’s bitten by another dog will learn to flee from it.
“The question is what the basis is for making the connection between the two data. When you see a computer on my desk and a chocolate bar next to it, is it right for you to infer that a computer heralds the appearance of chocolate? Because this is a critical factor in learning: to determine whether the items that we see together are truly interconnected, or whether the connection between them is purely chance. ‘True’ in the biological context means that the connection has predictive power – concerning what will happen in the future – hence it is important for survival.”
When the psychologist B.F. Skinner carried out his famous experiments, which showed that behaviors are learned by means of the reinforcement associated with them, he encountered some pigeons that behaved differently every time they received food. One walked in circles, another pecked at the corners of the cage, and so on. The reason was that they learned too fast. At the start of the experiment, when they received the food, the first pigeon happened to be turning in circles and the second was pecking in the corners of the cage, and they each immediately linked the two events they experienced. Skinner dubbed them “superstitious pigeons”: They attributed meaning to two stimuli that had only a chance connection between them.
“The important question,” Lotem declares, “is what is the correct statistical test that needs to be done in order to decide whether a particular linkage is true or random.”
What do you mean by “correct statistical test” – does a pigeon’s brain do statistics?
“The brain of a pigeon, of a beetle and our brain, too, all do statistics. Contrary to the behaviorist conception, the brain is not a rigid learning machine in which a particular event necessarily leads to another particular event. It’s more appropriate to talk about a wide-open space of possibilities, in which different probabilities exist for events to occur together. According to our working model, learning organizes the information in the brain in the form of a network of data and events between which there are connections. To create the network, the brain calculates statistics on the data it takes in from the surroundings; it monitors the distribution of the data and determines the level of connections – the associations – between them. The ramified network represents the relations between the data that construct it, and afterward serves as a map that the animal uses to navigate in its world. When it encounters information, it knows immediately a host of other things that are related to that information and estimates their probability of occurring.”
Of bears and honey
How does an animal construct the network?
“When an animal learns its surroundings, it scans the data, stimuli, informational details, events – whatever term you wish – and when it notices an interesting datum, such as food, a word that its mother spoke, or a chirp, it gives that datum weight in the memory. If the datum reappears, its weight will be reinforced; if not, it will fade away.”
What is an “interesting datum”? How does the brain know which of all the data is supposed to interest it?
“That’s a good question, and it’s a central part of our model. For now, I will say that all animals possess innate templates in their brain, which are the first points of interest on which initial learning is based. In most animals, for example, receptors developed to define what is tasty – sweet – and what is not – bitter. Accordingly, for a young bear in the forest, honey is an interesting datum. It’s a genetic thing. The bear scans its surroundings and takes in additional data, such as a swarm of bees hovering in the air, and its brain creates an associative link between the swarm and the honey.
“If, subsequently, a swarm of bees appears separately from honey, the connection will be weakened. But if the bear again encounters bees in proximity to honey, the connection between the two data items will be reinforced, and when it reaches a certain threshold level it will become fixed in its long-term memory. The bear will start to look for bees, which augur the presence of honey, and henceforth they will become of interest. If the bees appear in high frequency along with additional data, such as next to a specific type of flower or after rain – those data, too, will become interesting and will be added to the growing network in the brain.”
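The reinforcing and fading of a connection that Lotem describes can be sketched in a few lines of Python. This is only an illustration, not the researchers’ actual model: the class name AssociationMemory, the additive gain, the multiplicative decay and the threshold value are all assumptions chosen for the example.

```python
from collections import defaultdict


class AssociationMemory:
    """Toy associative memory: cue weights grow with co-occurrence and fade without it.

    All parameter values are illustrative assumptions, not taken from the model
    described in the article.
    """

    def __init__(self, gain=1.0, decay=0.5, threshold=3.0):
        self.gain = gain            # added when the cue appears together with the reward
        self.decay = decay          # multiplier applied when the cue appears alone
        self.threshold = threshold  # weight at which the link becomes fixed
        self.weights = defaultdict(float)
        self.fixed = set()          # cues whose link is in "long-term memory"

    def observe(self, cue, reward_present):
        """Update the link between a cue (e.g. bees) and a reward (e.g. honey)."""
        if cue in self.fixed:
            return  # assumption: a fixed link no longer fades
        if reward_present:
            self.weights[cue] += self.gain
        else:
            self.weights[cue] *= self.decay
        if self.weights[cue] >= self.threshold:
            self.fixed.add(cue)

    def is_interesting(self, cue):
        """A cue becomes a point of interest once its link is fixed."""
        return cue in self.fixed


memory = AssociationMemory()
for honey_nearby in [True, True, False, True, True]:
    memory.observe("bees", honey_nearby)
print(memory.is_interesting("bees"))
```

In this sketch, a cue that has become “interesting” could itself serve as the reward for further cues, such as a type of flower, which is one way the network Lotem describes might keep growing.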
Can an animal monitor all the data in its world and discover the connections between them?
“To begin with, it’s important to emphasize that animals do not monitor data consciously, of course. These processes occur at the level of neurons, some of which we know from studies that won their researchers a Nobel Prize. Second, an animal does not really monitor all the data in the area. This is a significant point in our model, and one that many scientists tend to ignore. Generally, when we study learning processes, we ‘serve’ the animals a very limited number of data and examine what they do with them. A mouse in a laboratory has to choose between two levers – pressing one will give it a certain type of food, and pressing the other will give it something else – but in nature there aren’t only two levers.
“In their natural habitat, animals are faced with tremendous amounts of data, hardly any of them relevant to them, and the question, indeed, is how they cope with that cornucopia of information. The answer is: They don’t. Most of the data are filtered out, and they only do statistics on some of them. In contrast to a computer, which remembers all the data it’s fed, our brain developed in a manner that limits the quantity of data it’s primed to receive and remember. Our model hypothesizes that the brain does this ‘intentionally’; that is, that the mechanism of filtering the data from the surroundings is an integral element in the learning process.”
How do animals filter and narrow down the data they take in?
“One way of doing that is, for example, to demarcate a ‘window’ and scan only those data that exist within the window.”
What is a “window”?
“Take, for example, the famous phenomenon of imprinting, which makes it possible for a newly hatched chick to connect to its mother. That’s an example of a very simple learning process, which makes use of the ‘window’ principle – in this case, a window of time. The chick’s genes are not capable of supplying its brain with a sufficiently accurate picture to provide a quick response to the question ‘Who is my mother?’ – in other words, who laid the egg I emerged from and who is supposed to look after me? The speed of the response is critical – the chick’s life depends on it – so evolution supplanted that complex question with a simpler one: ‘Who is by my side most of the time?’
“The chick’s brain is equipped with an innate template of the mother’s image, and it has a very limited window of time to scan the surroundings and determine which of the characters that inhabit them – goose, frog or squirrel – is more consistent with the template and also appears next to the chick most frequently. Thanks to that narrow window of time, which is about an hour, the number of data it has to deal with is small. Because generally it’s the mother that is there most of the time, the chick will ‘lock on’ to that object and reject the squirrel or the frog. Henceforth, it will follow the object that was learned as ‘mother,’ which will become a datum of interest, and all data related to it will become worth examining. The potential risk of this simple mechanism is seen when a bemused scientist presents a newborn chick with a shoe: The chick ‘adopts’ it and follows it as though it were its mother. Given that shoes rarely appear next to chicks in nature, the risk seems negligible.
“Were it not for the limitation created by the window of time, the chick would have to examine the objects in its surroundings throughout its life and do a great many calculations. The limitation thus enables rapid learning with the aid of a little memory and an ability to calculate in a simple way. According to our model, limitations are part of the solution; that is, they are part of the learning mechanism.”
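The economy of the time window can be shown with a short Python sketch. It is a simplification under stated assumptions: the function name imprint, the representation of sightings as (minute, object) pairs and the yes-or-no template test are all hypothetical, chosen only to show how a window cuts down the counting.

```python
from collections import Counter


def imprint(sightings, window_minutes, matches_template):
    """Pick the object seen most often inside the time window.

    sightings: list of (minute, object_name) pairs (hypothetical format).
    matches_template: function returning True if an object fits the crude
    innate template (an assumption; the real template is surely richer).
    """
    counts = Counter(
        obj
        for minute, obj in sightings
        if minute < window_minutes and matches_template(obj)
    )
    if not counts:
        return None  # nothing suitable appeared during the window
    return counts.most_common(1)[0][0]


sightings = [
    (5, "goose"), (12, "frog"), (20, "goose"), (41, "goose"),
    (50, "squirrel"), (90, "squirrel"), (95, "squirrel"),
]
# Only the first hour counts, so later squirrel sightings are never tallied.
print(imprint(sightings, 60, lambda obj: True))
```

With a template this loose, a shoe that is the only thing seen in the first hour would be chosen just as readily, which is the risk Lotem mentions.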
How is all this related to complex learning, such as language acquisition?
“We have reasons to think that language acquisition, too, operates through a learning mechanism that is based on a window that limits the quantity of data acquired and imbues them with statistical principles. Consider, for example, the bewilderment we all feel when we encounter a foreign language, which sounds to us like one long, endless sentence. How are we supposed to know where one word ends and another begins? According to the statistical model, an infant learns how to do this by checking the frequency of the various syllable sequences and assigning them different weight in the memory. Sequences that represent words and expressions in the language, and that will therefore be repeated, will receive increasing weight, and finally will be burned into the memory and learned. Random sequences of syllables – gibberish – will fade and eventually vanish.”
The idea that language acquisition is based on statistical processes is not new.
“It is indeed an old idea, and computer scientists have been able to teach computers to understand language based on statistical principles. But in most of those cases the learning was based on the computer’s vast memory and tremendous power of calculation. The model according to which we are working takes into account limited memory and powers of calculation, in accordance with the brain’s natural tendencies.
“This model hypothesizes that language acquisition, too, is accomplished according to a ‘window’ principle that delimits the data: the window of time of human memory. Here’s the principle: An infant listens to the flood of verbiage and notices the sequence of syllables and the connections between them. He remembers them for a short time and then forgets, unless he encounters them again, and then he reinforces them in the memory. For example, an infant who hears the sentence, ‘Mommyboughtadoll,’ will remember the sequences of syllables ‘mommy,’ ‘mommybough,’ ‘ommyb,’ ‘mmybought’ and so on, for a limited period of time. Because ‘mommy’ will appear in subsequent speech, too, within a relatively short time, its weight in the memory will increase, and after it recurs with sufficient frequency, it will be fixed in the long-term memory. The other sequences will disappear. In other words, the effect of delimited memory is to cause only sequences that appear with high frequency, and within a short period, to stand out and be fixed in the memory.”
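The ‘Mommyboughtadoll’ example can be turned into a rough Python sketch. It is not the simulation built by Lotem and his colleagues: the function names, the use of letters instead of syllables, the window measured in utterances and the threshold of three hearings are all assumptions made for illustration.

```python
def substrings(utterance, min_len=2, max_len=6):
    """All contiguous letter sequences of the utterance within the length limits."""
    n = len(utterance)
    return {
        utterance[i:j]
        for i in range(n)
        for j in range(i + min_len, min(i + max_len, n) + 1)
    }


def learn_sequences(utterances, window=3, threshold=3, min_len=2, max_len=6):
    """Keep only sequences heard `threshold` times without a gap longer than `window`.

    window: how many utterances a sequence survives without being heard again
    (a stand-in for the memory's window of time; an assumption).
    """
    short_term = {}    # sequence -> (weight, index of utterance where last heard)
    long_term = set()  # sequences that have become fixed
    for now, utterance in enumerate(utterances):
        # Forget anything that has not recurred within the window.
        short_term = {
            seq: (weight, last)
            for seq, (weight, last) in short_term.items()
            if now - last <= window
        }
        for seq in substrings(utterance, min_len, max_len):
            if seq in long_term:
                continue
            weight, _ = short_term.get(seq, (0, now))
            weight += 1
            if weight >= threshold:
                long_term.add(seq)         # fixed in long-term memory
                short_term.pop(seq, None)
            else:
                short_term[seq] = (weight, now)
    return long_term


heard = ["mommyboughtadoll", "wheresmommy", "mommyishere"]
learned = learn_sequences(heard)
print("mommy" in learned, "ommyb" in learned)
```

One limitation of this sketch is that fragments of a frequent word, such as ‘ommy,’ recur exactly as often as the word itself and are therefore fixed along with it. The article does not say how the actual model deals with such fragments.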
That reminds me of an art installation I once saw, I think by a Zen monk. He used a brush dipped in water to paint on a burning-hot surface. The painting lasted for only a few seconds, the length of time it took the water to evaporate, unless the brush was used to make the lines wet again.
“That’s a fine image for the process of fading and fixation of words in memory’s window of time. If we take the image further, because the brush does not repeat exactly the same lines but creates new ones that cross the old ones, what will stand out is their intersection. In language that’s the equivalent of what parents tend to do naturally: to repeat sentences in similar though not identical versions. For example, ‘Here is a doll,’ ‘Here is a pretty doll,’ ‘What a big doll!’ It’s been found that about 20 percent of all direct speech to infants consists of talk of this kind, which emphasizes the repetitiveness of the word sequences so they will be identified and become fixed in the memory. Afterward, whenever the infant hears an utterance, he will associate it with the words that are already fixed in his memory, attach the new syllable sequences to them and expand the network of data.”
Although the idea that language acquisition is based on statistical processes is not new, it was relegated to the sidelines for decades because of the dominance of the theory propounded by the American linguist Noam Chomsky. According to Chomsky, our linguistic infrastructure is innate; in other words, many of the rules of language are not learned but have been implanted in the brain as a singular product of the human DNA. With their model, Lotem and his associates join a long list of researchers who argue the opposite: namely, that our linguistic ability is mostly based on identifying repeated patterns in the language we hear, and not on the application of a priori rules that exist in the brain.
Lotem: “The simulation of the model did not spring from linguistics. To begin with, we developed it in order to examine whether the learning model – which constructs an associative network from the frequency of the connections between the data, and is based on the existence of a limited window of memory – can enable a computer to learn a virtual environment and find food in it. We then showed that those same rules can also enable a computer to learn a language. This simulation shows in principle that the statistical principles that animals make use of to learn relatively simple skills, such as foraging for food, can also be suitable for learning complex skills. Language acquisition is a good example, but it’s also valid for learning social behaviors, making decisions and more.”
Wherein, then, is the superiority of humans over animals? Why are we capable of language learning but chimpanzees are not?
“Animals have quite a complex communication ability, but scientists agree that our language phenomenon is indeed unique. What differentiates us from the chimpanzee is a challenging question to which we do not yet have an answer, but our model can offer a possible direction in the sense of identifying the candidates for evolution. As I noted, there are no genes that encode directly for language, but rather for the different parameters of the learning mechanisms that developed gradually from the basic, animal mechanisms. Those parameters changed according to the principle that guides evolution – and this is the important point – that relatively small changes suffice to yield big changes. One such parameter, for example, steers children to be more attentive to human language than chimpanzees are, in the same way as chicks are more attentive to chirps than elephants are, and thus are exposed to the appropriate source of information. Another parameter is related to human memory, which is more suited to the segmentation of language, in the sense of the size of memory’s window of time.”