Around the middle of our meeting, observing my fingers on the computer keyboard, Lior Shamir tells me he is sure it will soon be possible to extract a great deal of personal information from my typing rhythm. In theory, “Your medical condition could be gleaned just from the speed of your typing,” says the Israeli-born professor. He speaks with the ease of a scientist who for years has been breaking images down into millions of fragments of information, and thereby translating reality into dimensions of which most of us are not even aware.
“I have no doubt that in another 50 years quite a bit will be written about this period,” he notes in his dry, analytical, measured and restrained style. “They will write about how people gave all their personal data so freely to organizations, even though they didn’t have the slightest idea what use is being made of the sensitive information,” he continues. For a moment it’s not clear whether he’s more outraged by the invasive collection of information or by the nonchalance with which people surrender their privacy.
Shamir, an associate professor in the computer science department of Kansas State University, has no intention of becoming part of the intrusive effort to round up everyone’s personal information – even though probably few scientists in the world could collect it as efficiently, systematically and precisely as he. In recent years he has published a series of fascinating and widely discussed research studies, based on the advanced programs he has developed.
Using them, he is able to break down vast quantities of information into dry data and exact numbers. He has studied hundreds of political speeches, thousands of works of art, hundreds of hours of music, the performance of thousands of soccer players, and numberless images of galaxies moving through space. He has also done a comparative analysis of thousands of hours of dialogue between whales in the ocean depths, in different parts of the world.
An example is an article Shamir published last December in the Journal of Popular Music Studies, which generated considerable attention. It’s a comparative analysis, unprecedented in scope, of the changes in the nature of Western popular music from the 1950s to today. He analyzed 6,150 songs, all of which appeared on Billboard magazine’s top 100 chart from 1951 to 2016. The program he wrote breaks down the songs’ content into levels of anger, fear, anxiety, love and other emotions.
“The results of the study show clearly how music changed over the years,” he says. “The emotions that were expressed in the music of the 1950s are completely different from the emotions contained in songs of the 2000s. Today’s music expresses far more anxiety, far more sadness, and there are far fewer expressions of happiness.”
Shamir pulls from his laptop a detailed graph quantifying the language disparities between Eminem and John Lennon, and between Bob Marley and Elton John. All told, the research shows, since the 1950s expressions of anger and disgust have doubled, fear has surged by 50 percent and, at the same time, openness and personal security have plunged by dozens of percentage points.
The professor makes no pretense of being a music critic or a scholar of culture. But he does not want his research to remain at the level of arid graphics, and tries to impart a social-cultural interpretation to the findings. “In the 1950s, music played a very different role from what it did beginning in the mid-1960s,” he explains.
“In the 1950s, music was meant to entertain, to make people happy. But from the mid-1960s, we see a sharp rise in the fear and anger being expressed in the songs, the immediate context of which was the Vietnam War and the ‘flower generation’ that sprang up in its wake. In that period, music changed its goal; it became a tool for social protest. The leading example is Bob Dylan, who enjoyed tremendous popularity in those years. In parallel, we see the change undergone by the Beatles, for example with ‘Nowhere Man’ [December 1965], their first song that doesn’t deal with love and romantic themes.”
Did the songs become angrier and less happy because society grew more frightened and angrier, or was it the singers who changed and suddenly developed a social and political awareness?
“We will never know what would have happened if Bob Dylan had created his music in the 1940s, but we can assume that there were quite a few critical artists in the 1940s, too, only their music was received with less openness than was the case in the 1960s and 1970s. So the change occurred in the public, which was now receptive to their music. In the same way, today there is far greater openness toward rap songs that contain a great deal of verbal violence, which in the past simply would not have been acceptable. You can find violent songs in the 1940s and 1950s, too, but their public consisted of small, marginal groups.”
How do you account for the openness you describe?
“It has to do with the fact that the style of speech in the public space has undergone a change. Things that used to be spoken in private can be heard today in public, including in the political field, which has become far more outspoken. In addition, the internet and the social networks have made artists far less dependent on the big recording companies, which is also why they are a lot less limited in what they can say.”
Shamir offers an amusing example, in which the program he developed faced an interesting public test. “In 2015 there was a big scandal when some claimed the theme song of the 2022 Beijing Winter Olympics very much resembled the song ‘Let it Go’ [from the Disney movie ‘Frozen’],” he relates. “In the wake of this, the periodical Foreign Policy asked me to let the program analyze the song and see what it resembled.” To Shamir’s delight, the program did not let him down. “The computer immediately identified the great musical resemblance between the two songs,” he says with a satisfied smile.
In connection with computer-generated information-gathering, what leaps instantly to mind is the controversy surrounding the British firm Cambridge Analytica, which made headlines after revelations of its involvement in the 2016 U.S. election campaign. The company specialized in collecting data and analyzing information it had acquired from 50 million different Facebook profiles. With these it generated personality and behavioral models that enabled the firm to create personalized political campaigns. For every American user, a political slogan that played on his deepest fears; for every Facebook fan, an election broadcast adapted to personality, friends, hobbies, likes, desires, education, age and other information that was collected about him from his own account.
Although this whole phenomenon has often been described as a danger to democracy, Shamir seems to treat it as small change, as the tip of the iceberg.
“What they did is dangerous and problematic, but it’s still nothing,” he says. “I’m talking about the ability to create a complete model of your medical condition and your probable life expectancy – from the pace of your typing, the time at which you open the computer in the morning, the frequency of your tweets, the number of Facebook friends, your age, the content of your emails, the pictures you upload to Instagram, the clips you put on Facebook, the number of hours you’re active at the computer.
“Today, we know that a person’s walking pace can be an indication of how many years he has left to live. Through the pupils of the eyes we can know which diseases a person suffers from. There have been cases of someone uploading a picture to the social network and a physician diagnosing, through the eyes, that he’s suffering from a certain disease. People wear all kinds of smart watches with sensors that record sensitive medical information, such as pulse and blood pressure.”
Sounds intense? Shamir doesn’t stop there: “The way you move the mouse can provide plenty of information about you. Even the frequency of your blinking can supply sensitive medical information about you. A human being cannot analyze so much information, he doesn’t have the tools to cope with more than a few dimensions at a time. But a program can be created that will allow you to analyze 4,000 different dimensions for every person, and at the same time cross-check the information with that of tens of millions of other people.”
Who has an interest in compiling a vast digitized database about blink frequency and the waking-up habits of millions of people?
“Think about what will happen when life insurance companies possess all that sensitive information and will be able to decide, on the basis of an evaluation of the lifespan of every person, whom to sell a policy to and how much to charge each client. And what about old-age homes, which obviously will prefer to accept people with a short life expectancy, who will pay a high initial price and vacate the room as quickly as possible, over those who will occupy a place for many years. Banks and financial institutions will be able to draw up a profile not only of your medical situation but also of your decision-making habits, and on that basis evaluate the prospect of your repaying a loan, or not. Employers will be able to decide on the basis of the profile whom to hire and whom to reject.”
To the bone
Shamir, 47, was born in Kiryat Ono and studied computer science at Tel Aviv University. He is married, with two daughters. His wife, Mirit, also works at Kansas State, as coordinator of the program for advanced degrees. In 2003, Shamir moved to the United States, earning his PhD in computational science and engineering at Michigan Technological University. His postdoctoral work was done at the National Institutes of Health in Baltimore, where he tried to develop algorithms for early detection of arthritic diseases in elderly people.
“Some arthritic diseases are incurable, but if they are caught early, it’s possible to delay significantly their rate of development by changing eating habits, losing weight and so on,” he notes. The problem, which crops up in most of his research studies, is not a lack of information per se, but the difficulty of analyzing all the information. In this case, there were thousands of x-ray images with very minor levels of deviation, which the human eye cannot detect.
“For decades, thousands of people came to the institute every few years to be x-rayed,” the professor relates. When 40 percent of them came down with arthritic diseases later in life, the institute staff went back to the early x-rays and fed them into Shamir’s program, with the aim of discovering a pattern that could indicate the existence of early symptoms. “What the computer found is that the disease starts with the bone and not necessarily with the cartilage, as was usually thought until then,” he explains now.
But why waste time talking about arthritis, when there are far more interesting subjects at hand, such as a comparative analysis between annual salary and the field performances of thousands of soccer players around the world?
“The truth is that I’m not really interested in soccer,” Shamir admits, adding that the study sprang from a master’s dissertation written in his department. He and his students drew on information collected by FIFA, the international soccer association, which included the detailed monitoring of 6,082 players from dozens of leagues worldwide (not including Israel, by the way).
The height of their leaps, speed, power, pass accuracy, ball losses, goals, assists, fouls, headers – all told, no fewer than 55 different criteria for each player in the 2016-2017 season were analyzed. I asked Shamir about the importance of a study like this. He admitted that if he’d had to invest a lot of time in it, he probably would have dropped it. “It’s a study that has entertainment significance. There is no exciting scientific discovery.” But a few seconds later, he added, “On the other hand, billions of people around the world follow soccer, so who am I to say it’s not important enough?”
If the media constitute an index of any sort, the amount of coverage the study received globally attests to great interest in the sport. This may be due to the finding that drew the most attention: namely, that Lionel Messi, the Barcelona forward whom many consider to be the greatest soccer player of all time, is paid a salary disproportionately high in comparison to his performance on the field.
“Generally speaking, the study found that there is an overall compatibility between the players’ salary and their performance level – which was pretty much what we expected,” Shamir says.
In this connection, it’s important to note that the artificial intelligence he has developed, smart as it may be, is completely incapable of determining the proper salary for a soccer player as opposed to a schoolteacher or a bus driver. What it can do is a comparative analysis of field performance in relation to salary, and thus generate a new order in which salaries are calculated on the basis of each player’s level of performance.
Not surprisingly, by the way, even the objective computer program did not succeed in challenging the widely held proposition that ranks Messi as the player with the highest performance level in the world. The computer only concluded that his weekly salary of $550,000, at that time, was excessively high in relation to his performance as compared with other players. He should be earning a far more “modest” $243,000 a week, the program determined.
“According to the program, some players make a third of what he does but have a performance level that is 70 percent of his,” Shamir explains. Cristiano Ronaldo, the Juventus forward and Messi’s bitter rival for world’s best player, should have made “only” $154,000 a week, according to the program. At the time he was earning three times that amount.
“One explanation for the disproportionate salaries paid to stars like Messi and Ronaldo, is that their earnings are not related only to their performance on the field, but also to their star power, which generates income from broadcast rights, the purchase of season tickets, souvenirs and the like,” Shamir points out.
In addition to ranking the recipients of inflated salaries, the program also ranked those who were paid disproportionately low salaries in relation to their abilities.
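The kind of comparison described here, setting each player's pay against what his performance alone would predict, can be sketched in a few lines of Python. This is only an illustration under loud assumptions: the article does not say how Shamir's program works internally, the real study weighed 55 criteria rather than one score, and the players, scores and salaries below are invented. The sketch fits a straight line of salary against a single performance score and ranks players by how far their actual pay sits above or below that line.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def salary_gaps(players):
    """Rank players by actual salary minus the salary the fitted line predicts."""
    scores = [score for _, score, _ in players]
    salaries = [salary for _, _, salary in players]
    slope, intercept = fit_line(scores, salaries)
    gaps = [
        (name, salary - (slope * score + intercept))
        for name, score, salary in players
    ]
    # Most overpaid first, most underpaid last.
    return sorted(gaps, key=lambda gap: gap[1], reverse=True)


# Invented toy data (NOT from the study):
# (label, performance score, weekly salary in thousands of dollars)
players = [
    ("A", 60, 40),
    ("B", 70, 60),
    ("C", 80, 80),
    ("D", 90, 100),
    ("Star", 95, 400),
]

gaps = salary_gaps(players)
for name, gap in gaps:
    print(f"{name}: {gap:+.1f}")
```

In this toy table the "Star" lands at the top of the list even though he also has the best score, which mirrors the article's point: a player can be the best performer and still be paid more than performance alone would predict.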
Other findings of the study shed light on the different character of each of the senior leagues in Europe. For example, the program analyzed the criteria by which players in these leagues receive disproportionately high salaries. In Germany, for example, the top wages go to players with particularly sharp field vision. In Italy, which is considered a tough league, the top-salaried players were those with the most tackles to their credit, while in the virtuoso Spanish league goals from overhead kicks were ranked second among the performance criteria of the highest paid players.
Understanding Moby Dick
From Ronaldo to John Lennon, from Messi to Eminem, Shamir can squeeze a huge amount of data into the filter he has created, and break it up into a million small bits of useful information. Sports or music, in this context, are only showcases that time after time provide proof of the prodigious information-analyzing capabilities he has created. What he did with music, he is now also doing with thousands of speeches by American politicians, both Democrat and Republican, from 2000 to 2008. The aim: to discover the speech patterns that characterized each party. “A study like this is not approached from a particular point of departure; you just pour everything into the computer and ask it to find the differences,” he says.
Even though he has not yet published the final conclusions of his intriguing research, Shamir emphasizes that already at this stage there are several blatant differences. “A major one is the tendency of Democratic speech-makers to use longer words and sentences. Another is that the Democrats’ speeches are far more negative than those of their Republican counterparts. The fact that the speeches were made during the George W. Bush presidency could explain the aggressive, militant approach of the Democratic Party at the time,” he says.
Alongside all the colorful studies, including one that compares works by some of the greatest painters in history and another that analyzes the work of Western film directors, Shamir takes special pride in one that has actually received less media attention: a comparison and analysis of the language of whales in different places in the world.
“It’s a fascinating study of considerable scientific importance,” he says enthusiastically about the research, which he published in 2014. “Today there are ships that travel with special equipment and record thousands of hours of communication among whales, the question being how they are differentiated from one another.” That question was left for the computer to answer, of course.
“You pour the thousands of hours of audio into the computer without any instruction and let it build a voice map of the different whales by itself,” he explains, showing on the computer the musical tree the device created. At one end of the chart are killer whales off the Norwegian coast, and at the other end a group of killer whales off Iceland. At the bottom of the chart, also separated by the computer, is a pod of pilot whales near the Bahamas and another pod of the same species near the shores of Norway.
“What is new,” says Shamir, “is the amazing way the computer was able to distinguish between the same whale species from two different regions of the world. We can infer from this that whales have different accents according to the region they live in, like the differences in people’s accents.”
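The idea of letting the computer group recordings "without any instruction" and draw a tree from them can be illustrated with a small hierarchical clustering sketch in Python. The article does not describe the features or the algorithm Shamir actually used, so everything here is an assumption: each recording is reduced to two invented acoustic numbers, and the grouping uses plain average-linkage agglomerative clustering, repeatedly merging the two closest groups. The program sees only the numbers; the names are used only to label the result.

```python
from itertools import combinations
from math import dist


def agglomerate(points, n_clusters):
    """Average-linkage agglomerative clustering.

    `points` maps a recording name to its feature vector. Returns a list of
    sets of names. No species or region labels are used during grouping.
    """
    clusters = [{name} for name in points]

    def linkage(a, b):
        # Mean distance over all cross-cluster pairs of recordings.
        pairs = [(p, q) for p in a for q in b]
        return sum(dist(points[p], points[q]) for p, q in pairs) / len(pairs)

    while len(clusters) > n_clusters:
        a, b = min(combinations(clusters, 2), key=lambda ab: linkage(*ab))
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(a | b)
    return clusters


# Invented 2-D "acoustic features" (NOT real measurements).
recordings = {
    "orca_norway_1": (1.0, 1.0),
    "orca_norway_2": (1.2, 0.9),
    "orca_iceland_1": (3.0, 1.1),
    "orca_iceland_2": (3.1, 0.8),
    "pilot_bahamas_1": (1.0, 5.0),
    "pilot_bahamas_2": (1.1, 5.3),
    "pilot_norway_1": (3.0, 5.1),
    "pilot_norway_2": (3.2, 5.2),
}

by_region = agglomerate(recordings, 4)   # finer cut of the tree
by_species = agglomerate(recordings, 2)  # coarser cut of the tree
print(sorted(sorted(group) for group in by_region))
print(sorted(sorted(group) for group in by_species))
```

With these made-up numbers, cutting the tree into two groups separates the species, and cutting it into four separates the regional pods within each species, which is the kind of structure the article describes.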
Shamir has no plans at present to return to Israel or to leave the academic world – at least not until he completes what he calls the “project of my life.” This is an ambitious, breakthrough, slightly megalomaniacal undertaking aimed at deciphering the secrets of the universe and creating an astronomical map that will redefine everything we think we know about the millions of different galaxies.
“The romantic image of the astronomer standing alone on a hilltop and looking at the sky through a telescope is a thing of the past,” he says. Today’s astronomy, he notes, is based on robotic telescopes that collect more information than even hundreds of scientists devoting all their time to the project would be able to cope with.
“The Large Synoptic Survey Telescope (LSST), which is scheduled to be fully operational in 2022, is expected to collect in three days the amount of information that today’s large robotic telescopes take several years to collect. This telescope will set us new challenges and create opportunities to analyze information and to discover quite a few astronomical objects we hadn’t known existed. The analytics will allow us to use the vast amounts of information to see the universe in an entirely new way.”
Once Shamir starts talking about astronomy, it’s hard to stop him. “With the help of analytics and information mining we will be able to find rare objects among many millions of astronomical bodies,” he continues. “These objects will be able to provide us with new knowledge about the past, present and future of the universe. We know that galaxies can differ in form, and one of the basic questions is how this happens and what causes it.”
One way to answer this question, according to Shamir, is by conducting a computerized analysis of the formation of the galaxies. “Among other developments, we expect that the information the new telescope will provide will help us answer meaningful questions concerning the existence of dark matter and dark energy. The next 10 years are set to be one of the most fascinating periods in the history of astronomy.”