Artificial intelligence is everywhere, from autonomous cars made by Google and Tesla to Siri the virtual assistant. The ability to call a taxi using voice commands is AI. So is the IBM Watson computer system that, since vanquishing humans on the game show Jeopardy, has become an integral part of medical projects to treat cancer and a number of other applications – abilities that are impressive, but technical in substance.
Now the tech giants want to give AI a whole new dimension: replacing humans in jobs that have long been considered unmachinable, such as writing music.
Sony’s Computer Science Lab has already managed to produce songs with distinct musical styles. One draws inspiration from the Beatles, while another combines the likes of Irving Berlin, George Gershwin and Duke Ellington. The results are not perfect, but in this case, it’s the route that matters.
How can a software program be inspired? The answer lies in artificial neural networks that enable learning. These are algorithms through which researchers try to emulate the activity of the human brain.
Learning is done by example, and lots of them. Artificial neural networks are built from multiple processing levels. The first levels identify specific, predefined elements in the input. As more processing levels are added, the algorithm gets better at identifying and classifying whole objects, such as recognizing that a picture shows a dog or a specific face.
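As a rough illustration of the layered processing described above, the following Python sketch passes an input through two processing levels. The weights, the layer sizes and the names `layer` and `forward` are made up for illustration; a real network learns its weights from many examples rather than having them written by hand.

```python
import math

def sigmoid(x):
    """Squash any number into the range 0..1."""
    return 1.0 / (1.0 + math.exp(-x))

def layer(inputs, weights, biases):
    """One processing level: each unit takes a weighted sum of its inputs."""
    return [
        sigmoid(sum(w * x for w, x in zip(unit_weights, inputs)) + b)
        for unit_weights, b in zip(weights, biases)
    ]

# Hypothetical hand-written weights; training would normally set these.
W1 = [[1.0, -1.0, 0.5], [-0.5, 1.0, 1.0]]  # first level: 3 inputs -> 2 features
B1 = [0.0, 0.0]
W2 = [[2.0, -2.0]]                         # second level: 2 features -> 1 score
B2 = [0.0]

def forward(pixels):
    """Feed the input through both levels and return a score between 0 and 1."""
    features = layer(pixels, W1, B1)
    return layer(features, W2, B2)[0]

score = forward([0.9, 0.1, 0.4])
print(f"score: {score:.3f}")
```

The second level never sees the raw input, only the features produced by the first. Stacking more such levels is what the "deep" in deep learning refers to.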
Learning computers are not new. Artificial neural networks have been around since the 1940s, but their potential remained remote. In 1956, computer scientist Arthur Samuel wrote a program that could teach itself to play checkers, and the field has come a long way since then.
The last 15 years, especially the last five, have been characterized by “deep learning”, a technology built on neural networks with a growing number of processing layers of neurons. A big leap forward was made in 2012, when Geoffrey Hinton, Alex Krizhevsky, and Ilya Sutskever won the ImageNet Large-Scale Visual Recognition Challenge, in which the software was required to accurately classify and organize 1.5 million pictures.
Among the things hampering development in previous decades were limited computing power and the difficulty of collecting and handling the necessary quantities of samples. But the solution evolved as computing power increased in recent years, making it possible to gather vast amounts of data and to analyze it, says Dr. Jonathan Laserson, algorithms expert at Zebra Medical Vision, an Israeli medical imaging company using machine learning.
The development of graphics cards that excel at parallel processing (which splits a computational task among several cores working at the same time) made it possible to speed up the calculations about 30-fold.
The Sony musical algorithm relies on some 13,000 pages of sheet music, from which it learns patterns that it then applies to the new material it is given.
Google also tried its hand at writing music, though unlike Sony, it didn’t rely just on written material, but also tapped existing songs.
Google claims that its WaveNet software sets a new standard in text-to-speech and in the ability to mimic the human voice. Text-to-speech is familiar to users from the company’s automatic translation services. Google can use the technology to add sounds such as piano plinks, and to write music without human intervention.
“Companies are starting to use deep neural networks that give them abilities they didn’t have before,” says Eyal Gross, a machine learning expert and digital artist. “What they have recently achieved with WaveNet is to work with raw audio data that hasn’t undergone any preliminary analysis,” says Gross. The data is of a very high resolution and requires sampling at hundreds of thousands of bits a second. The necessary calculations are vast and the result is commensurately impressive: It simulates real sound.
“Things like music, video and speech are series in time. Unlike pictures, they depend on a historic sequence,” says Gross. And music can be a lot of channels in parallel, each with its own sequence in time, but dependent on the history of the others.
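To make the point about history concrete, here is a minimal Python sketch in which the next note is picked according to what followed the previous note in an example melody. The melody, the note names and the helpers `learn_transitions` and `most_likely_next` are invented for illustration. Looking back only one note is a deliberate simplification; systems such as WaveNet condition on a far longer history.

```python
from collections import Counter, defaultdict

def learn_transitions(melody):
    """Count, for each note, which notes followed it in the example."""
    table = defaultdict(Counter)
    for previous, following in zip(melody, melody[1:]):
        table[previous][following] += 1
    return table

def most_likely_next(table, previous):
    """Return the note that most often followed `previous`, or None if unseen."""
    if previous not in table:
        return None
    return table[previous].most_common(1)[0][0]

# Hypothetical example melody, written as note names.
example = ["C", "D", "E", "C", "D", "G", "C", "D", "E"]
table = learn_transitions(example)
print(most_likely_next(table, "C"))
print(most_likely_next(table, "D"))
```

A picture classifier can look at all its pixels at once, whereas here the answer for "what comes next" changes with whatever came before.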
Last June, the Google Brain team launched the Magenta project, to try to answer the question, “Can we use machine learning to create compelling art and music?” Douglas Eck, a research scientist at Google, explains that one goal is to advance machine learning and the other is to create a community of artists, programmers and machine learning researchers to use the company’s TensorFlow software.
In the months since the launch, Eck and MIT Media Lab’s Natasha Jaques have been working on development using reinforcement learning, which here means combining a music-generating neural network with basic principles of music writing. So while the algorithm learns, humans supply some basic directives to help it avoid low-level mistakes such as playing too slowly or repeating the same notes. The website MIT Technology Review reports that the new system gets positive marks every time it generates a sequence of notes that resembles patterns it noticed in the songs fed to it and also obeys the rules of music it has been taught. The rules are simple ones taken from books on songwriting. Combining machine learning with the rules of music writing and thousands of songs written by people results in catchy tunes that hit the spot, says Eck.
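The "positive marks" described above can be pictured as a score with two parts, one for resembling the training songs and one for obeying a rule. The Python sketch below is not Google's system: the scoring functions, the single repetition rule and the equal weighting are all assumptions chosen to keep the example small, and the learning loop that would use the score is left out.

```python
def pattern_score(sequence, known_pairs):
    """Fraction of adjacent note pairs that also appeared in the training songs."""
    pairs = list(zip(sequence, sequence[1:]))
    if not pairs:
        return 0.0
    return sum(pair in known_pairs for pair in pairs) / len(pairs)

def rule_score(sequence, max_repeats=3):
    """1.0 if no note occurs more than `max_repeats` times in a row, else 0.0."""
    run = 1
    for previous, current in zip(sequence, sequence[1:]):
        run = run + 1 if current == previous else 1
        if run > max_repeats:
            return 0.0
    return 1.0

def reward(sequence, known_pairs, rule_weight=0.5):
    """Blend resemblance to the training songs with obedience to the rule."""
    return ((1 - rule_weight) * pattern_score(sequence, known_pairs)
            + rule_weight * rule_score(sequence))

# Hypothetical note pairs "noticed" in the training songs.
known = {("C", "D"), ("D", "E"), ("E", "C")}
varied = ["C", "D", "E", "C", "D"]
stuck = ["C", "C", "C", "C", "C"]
print(reward(varied, known))
print(reward(stuck, known))
```

The varied sequence scores higher than the one stuck on a single note, which is the kind of signal that would nudge a learning system away from monotonous output.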
Should songwriters throw in the towel? Bye bye Bob Dylan? Are we going to be seeing creative music machines listed on eBay? No, says Gross: Machines are just at the start of trying to make something that sounds like human music.
In his view, these technologies are tools, and when using tools, there are plenty of artistic choices. For instance, he points out, all neural network and machine learning methods are fed by examples, and the choice of examples determines the character of the result.
The Sony system doesn’t operate in a void either. A composer named Benoit Carré chose the musical style, wrote the lyrics, helped produce and was involved at almost every stage of creating the song. The neural network produces a lot of results, but ultimately it is a person who chooses and curates the best and most relevant ones, says Gross. Borrowing from Alon Amit, who wrote music-generating software in the 1990s, he likens it to the work of a photographer, who didn’t build the city or plant the flower but is the one who decides what to shoot and at what angle. That is an artistic choice, and here, man still wins over machine.