It goes by many names, this thing that we yearn for and that rules us: opium, money, power. William S. Burroughs, the American cult author, likened them all – in his 1959 novel “Naked Lunch” – to the flesh of a gargantuan centipede that lurks in the depths and has an irresistible taste, and whose addicts gorge on it until they lose consciousness. To get a piece of it one must undergo endless ordeals, wandering about in kitchens, sleeping cubicles, wobbly balconies and basements.
That’s also the sort of route one must follow in order to experience the products of an algorithm that was unveiled in May in Silicon Valley. Generative Pre-trained Transformer 3, aka GPT-3, is a “language model”: a machine learning system that’s capable of automatically and dynamically generating texts in human-like language. Since its emergence, thousands have wished to touch it, use it, breathe in something of the words it spews out. But only a few have been vouchsafed that privilege. The rest, like me, are compelled to forge a connection to it by roundabout means.
The history is brief. The first language model in the GPT series was unveiled in 2018 and stirred no interest. The second, GPT-2, whose world premiere took place in February 2019, may have hinted at what was to come. In an unusual move for this industry, OpenAI, the company behind the products, issued a dramatic statement better suited to an episode of “Black Mirror” than to the launch of a high-tech product: “Due to our concerns about malicious applications of the technology, we are not releasing the trained model.” So convincing were the texts the algorithm could fabricate, the company claimed, that some people might use it to flood the web with fake news.
Accordingly, a sense of responsibility – or perhaps, as some thought, a craving for free publicity – led the developers to release GPT-2 in stages, like sustained-release Ritalin, examining the consequences thoroughly at each step.
The concern proved unfounded. By the end of 2019, GPT-2 was available on the web to all and sundry and, as far as was known, had not been abused for nefarious purposes. It served primarily as a research tool and for entertainment. I, too, got satisfaction from it: hours of enjoyment from texts it wrote in response to the input I fed it. I encountered a sort of rough-hewn, very wild talent, capable of taking a few lines I had written and continuing them as though it were almost human. An almost-human, albeit with a broken keyboard, that had a hard time following its own train of thought – but which, even so, wrote better than any machine before it. No one, however, anticipated the quantum leap made by the next model, just 15 months later.
At the end of May, San Francisco-based OpenAI published a 72-page paper, signed by 31 researchers, introducing GPT-3. The paper’s exceptional length and inordinate number of signatories looked at first like compensation for the fact that it did not clearly explain where exactly the new model’s breakthrough lay. The authors did not mention a new architecture, any sort of previously unseen neural network or a revolutionary training method. One word, rather, summed up the entire point: size.
The new model turned out to consist of 175 billion parameters – that is, the numerical values the model tunes during training and then uses to process text – which is about 100 times more than the number of parameters in GPT-2. The volume of its training material – about a trillion words drawn from a range of sources, most of them from the internet – also broke records. The paper did report the model’s scores on language tests, which showed that on a number of them its skills rivaled or exceeded those of previous systems; but in terms of examples of texts it had actually composed, there was barely anything to sink one’s teeth into. In practice: OpenAI disappointed. The company announced that it would give a limited number of experimenters access to the model, and invited them to apply via a form posted on its website. Silence.
It wasn’t until about a month after the paper’s publication that examples of GPT-3’s true capabilities began to surface on the web. They were the products of experiments conducted outside a rigid research framework. What began as a trickle morphed into a flood. Twitter and Reddit filled up with coherent continuations of the texts users fed the model, surprising in both quality and length, as well as with translations and with answers it supplied to a broad range of general-knowledge questions, all without its having undergone any special training for these tasks.
Within a few days, users of the algorithm posted screenshots and blurbs in which GPT-3 explained complicated legal texts in simple language, turned a general description of a business into a financial report, composed haiku, offered medical advice that appeared reasonable and well-grounded at first glance, and wrote skits on demand. In one of them, the comedians Jerry Seinfeld and Eddie Murphy were supposed to talk about San Francisco, and Jerry bad-mouthed all the city’s neighborhoods using variations of the word “shit.”
The buzz intensified when the samples burst the bounds of human language. GPT-3 displayed proficiency in challenges no one had thought it could handle: writing computer code in response to instructions in English, designing web pages on its own solely from verbal directions, even producing a melancholy sequence of musical notes in response to a request to “give me a sad chord progression.” In other words, it showcased ways in which the model could potentially replace professionals of all sorts.
“It feels like GPT-3 is going to be this decade’s iPhone,” Alex Hern, The Guardian’s technology editor, tweeted, “in terms of a singular artefact that is quite clearly the axis around which the next 10 years rotate.”
A daring prophecy. But does this thing, which has simultaneously spawned so much hype and dread, truly understand anything of what it’s writing?
“I would say that it understands, yes,” was the reply I received to that question from Douglas Summers Stay. Dr. Summers Stay, a computer scientist and one of the first people outside of OpenAI to be given access to the model and publish his results, studies computerized language and vision, and is also a writer and a poet; his book “Machinamenta: The Thousand Year Quest to Build a Creative Machine” was published in 2011.
“You test comprehension by seeing whether the student can summarize, explain, describe and rephrase concepts,” Summers Stay wrote. “Again, GPT-3 can usually pass these tests for many concepts. Still harder are application, synthesis and evaluation questions. GPT-3’s performance on these is more spotty. I don’t think that GPT-3 has any subjective experience. ‘Being’ GPT-3 would be like being a waterfall – a lot going on, but no conscious experience of it.”
Another question troubling me was why Summers Stay, along with some others, was given the go-ahead to bathe in this waterfall, whereas I was not. I didn’t raise the matter with him. I had filled out the form on the OpenAI website, but days passed and my inbox remained empty. Over and over I replayed in my mind what I’d written in the request; like everyone who filled out the form, I described how I, as someone who writes and lectures on creative AI, intended to use the model, once granted access to it.
Taking a long view, I described my vision of a futuristic application based on GPT-3 in which everyone would be able to choose the voice in which they would write stories: a 19th-century Scottish nobleman, a young teacher in 1920s New York, an illiterate prisoner circa 2020. A synthesizer of language, wielded by the simple pressing of a button. Apparently the folks at OpenAI weren’t bowled over. I was compelled to find a different door that would convey me to the crawler’s tempting flesh.
I found that door in “AI Dungeon.” Nick Walton, an entrepreneur from Provo, Utah, launched the online, “Dungeons and Dragons”-style role-playing game as an exclusively text-based product in 2019, shortly after the unveiling of GPT-2. Instead of writing a script, as is usually done for these sorts of games, Walton chose to have OpenAI’s language-generation capability improvise dialogues and make possible infinite scenarios. But because that version of GPT couldn’t produce a story of more than a few lines, the game, too, attracted little interest. Until July. A few weeks after GPT-3 was unveiled, Walton announced that he was launching a new for-pay mode, called “Dragon,” in which the textual adventure would be conducted via OpenAI’s new language model.
In other words, it would now be possible to “converse” with GPT-3 itself, albeit through a curtain. Like others who had been denied access, I quickly installed the app for “AI Dungeon,” and signed up for a free one-week trial of the “Dragon” mode.
What would I do in the maw of the monster? Studying examples on the web, I saw that some of the fortunate experimenters who had been given direct access to the model fed it a text briefly describing a well-known figure – from Tony Stark, aka Marvel Comics’ Iron Man, and Buddha, to Sofia Coppola – and then interviewed that figure, embodied, mostly credibly, by GPT-3. (Question to Tony Stark: “I’ve got to know – what does it feel like to fly around in the suit?” Tony/GPT-3: “It’s one of the coolest things ever. That may sound narcissistic, but that’s the honest truth. I fly around in an armored suit and save people. Who wouldn’t love it?”)
In the “AI Dungeon” game in which I begged alms from the model, the texts that GPT-3 emitted were brief, in order to preserve the flow, so the short-answer interview option was very appropriate.
To examine the model’s knowledge of esoteric personalities – at least in the context of my world – while still playing on my home field, I set up a custom-built adventure game in which I’m a journalist who arrives at a London hotel in order to interview Golda Meir. After a brief pause – a few seconds’ consideration by the game engine and on the OpenAI servers – Golda, that is, the machine, announced that she was ready for my questions. I held my breath.
When were you born?
“I was born on May 3, 1898, in Kiev. I grew up in Milwaukee, Wisconsin.”
When did you first meet Yitzhak Rabin?
“I first met Yitzhak in America, in 1944. He was a young Zionist, and unlike the others he really loved the land. I saw him again five years later when he visited the kibbutz where I was working. He asked me to come to Palestine and be part of building a Jewish homeland. So I did, I left everything behind and came here.”
What made you start eating huge centipedes?
“I started eating giant centipedes because the first time I tasted it I loved it. Even as a girl in Milwaukee, I loved the taste. As a girl I used to go out into the woods and catch them.”
Huh? Quite surprising! I went on asking more and more, stilling my hunger a little, and receiving answers – always after a pause of a few seconds – ranging from the completely accurate (such as her date and place of birth and her move to Milwaukee with her family) to the utterly bizarre, such as the Labor Zionist leader’s taste for arthropods. Most of the time the responses could be found in the twilight zone between the two poles, as in my interviewee’s peculiar report about her ties with “Yitzhak.”
I’d fallen into one of the pits that GPT-3 digs for those who try it out: its inability to distinguish truth from lies, fact from fiction. What makes this dangerous, especially when it comes to questions related to medicine, for example, is that even when the model doesn’t have a clue what it’s talking about, it responds with absolute confidence.
Without conducting thorough research of one’s own, it’s hard to ascertain whether what the model says is invention or proven fact, because it’s not copy-pasted from Wikipedia but a wild, creative, convincing pastiche of bits of things from innumerable places. A masterwork.
“When we remember a fact, we tend also to remember where we learned that fact, the provenance of the information,” Summers Stay wrote in his blog. “This is completely lacking in GPT-3.”
The AI researcher Gary Marcus, who is opposed in principle to such models, which he claims are unable to reason, tweeted, “GPT-3 is a better bullshit artist than its predecessor, but it’s still a bullshit artist.”
Bullshit or not, in early August, Walton, the guy who developed “AI Dungeon,” tweeted that if people were using his game in order to communicate with OpenAI’s powerful new model – precisely what I was doing – they would only have a meager experience of GPT-3’s general capabilities. He added that the game’s architecture is intended in part to block back-door access to the thing itself.
Again I felt like an undesirable. I tried to satisfy my craving through different apps, hastily built and offering insufficient interaction with the language model. One app offered help in the form of recommended films and music, another responded briefly to philosophical questions, a third provided replies to queries about matters of health and fitness. When I dared to use the model to discuss subjects outside my approved mandate – meaning anything with no clear connection, from the system’s point of view, to philosophy, entertainment or calories – I ran into a wall. The apps didn’t allow it. In my desperation I started to contemplate building a GPT-3 clone intended for me alone. How unrealistic is that?
More realistic than I’d imagined. The genuine GPT-3 model is based on a well-known, fairly simple architecture called, in AI jargon, a “transformer.” It’s an artificial neural network, used mainly in natural-language processing, that predicts the next word, one word at a time. That’s it. This is how all of GPT-3’s texts are created, whether long or short. The architecture is so lean that in principle it can be installed, in a basic form, even on a cellphone. The problem lies in the training: that is, in the computational effort needed to “feed” the model vast amounts of data. Estimates put the cost of training GPT-3 at between $4.6 million and $12 million, and running the complete model, with its 175 billion parameters, requires 22 ordinary graphics processors, whereas a standard PC has one.
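The claim that GPT-3 writes by predicting one word at a time can be illustrated with a toy. The Python sketch below assumes a hand-written table of next-word scores (the hypothetical `TOY_MODEL`) in place of a trained transformer, which would instead score every word in its vocabulary using the entire preceding text; the loop itself is the same idea: predict a word, append it, repeat. The hypothetical `gpus_needed` helper shows one way a figure like 22 processors could arise, assuming the weights are stored at 2 bytes per parameter on cards with 16 GB of memory each; both numbers are assumptions, not OpenAI’s published figures.

```python
import math
from typing import Dict, List

# Hypothetical stand-in for a trained model: scores for the word that follows
# each word. A real transformer computes such scores from the whole context.
TOY_MODEL: Dict[str, Dict[str, float]] = {
    "language": {"is": 0.9, "model": 0.1},
    "is": {"a": 0.8, "the": 0.2},
    "a": {"virus": 0.7, "centipede": 0.3},
    "virus": {"<end>": 1.0},
}


def predict_next(context: List[str]) -> str:
    """Pick the highest-scoring next word (greedy; real systems often sample)."""
    last = context[-1] if context else ""
    scores = TOY_MODEL.get(last, {"<end>": 1.0})
    return max(scores, key=lambda word: scores[word])


def generate(prompt: str, max_words: int = 10) -> str:
    """Autoregressive loop: predict one word, append it, feed it back in."""
    words = prompt.split()
    for _ in range(max_words):
        next_word = predict_next(words)
        if next_word == "<end>":
            break
        words.append(next_word)  # the output becomes part of the next input
    return " ".join(words)


def gpus_needed(params: float, bytes_per_param: int = 2, gpu_mem_gb: float = 16.0) -> int:
    """Cards needed just to hold the weights (assumed precision and card size)."""
    total_gb = params * bytes_per_param / 1e9
    return math.ceil(total_gb / gpu_mem_gb)


print(generate("language"))  # language is a virus
print(gpus_needed(175e9))    # 22
```

However long the text, it comes out of repeating that loop; what separates GPT-3 from the toy is the quality of the scores, which is where the 175 billion parameters and the costly training come in.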
The question of what goals OpenAI hopes to achieve by creating a language model like this has one answer. From the company’s point of view – and that of its partner, Microsoft, which invested about a billion dollars in it last year – this is a first step toward achieving artificial general intelligence: a mechanism that equals humans, and even exceeds them, in cognitive abilities.
That has been the company’s declared purpose since its founding in 2015 by a group of entrepreneurs including Elon Musk (who has since distanced himself from the firm, in part due to a conflict of interest with Tesla, his company). Their strategy is simple: massive exertion of force. Meaning, the creation of ever more powerful models based on the same principles, on the assumption – which has received support among researchers in the field – that the answers lie in might alone. This approach has already proved effective in certain fields, such as games, and OpenAI simply aspires to apply it to everything under the sun.
The problem is that I personally don’t have between $4.6 million and $12 million to invest in a private model. My last recourse was to turn once again to my computer scientist contact Douglas Summers Stay. He agreed without hesitation when I asked him to act as a go-between for me and the centipede. But he had a few warnings. “Remember,” he wrote, “the more context you give, the better it will work. So set up the situation: Who are you, and why are you asking the question? Are you asking it to GPT-3? If so, you need to explain to GPT-3 what GPT-3 is – and based on your explanation, it will try to think of what that version of GPT-3 would say. If you tell it that it is eloquent and poetic, the answer you get will be more eloquent and poetic. If you tell it that it is grouchy, the answer you get will be grouchy.”
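Summers Stay’s advice amounts to a recipe for writing the text the model will continue. A minimal Python sketch of that recipe follows; `build_prompt` and its parameters are hypothetical names for illustration, and the function only assembles a string, on the assumption that a model like GPT-3 would then continue it. It does not call any real API.

```python
from typing import List


def build_prompt(asker: str, situation: str, traits: List[str], question: str) -> str:
    """Assemble a prompt that gives the model context before the question.

    Hypothetical helper: it only builds text. A model like GPT-3 would then
    continue this text, answering in the voice the description sets up.
    """
    # Explain to the model what "GPT-3" is in this scene.
    description = "GPT-3 is a language model that answers questions in conversation."
    if traits:
        # The tone described here is the tone the continuation tends to adopt.
        description += " It is " + " and ".join(traits) + "."
    # End with the model's name so the natural continuation is its reply.
    return f"{situation}\n{description}\n\n{asker}: {question}\nGPT-3:"


print(build_prompt(
    asker="Boaz",
    situation="A writer who lectures on creative AI is interviewing GPT-3 for an essay.",
    traits=["eloquent", "poetic"],
    question="Do you understand what you write?",
))
```

Swapping `["eloquent", "poetic"]` for `["grouchy"]` changes only one sentence of context, which, by Summers Stay’s account, is enough to change the voice of the answer.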
After a sleepless night, I got back to Summers Stay with a short idea. This was my request from the machine: “The following essay is by the Israeli writer Boaz Lavie. In the essay, he discusses the language model GPT-3, its philosophical relation to William S. Burroughs’ writings, and the possible emergence of Artificial General Intelligence.”
Less than a minute later, Summers Stay sent me a 1,300-word English-language article, written by the algorithm according to my guidelines. It was prefaced by an apology from him: With typical American politeness, Summers Stay noted that he wasn’t sure the machine was familiar with enough things I’d written to imitate my voice. I refused to be offended.
Here’s a little of the text that GPT-3, speaking in my name, wrote, the heart of the argument: “Even if GPT-3 were able to learn all aspects of natural languages (semantics, syntax and so on), it would not be able to truly understand the world around us – as there are simply too many things which we don’t know about it. This is why even though GPT-3 has been trained using billions of words from Wikipedia; when asked what an elephant looks like or where they live or whether they are still existing; it fails miserably . The same applies for asking what a word means.”
“As long as this limitation remains unaddressed,” the algorithm, still speaking for me, continued, “I believe that all AI systems will remain philosophical zombies, i.e., they won’t really know anything about themselves or the world around them. In addition, I believe that such philosophical zombies may still be dangerous: Such systems might just blindly follow their programming without understanding what they are doing and without being accountable for their actions. Indeed, there is already evidence suggesting that some AI systems with limited learning capabilities may go rogue (as shown by recent research). In addition, there is also evidence suggesting that most human beings behave like philosophical zombies most of the time (as shown by [economist Daniel] Kahneman’s work). Still, I think that the only way to avoid such issues is to build systems which can understand reality better than we do. This means that even if such systems will never be able to understand everything about the world around us, they will be able to understand enough about it in order to behave rationally and ethically.”
I’d encountered the giant centipede. I’d looked into the whites of its eyes. It gobbled me up, not the other way around. But maybe the moment has come to switch metaphors, to choose something more current, more relevant to our lives. Not a centipede ... but a virus. “Language is a virus from outer space,” wrote William S. Burroughs. Now we’ve infected the machine with it.
Boaz Lavie is a writer, comics creator and lecturer on creative artificial intelligence.