Israeli Scientists Use AI to Reconstruct Broken Babylonian Tablets

Gaps in ancient text are so frustrating. Now artificial intelligence can be trained to plausibly restore missing cuneiform signs in ancient Babylonian texts, Israeli historian demonstrates

Broken Babylonian tablet telling the Gilgamesh flood story, a parallel to Noah and the Ark. Credit: The Trustees of the British Museum

Understanding texts written using an unknown system in a tongue that’s been dead for thousands of years is quite the challenge. Reconstructing missing bits of the ancient text is even harder – though admittedly, if one gets it wrong, who’s to know?

Filling in missing text starts with being able to read and understand the original text. That requires much donkey work. Now an Israeli team led by Shai Gordin at Ariel University in the West Bank has reinvented the donkey in digital form, harnessing artificial intelligence to help complete fragmented Akkadian cuneiform tablets.

Their paper, “Restoration of Fragmentary Babylonian Texts Using Recurrent Neural Networks,” was published in the Proceedings of the National Academy of Sciences in September.

“Neural networks” sounds like B-movie horror fare, but the term means software inspired by biological nervous systems. The concept dates back more than 70 years and has gone in and out of fashion. Now it’s back, and as ever, the basic idea is to have machines learn from examples and make decisions. In this case, the computer decides on a plausible completion of missing text.

No, the digital donkey can’t read cuneiform. Computers still struggle to read handwritten characters: we animals are superb at recognizing letters and numbers written differently by different people, while machines find it far harder. So Gordin and his team feed their machine transliterations of the extant Babylonian texts, that is, renderings of the signs in Latin letters that approximate what the text would have sounded like.


Then what? When it comes to missing bits in a papyrus or tablet, humans can intuit that “…ow is your moth…” isn’t a query into the well-being of your mothball. With machines, it’s all about mathematics and probabilities based on knowledge gained so far.
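Here is a minimal sketch, in Python, of what “probabilities based on knowledge gained so far” can mean. It assumes a word-bigram model, a far simpler technique than the neural network in the study, and a made-up English toy corpus; the names `CORPUS`, `bigram_counts` and `complete` are hypothetical. It ranks words beginning with “moth” that could follow “your.”

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus standing in for "knowledge gained so far".
CORPUS = [
    "how is your mother",
    "how is your father",
    "say hello to your mother",
    "your mother called",
    "the mothball smell lingers",
]

def bigram_counts(sentences):
    """Count how often each word directly follows each other word."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts

def complete(prev_word, prefix, counts):
    """Rank words starting with `prefix` by the estimate P(word | prev_word)."""
    followers = counts.get(prev_word, Counter())
    total = sum(followers.values())
    ranked = [(w, c / total) for w, c in followers.items() if w.startswith(prefix)]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)

counts = bigram_counts(CORPUS)
print(complete("your", "moth", counts))
```

In this toy corpus “mothball” never follows “your,” so it gets no probability at all, while “mother” follows “your” in three of four cases and scores 0.75.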

A good starting point to deciphering ancient writing is to examine how it all began.

A eureka moment in Uruk

More than 10,000 years ago, as the Ice Age waned, inhabitants of the Near East began to settle down, farm and become territorial. Well before the advent of the pack animal, let alone the wheel, they were trading.

Trading may have inspired the earliest recognized form of recorded communication: “pseudo-writing” on small bits of clay in Mesopotamia around 7,000 years ago. These clay bits, called tokens, were shaped to represent a cow or other ancient commodities. They were later impressed on clay bullae, slightly larger pieces of clay that functioned like envelopes. Then we start seeing abstract signs: repeated strokes or depressions interpreted as numbers (prices, perhaps), and possibly also personal names, spelled by combining the first sounds of different images to write words you can’t draw.

Tag (token) from the Indus Valley. Credit: Ismoon

Britannica gives the example of the Sumerian word for hand, šu, whose sign was used to represent the sound “šu.” These tokens are basically tags, and they represent the earliest known administrative use of markings.

We don’t know who invented the price tag, but Egypt and Sumer were the major polities that could amass enough human and material resources at the time to require complex administration, Gordin points out. “They tagged stuff like today in the mall. The tag shows a drawing of what it is. You don’t need writing to do that,” he says.

(There is an outlier theory that copper figurines found in Israel dating to about 6,500 years ago were a form of visual code, making them proto-writing.)

Anyway, after pseudo-writing came proto-writing: figurative proto-cuneiform inscribed on tablets, which arose about 5,500 years ago in the city of Uruk (or at least, that’s the only place such tablets have been found so far). “Somebody there had a eureka moment,” Gordin quips.

Proto-Elamite token, Uruk period, found in the Susa acropolis. Credit: Marie-Lan Nguyen

Within mere centuries, proto-cuneiform became increasingly schematic, and Sumer was apparently where that happened, he says. Figurative hieroglyphic script began to appear in ancient Egypt at roughly the same time, some 5,000 years ago.

It isn’t clear which came first, Gordin says. Proto-cuneiform and proto-hieroglyphics may have arisen contemporaneously and independently, in a sort of convergent evolution. It could also be argued that the two writing forms influenced each other, he adds.

And proto-cuneiform begat cuneiform: a “wedge-shaped,” logo-syllabic script made by impressing the sharpened point of a reed into moist clay.

Bewildered mercenaries in Canaan

One obstacle to interpreting tablets is that cuneiform is a writing system, not a language. It was used by cultures across the ancient Near East that spoke unrelated languages, from over 5,000 years ago until just 2,000 years ago.

“A cuneiform tablet inscribed in third millennium Uruk, Iraq, wasn’t written in the same language as a tablet in second millennium Kültepe, Turkey,” Gordin says, driving home the point. Given their intense trade and other relations, it made sense for the region’s peoples to adopt the same basic writing system, he points out. Even so, Akkadian became the lingua franca of the Near East from about 3,800 years ago.

In the Levant, from about 3,800 to 3,500 years ago, Canaanites also used cuneiform, as well as what seems to have been the earliest alphabetic script in the world, Gordin says.

They did? In fact, the Canaanites may have invented alphabetic writing. Traces of early alphabetic signs were found at Wadi el-Hol in Egypt’s Western Desert. And in Sinai, researchers found an alphabetic writing system called proto-Sinaitic that may go back as far as 3,850 years. Its inventors were Semites, plausibly Canaanite workers and/or soldiers in the Egyptian army.

“Canaanites went back and forth all the time. They could always find work in ancient Egypt,” Gordin explains. They likely couldn’t, or didn’t need to, master the complex hieroglyphic writing, so they borrowed some hieroglyphic signs and devised a simpler system.

Why do we think Semites, not Egyptians, invented the alphabet? “When we find interaction between the two writings, the values of the alphabetic writing are Semitic, not Egyptian,” Gordin says. For instance, the gods referred to in proto-Sinaitic inscriptions are Semitic ones.

Proto-Sinaitic. Credit: Kwamikagami
Brought to you by the letter A for 'alp. Credit: Rozemarijn vanL

Thus, cuneiform came to the Levant from the north and the proto-alphabet from the south. And the early alphabetic systems from Egypt spread into Israel, where they are known as proto-Canaanite. With the help of sea traders from Phoenicia, the consonantal alphabet was introduced throughout the Mediterranean region and north into Asia Minor.

Gordin adds that an alphabetic writing form briefly appeared out of nowhere in southern Mesopotamia, in the marshy region known as the Sealand, sometime between 3,800 and 3,200 years ago. We can’t really read it.

Hail to the Great King

By the time cuneiform became a thing, writing had passed the stage of “Sheep : four : Yerachmiel” and reached the stage of official records, letters and formulaic accounts of the wondrousness of the ruler. Ultimately, our ability to read such texts owes a debt to Napoleon Bonaparte’s invasion of Egypt. General Bonaparte landed in Alexandria in 1798, and about a month later the British navy destroyed his fleet at the Battle of the Nile, leaving his army stranded. While stuck in Egypt, the engineers and other professionals accompanying the army documented ancient Egyptian culture. The Rosetta Stone was found in July 1799.

This was a stele engraved with a priestly decree deifying Egypt’s King Ptolemy V Epiphanes in 196 B.C.E., written in Egyptian hieroglyphs, in Demotic (a cursive script derived from hieroglyphics) and in ancient Greek. It was just the first of many multilingual inscriptions to help scholars crack ancient scripts. For cuneiform, we have the gargantuan multilingual text at Behistun, Iran, in which Darius the Great had his exploits described in three languages, each in its own cuneiform script: Old Persian, Elamite and Akkadian. Crucially, copies of the text were also found in Aramaic and ancient Greek.

The Behistun text was monumental: 15 meters (49 feet) high by 25 meters (82 feet) wide, carved 100 meters (328 feet) up a cliff on the road connecting Babylon and Ecbatana, all to describe how Darius vanquished Gaumata and other foes.

Darius I the Great's inscription at Behistun. Credit: Aryobarzan
The highly inaccessible but monumental inscription of Darius the Great. Credit: علم‌جو

“The texts would all start with Darius the Great King,” Gordin explains. Thus, the laborious process of interpretation began with recognizing that a given set of markings spelled out the king’s name, and that it was written in syllables. In cuneiform, syllabic writing uses dozens of different signs, Gordin notes.

Delving into dead languages requires a lot of raw material. Cuneiform had become quite the Western craze by the 19th century, and vast amounts were found, not least in Nineveh. “More than 10 million words are attested on some 600,000 inscribed clay tablets and hundreds of monumental inscriptions,” the paper says. “After discovering Nineveh, knowledge exploded,” Gordin says.

And over decades, linguists slowly deciphered the languages of Babylon and Assyria, thanks to Darius’ monumental ego. It took time to realize that some signs stood for syllables and some for whole words, and that a sun symbol could refer to the sun, a god or a sound like (their version of) “sun.”

Asked about claims that the Indus Valley writing came first, Gordin points to evidence of trade in the third millennium B.C.E. between ancient Sumer and the Indian subcontinent. They too had tags. “Many of the Indus’ ancient texts look like tags. Same world, same reason probably – nobody invented writing to boast about Gilgamesh’s achievements. They wanted to make order in their sheep and dried fish and barley. People are pragmatic.”

Language and the machine

Interpreting a dead language is a mathematical game, Gordin says, promptly killing the story ... no, let’s go there. Neural networks are computer models that can learn statistical patterns in text. How? They turn each symbol or word into a number, he explains.
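Turning symbols into numbers can be as simple as giving every distinct token in the training texts its own integer. The sketch below assumes whitespace-separated transliterated tokens; the sample strings are made-up, Akkadian-flavored placeholders rather than quotations from real tablets, and `build_vocab`, `encode` and `decode` are hypothetical helper names.

```python
def build_vocab(texts):
    """Give each distinct token an integer id, in order of first appearance."""
    vocab = {"<unk>": 0}  # id 0 is reserved for tokens never seen in training
    for text in texts:
        for token in text.split():
            if token not in vocab:
                vocab[token] = len(vocab)
    return vocab

def encode(text, vocab):
    """Map a text to a list of ids; unknown tokens become <unk>."""
    return [vocab.get(token, vocab["<unk>"]) for token in text.split()]

def decode(ids, vocab):
    """Map a list of ids back to tokens."""
    id_to_token = {i: token for token, i in vocab.items()}
    return " ".join(id_to_token[i] for i in ids)

# Made-up transliterations, used only to illustrate the mapping.
texts = ["a-na lugal be-li-ia", "ina qi-bi lugal"]
vocab = build_vocab(texts)
print(encode("ina lugal xyz", vocab))
```

This prints `[4, 2, 0]`: “xyz” never appeared in the training texts, so it falls back to the reserved unknown id. A neural network then typically looks up a learned vector of numbers for each id.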

The elements of this artificial neural network pass numbers to one another and, through training, build a statistical model of the language.
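The paper’s title names recurrent neural networks. As a rough illustration of network elements “communicating in numbers,” here is a plain (Elman) recurrent cell in dependency-free Python: at each step it combines the current token’s vector with a hidden state carried over from the previous step. This is the simplest member of the family, not necessarily the study’s architecture; the sizes, random weights and function names are arbitrary assumptions, and a real model would learn its weights through training.

```python
import math
import random

def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

def rnn_step(x, h_prev, W_xh, W_hh, b):
    """One Elman-RNN step: h_t = tanh(W_xh x_t + W_hh h_{t-1} + b)."""
    from_input = matvec(W_xh, x)
    from_history = matvec(W_hh, h_prev)
    return [math.tanh(i + h + bias) for i, h, bias in zip(from_input, from_history, b)]

def run_rnn(token_ids, embeddings, W_xh, W_hh, b):
    """Feed a token sequence through the cell; the final state summarizes it."""
    h = [0.0] * len(b)  # empty memory before the first token
    for token_id in token_ids:
        h = rnn_step(embeddings[token_id], h, W_xh, W_hh, b)
    return h

def random_matrix(rows, cols, rng):
    """Untrained stand-in weights drawn uniformly from [-0.5, 0.5]."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
VOCAB_SIZE, EMBED_DIM, HIDDEN_DIM = 6, 4, 3  # arbitrary toy sizes
embeddings = random_matrix(VOCAB_SIZE, EMBED_DIM, rng)  # one vector per token id
W_xh = random_matrix(HIDDEN_DIM, EMBED_DIM, rng)
W_hh = random_matrix(HIDDEN_DIM, HIDDEN_DIM, rng)
b = [0.0] * HIDDEN_DIM

state = run_rnn([1, 2, 3], embeddings, W_xh, W_hh, b)
print(state)
```

Because the hidden state is carried forward, the same tokens in a different order generally yield a different final state, which is what lets such a network be sensitive to word order.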

When humans reconstruct missing text, their interpretation may be subjective. To be human is to err with bias, and there is no way to quantify how likely a human’s completion is to be correct. Enter the machine.
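One way a machine can attach a number to its confidence is to convert its raw scores for candidate fillers into probabilities with the softmax function, a standard final step in classifiers and language models (whether the study used exactly this step is an assumption here). The candidate words and scores below are hypothetical.

```python
import math

def softmax(scores):
    """Turn arbitrary real-valued scores into probabilities that sum to 1."""
    top = max(scores)  # subtracting the max avoids overflow in math.exp
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw scores a model might assign to three fillers for one gap.
candidates = ["a-na", "ina", "lugal"]
scores = [2.0, 1.0, 0.1]
for word, p in zip(candidates, softmax(scores)):
    print(f"{word}: {p:.2f}")
```

This prints roughly 0.66, 0.24 and 0.10, so the top candidate comes with an explicit estimate of how likely it is, relative to the alternatives.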

The team chose to seek proof of concept with Late Babylonian texts from the Achaemenid period because there are so many of them, and they’re very formulaic. The model was “trained” on around 2,000 of these texts, and was then asked to complete sentences it didn’t know before.
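The evaluation described here, training on one set of texts and then asking the model to fill gaps in texts it has never seen, can be sketched as a held-out split plus artificial gaps. The 10 percent test share, the fixed seeds, the one-token gap, the `[...]` marker and the placeholder corpus are illustrative assumptions, not details of the study’s protocol.

```python
import random

def split_texts(texts, test_fraction=0.1, seed=42):
    """Shuffle, then split so that test texts are never seen during training."""
    rng = random.Random(seed)
    shuffled = list(texts)
    rng.shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

def make_gap(text, rng, marker="[...]"):
    """Hide one token at random; return (text with a gap, the hidden token)."""
    tokens = text.split()
    position = rng.randrange(len(tokens))
    answer = tokens[position]
    tokens[position] = marker
    return " ".join(tokens), answer

# Hypothetical corpus of 20 placeholder texts.
corpus = [f"text number {i}" for i in range(20)]
train, test = split_texts(corpus)
rng = random.Random(0)
questions = [make_gap(text, rng) for text in test]
print(len(train), len(test), questions)
```

Keeping the hidden token alongside each gapped text is what makes the test gradeable: the model’s guess can be compared with what was actually written.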

The machine proved capable of identifying sentence structures – and did better than expected in making semantic identifications on the basis of context-based statistical inference, Gordin says.

The team further probed its abilities with a completion test, in which the machine-learning model had to answer multiple-choice questions: which word fits in the blank space of a given sentence. The kinds of questions it answered correctly, and the order in which it ranked the choices, actually revealed more about the model than its mistakes did.
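A multiple-choice completion test like this can be sketched with any model that scores candidate fillers. To stay self-contained, the sketch below swaps in a crude stand-in scorer that counts how often each choice appeared next to the gap’s neighbors in made-up training texts; the study’s own model is a neural network, and all names and data here are hypothetical.

```python
from collections import Counter

def train_pairs(texts):
    """Count adjacent token pairs: a crude stand-in for a trained model."""
    pairs = Counter()
    for text in texts:
        tokens = text.split()
        pairs.update(zip(tokens, tokens[1:]))
    return pairs

def score(gapped, choice, pairs, marker="[...]"):
    """Score a choice by how often it sat next to the gap's neighbors in training."""
    tokens = gapped.split()
    i = tokens.index(marker)
    total = 0
    if i > 0:
        total += pairs[(tokens[i - 1], choice)]
    if i + 1 < len(tokens):
        total += pairs[(choice, tokens[i + 1])]
    return total

def rank_choices(gapped, choices, pairs):
    """Return the choices ordered best-first; ties keep their original order."""
    return sorted(choices, key=lambda c: score(gapped, c, pairs), reverse=True)

def accuracy(questions, pairs):
    """Fraction of questions whose top-ranked choice is the right answer."""
    hits = sum(rank_choices(g, choices, pairs)[0] == answer for g, choices, answer in questions)
    return hits / len(questions)

# Made-up training texts and questions: (gapped text, choices, correct answer).
pairs = train_pairs(["a-na lugal be-li-ia", "a-na lugal sa", "ina qi-bi lugal"])
questions = [
    ("a-na [...] be-li-ia", ["ina", "lugal", "qi-bi"], "lugal"),
    ("ina [...] lugal", ["lugal", "qi-bi", "be-li-ia"], "qi-bi"),
]
print(rank_choices("a-na [...] be-li-ia", ["ina", "lugal", "qi-bi"], pairs))
print(accuracy(questions, pairs))
```

Because Python’s `sorted` is stable, tied choices stay in the order they were listed, so with a weak scorer the order of the options can sway the answer; shuffling the choices or breaking ties explicitly is one way to guard against that.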

Can it work alone? No. Nor will a burro do much if you don’t lead it. But the team concludes that artificial intelligence can save linguists and archaeologists restoring fragmentary texts a lot of donkey work.

Canaanite cuneiform, Tel Hadid. Credit: Collection of the Israel Antiquities Authority