Revolutionary Computer Can Decipher Ancient Hebrew Texts in Seconds

Haifa University uses high-tech to decipher ancient historical, Biblical and literary manuscripts

A University of Haifa researcher holding a scrap of ancient Hebrew script.
Amir Levy

“Hasten to the Shoko,” urged the computer. “The mouth asked to smoke,” it mused another time. Then it declared, “Jesus God to rejoice.” The cryptic phrases brought both smiles and satisfaction to the managers of the digital humanities laboratory at the University of Haifa. One is a Talmud and Midrash teacher and the other a professor of information systems.

The platform, called Kraken, is taking its tentative first steps in attempting to decipher ancient Hebrew. The hope is that in the not-too-distant future, after completing its studies, Kraken will be able to read any Hebrew text, even if the manuscript is distorted, illegible or hard to decipher. It’s part of a discipline called digital humanities, which uses advanced technology to enhance studies in history, the Bible and literature.


Like children encountering Hebrew religious texts in elementary school for the first time, Kraken also needs practice to become familiar with the material. The “shoko” was supposed to be “shoket” – trough. The mouth wanted to “deal with the Torah,” not to smoke, while Jesus, heaven forbid, has nothing to do with the third phrase, which was originally “the Lord will again rejoice.”
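One common way to put a number on slips like these is the character error rate: the count of single-character edits needed to turn the computer's reading into the correct one, divided by the length of the correct reading. The sketch below is only an illustration. It works on the English transliterations quoted above rather than on the Hebrew letters, the function names are hypothetical, and nothing here is taken from the lab's own tools.

```python
def edit_distance(predicted, reference):
    """Levenshtein distance: fewest insertions, deletions and
    substitutions needed to turn `predicted` into `reference`."""
    prev = list(range(len(reference) + 1))
    for i, p_char in enumerate(predicted, start=1):
        curr = [i]
        for j, r_char in enumerate(reference, start=1):
            cost = 0 if p_char == r_char else 1
            curr.append(min(prev[j] + 1,          # delete from predicted
                            curr[j - 1] + 1,      # insert into predicted
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def character_error_rate(predicted, reference):
    """Edit distance divided by the length of the correct reading."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(predicted, reference) / len(reference)


# Transliterations from the article, not the original Hebrew spelling.
print(edit_distance("shoko", "shoket"))         # 2
print(character_error_rate("shoko", "shoket"))  # 2 edits / 6 characters
```

On the transliterated pair, two edits separate the misreading from the intended word. The count on the original Hebrew spelling may differ.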

Moshe Lavee, a Military Intelligence veteran, is a senior lecturer in Talmud and Midrash in the University of Haifa’s department of Jewish history. He is the director and founder of eLijah-Lab, Kraken’s home, and one of the two researchers heading the lab. This week he spoke with contagious enthusiasm about the digital revolution, which is destined to save several research fields from oblivion. He uttered expressions that appear light years away from his “antiquated” research areas: “deep learning,” “remote viewing,” “computer vision,” “data mining” and “artificial neural networks.”

On a monitor he showed a scanned section from Midrash Tanhuma, three collections of Pentateuch aggadot (homilies) from the end of the ancient period. The script is difficult to read, but the computer doesn’t give up. Kraken – developed by Prof. Daniel Stoekel Ben-Ezra of Ecole Pratique des Hautes Etudes in Paris – succeeds in reading it, and then presents it to the researcher as a simple text file.

Moshe Lavee (right) with a visitor at eLijah-Lab.
Amir Levy

This opens new research possibilities that ignite the imagination, first and foremost searching and analyzing information across quantities and varieties of texts that until now even the most skilled researcher couldn’t handle alone.

“Our vision is to make all the Hebraic scripts accessible,” says Lavee. “We’ll turn Jewish and Hebraic legacy into texts accessible to computer search and study and save a huge treasure trove of knowledge and Jewish traditions.”

To better understand what he’s talking about, one must know what technological changes the manuscript world has undergone in recent years. In the past, ancient Hebraic texts were available only in books. To study them, researchers had to sit in libraries with piles of thick tomes. In recent years these manuscripts have been undergoing digitization: they are scanned and uploaded as picture files to computers throughout the world. The next stage, which the digital humanities lab is now concentrating on, is turning these files from pictures, whose letters only the human eye can read, into text files, the kind the computer can read.

The revolution was made possible by Handwritten Text Recognition technology, which enables a computer to read tens of thousands of pages, such as novels and poetry from the 19th century, diaries and letters from WWII and ancient philosophical and religious texts, including handwriting that is barely intelligible.

Lavee says “the computer is taught to read the texts automatically, based on practice, so it acquires contextual knowledge about the language and uses it to reach better results.”

Prof. Tsvi Kuflik, his partner in running the laboratory, says “the days in which a researcher spends hours reading piles of books in dusty libraries are about to end.”

He says “technology can spare the researchers the Sisyphean search, enabling them to use their time more efficiently.”

At this stage the computer still needs the researchers’ help. They are teaching it to read and “understand” the ancient Hebraic texts it encounters for the first time. “We show the computer many pictures from manuscripts, alongside their correct transcription,” says Lavee.

“The computer itself finds the mathematical formula that leads from the visual data to the text, and develops the ability to decipher even a manuscript whose transcription it hasn’t been shown before.”
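The training process Lavee describes, in which the computer is shown images alongside their correct transcriptions and then reads material it has not seen, is what is generally called supervised learning. The toy sketch below illustrates only the principle. It swaps in a very simple technique (averaging the training images for each symbol and choosing the closest average) in place of the neural networks used for real handwriting recognition. The 3x3 "glyphs", their labels and all function names are hypothetical and are not drawn from Kraken.

```python
from collections import defaultdict


def train(examples):
    """Learn one average image (centroid) per label.

    `examples` is a list of (pixels, label) pairs: the 'pictures
    alongside their correct transcription' of the training stage."""
    sums = {}
    counts = defaultdict(int)
    for pixels, label in examples:
        if label not in sums:
            sums[label] = [0.0] * len(pixels)
        for i, value in enumerate(pixels):
            sums[label][i] += value
        counts[label] += 1
    return {label: [s / counts[label] for s in total]
            for label, total in sums.items()}


def predict(model, pixels):
    """Return the label whose average image is closest to `pixels`."""
    def squared_distance(centroid):
        return sum((c - p) ** 2 for c, p in zip(centroid, pixels))
    return min(model, key=lambda label: squared_distance(model[label]))


# Hypothetical 3x3 bitmaps, flattened row by row (1 = ink, 0 = blank).
BAR = [0, 1, 0,
       0, 1, 0,
       0, 1, 0]
BAR_SMUDGED = [0, 1, 0,
               0, 1, 0,
               0, 1, 1]
CORNER = [1, 0, 0,
          1, 0, 0,
          1, 1, 1]
CORNER_FADED = [1, 0, 0,
                1, 0, 0,
                0, 1, 1]

model = train([(BAR, "bar"), (BAR_SMUDGED, "bar"),
               (CORNER, "corner"), (CORNER_FADED, "corner")])

# An image that was never shown during training: a bar with a stray mark.
unseen = [0, 1, 0,
          1, 1, 0,
          0, 1, 0]
print(predict(model, unseen))  # bar
```

The stray mark does not match any training image exactly, yet the closest learned average is still the bar. Real systems work on whole lines of handwriting and far richer models, but the division into a training stage and a reading stage is the same.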

Dror Elovits, the lab’s technology manager and a graduate student in history, believes “the day is not far when we won’t need the human factor anymore; the texts will digitize themselves.”

Elovits, who worked for a decade in Israeli startups, decided a few years ago to take a break and began studying history at the university. In a lecture about the Roman empire, the lecturer exposed him to the developing world of digital humanities and showed him how to combine the two worlds – the one he left behind in the glittering high-tech companies and the academic one in the old corridors of the humanities faculty.

Digital humanities, he says, is still seen by the public and academia as “an alien, unknown, threatening discipline.” But in recent years it has been growing closer to the academic mainstream and becoming, in his words, “an inseparable part of the humanities.” Today the Hebrew University in Jerusalem, as well as Bar-Ilan and Ben-Gurion universities, are also experimenting in this field.

In Elovits’s vision, first-year history students will be required to learn the computer tools, from simple programs to the specific algorithms that will enrich traditional humanities studies, alongside their introduction to ancient history.

Elovits gave an example of the digital revolution in the humanities from one of his required research papers. He compared the quantity of primary source material that a professor who had studied the subject gathered over weeks of archival work with the material he himself found in a few minutes from his home computer.

“He only got 10 percent of what I gathered,” he said. Anyone who reads this article can try this for themselves. Elovits used the free database of Historical Jewish Press, an online archive of newspapers written and published by Jews, with 2.5 million scanned pages from the 19th century to the present. It was founded by Prof. Yaron Tsur of Tel Aviv University’s Jewish History department, a pioneer of digital humanities in Israel.

“No matter how many resources you have as a single researcher and no matter how large your research group, ultimately you will be able to touch only the very tip of the corpus and obtain only a tiny fraction of the wealth of existing manuscripts,” says Elovits.

With computers, researchers can study all the material at once and “activate research tools of textual analysis, to form new research insights,” he says.

Ruth Kaplan, an architect and Ph.D. candidate in Jewish history at the University of Haifa, uses eLijah-Lab in researching the history of Jews in Lodz, Poland. Kraken is helping her to analyze population censuses carried out in the community.

“From these censuses it’s possible to obtain a considerable amount of valuable information about the life that disappeared after the Nazis’ rise to power, if one only knows how to take advantage of the computer’s abilities,” she says.

She tracked down the census results in archives in Poland, in an outdated format and of poor quality. “Now, after the computer has learned to read these texts, it will be possible to obtain information that has been hidden in them until now,” she says.

“We don’t want to bury the classic humanities,” Lavee stresses. “On the contrary, we want to save them. We’re not replacing the researchers of the past, but improving the human mind with computerized tools.”