Reuniting the Dispersed Fragments

An ambitious new project seeks to digitize the entire Cairo Geniza, piecing together half a million document fragments scattered around the world.

Ofer Aderet

One spring day in 1896, a pair of scholarly Scottish twin sisters approached Dr. Solomon Schechter, an expert on rabbinical literature at Cambridge University. They had several tattered pages they had purchased in a Cairo antiquities shop.

The lecturer rubbed his eyes incredulously. He took the pages with him for examination, and then wrote a hurried, excited note to the two women, asking them not to share news of the discovery until a formal announcement could be made.

Yaacov Choueka. Credit: Emil Salman

The women's souvenir turned out to be part of a rare book that had been lost since the Middle Ages. It was the original Hebrew text of the apocryphal "Book of Ben Sira," written early in the second century BCE, 250 years before the destruction of the Second Temple.

A few months later, Schechter headed to Cairo to look for the rest of the book. His search led him to the attic of the Ben Ezra Synagogue, where he discovered the Cairo Geniza: a tremendous pile of torn and tattered pages. Some had been lying there for 1,000 years, since the 9th century, when the synagogue was established. They included sacred writings and documents from the community's daily life - and constituted a Jewish time capsule, preserved thanks to the custom of storing, rather than discarding, texts that included God's name.

Prof. Schechter crawled through a hole in the wall in the women's section of the synagogue to the dusty and suffocating geniza (a Hebrew word meaning "storeroom"), and began removing the manuscripts. After receiving the proper permits, he packed them into cartons and placed them on a ship to Cambridge. That is how most of the Cairo Geniza was taken to safety. The rest of the material, about 40 percent, was scattered among antiquities dealers, collectors and world travelers. Some of it wound up in other libraries.

Last week, two professors from the Blavatnik School of Computer Science at Tel Aviv University sat down to burrow through the tattered pages. But they weren't searching through them physically: The pages appeared on the huge screen of a state-of-the-art Macintosh computer with a wireless keyboard and mouse. The two, Nachum Dershowitz, a logic expert, and Lior Wolf, an expert on "computerized vision" (which is used for face recognition, among other things), are the Tel Aviv branch of a revolutionary international project called Genazim, based in one of the high-rises in Jerusalem's Givat Shaul neighborhood. Its ambitious objective is to reconstruct the Cairo Geniza and to make available a complete digital version online. (A preliminary, partial version of "The Friedberg Genizah Project" can be viewed at

"It's a tremendous thing. When we present the project to people, they have tears of excitement in their eyes," says Dershowitz as he looks at selections from the Geniza in Maimonides' handwriting. "We're doing something very exciting here - processing historical documents in a manner and to an extent that is unprecedented in the humanities," adds Wolf.

Digitizing the humanities

The combination of computer science and Jewish studies - two areas of research that may seem unrelated - has met with remarkable success. A few months after the project began, about a decade ago, new tools were developed that may accelerate the study of the Cairo Geniza.

The project was initiated by ultra-Orthodox millionaire Albert Dov Friedberg of Toronto, Canada, who wrote his doctoral thesis about Maimonides, the most notable person who prayed in Cairo's Ben Ezra Synagogue. Friedberg envisioned a website that would house all the Geniza pages and the corresponding research. First to be recruited for the project were teams of Geniza experts from all over the world, under the academic direction of a Hebrew University professor. But they could not find the necessary technology for their work.

The turning point came five years ago, with the appointment of Prof. Yaacov Choueka as project director. Choueka, 75, has devoted decades of his life and work to integrating computer sciences into Jewish studies. The "doctor of mathematics who was swept up by computer sciences," as he puts it, is a world-renowned scholar who specializes in computerized text analysis and language processing. He is also an expert in Jewish sacred literature with a respected family connection - he is the great-grandson of the 19th-century chief rabbi of Aleppo, Syria. Choueka himself was born in Cairo and is fluent in Jewish Arabic, the language in which many of the Geniza documents are written.

Choueka has already participated in several projects of this type. One of them, the Responsa Project - the largest electronic collection of Torah literature - was the first in Israel and one of the first in the world to combine computerization and the humanities. Yet Choueka feared that digitizing the Geniza would be almost impossible, and hesitated for a long time before committing to it.

The first main challenge was the fact that Geniza documents were scattered in collections and libraries around the world. No fewer than 75 public and private libraries now have parts of the Geniza. The first stage of the project, which took about two years, was preparing an inventory of Geniza pages wherever they were. The list was painstakingly compiled from libraries across three continents in cities including Jerusalem, London, Paris, Vienna, Budapest, New York and Philadelphia, in addition to Cambridge.

"For the first time in the 120 years since the Geniza was scattered, we succeeded, with the libraries' help, in preparing a complete list of Geniza documents," said Choueka at the time. The final list is a comprehensive online catalog that includes the call numbers of about 350,000 fragments from scrolls, pamphlets, pages and books. The list includes books of the Bible and the Talmud and their commentaries, passages of prayer and piyyut (liturgical poetry), midrash and mussar (ethical teachings), philosophy and sciences. It also includes personal and business correspondence, financial records, recipes and formulas for medicines, engagement and marriage contracts and writs of divorce, and even handwriting exercises. Along with the catalog, the website also offers bibliographical information compiled from dozens of sources.

The second stage of the project, which will be completed by the end of next year, involves creating high-quality digital copies of all the Geniza material. "This is a tremendous revolution," says Choueka. "Until now scholars had to search for manuscripts in libraries all over the world, or peruse microfilm that is not always of good quality. This revolution will let any scholar, anywhere in the world, access the entire Geniza via computer at any time."

This will not only spare scholars the need to track down the physical documents; it will also give them a better view than the originals themselves offer. The documents were photographed at high resolution, enabling users to enlarge them on their computer screens to the point where they can see details of letters, marks and colors that are invisible to the naked eye on the original. For many scholars, this is an unprecedented improvement.

Some 300,000 Geniza documents have been copied so far. The researchers are copying 10,000 documents a month at the Cambridge University Library. When the work is completed, another 250,000 images will have been added to the database. By the end of 2012, it will have 550,000 images of Geniza documents, 99 percent of all existing material, promises Choueka. "They will be accessible free of charge," he says. Before the project began, there were only 1,000 Geniza documents available in digital form.

As expected, some of the libraries initially refused to cooperate with the new project, fearing that if the texts were accessible online, nobody would come to the library anymore. Ultimately, "most of the libraries understand that this is the future and that it is worth their while to cooperate with us and our financing," says Choueka. Three libraries are still resisting getting involved - in Moscow and St. Petersburg, Russia, and Kiev, Ukraine. Negotiations are ongoing with two English university libraries, Oxford and Manchester.

Assembling the pieces

The project's crowning achievement so far is the work of an artificial intelligence team, which will help overcome the second biggest problem with the Cairo Geniza: In addition to being scattered all over the world, many documents were broken into pages and parts of pages. Part of a page may be in Oxford, while the other part is in Toronto. In order to reassemble pages, the parts must be found among the hundreds of thousands of documents scattered around the world, and they must be confirmed as matches.

Reassembling the entire Geniza requires hundreds of billions of individual comparisons. Experienced scholars say such matching demands considerable effort and expertise. Instead, the team is having computers do the job. The results thus far are encouraging. In the 120 years since the Geniza's discovery, scholars have pieced together only a few thousand separated fragments. In the new project's first trial, computers discovered almost 1,000 new pairings.
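The scale follows from simple counting. If every item must be checked against every other item once, n items yield n(n - 1)/2 distinct pairs. The sketch below plugs in the two totals the article gives, about 350,000 catalogued fragments and about 550,000 images, as illustrative values for n; the article does not say which unit the project's comparisons actually count.

```python
def pair_count(n: int) -> int:
    """Number of unordered pairs that can be formed from n items: n(n - 1)/2."""
    return n * (n - 1) // 2

# Illustrative values of n taken from the totals quoted in the article.
for n in (350_000, 550_000):
    print(f"{n:,} items -> {pair_count(n):,} pairs")
# prints:
# 350,000 items -> 61,249,825,000 pairs
# 550,000 items -> 151,249,725,000 pairs
```

Even the smaller figure is in the tens of billions, and because the count grows with the square of n, exhaustive manual comparison is out of the question.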

How do they do it? They teach the computer to analyze the digital images. This produces a huge database with data on all the fragments. Later the computer uses this information to compare the images, and to decide which fragments have a common source.
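The three steps described here (analyze each image, store the results, then compare the stored results) can be sketched as follows. This is a minimal illustration, not the project's software: the feature vectors and fragment IDs are made up, `extract_features` is a hypothetical placeholder for real image analysis, and Euclidean distance with a fixed cut-off is an assumed matching rule.

```python
import math
from itertools import combinations

def extract_features(fragment: dict) -> list[float]:
    """Hypothetical stand-in for image analysis: here each fragment
    simply carries a precomputed feature vector."""
    return fragment["features"]

def candidate_joins(fragments: list[dict], max_distance: float) -> list[tuple[str, str, float]]:
    # Step 1: analyze every image once and store the results.
    db = {f["id"]: extract_features(f) for f in fragments}
    # Step 2: compare every pair of stored feature vectors.
    hits = []
    for a, b in combinations(db, 2):
        d = math.dist(db[a], db[b])  # Euclidean distance
        if d <= max_distance:
            hits.append((a, b, d))
    # Step 3: closest pairs first, as suggestions for a scholar to check.
    return sorted(hits, key=lambda h: h[2])

fragments = [
    {"id": "frag-A", "features": [1.0, 0.0, 1.0]},
    {"id": "frag-B", "features": [0.9, 0.1, 1.0]},
    {"id": "frag-C", "features": [0.0, 1.0, 0.0]},
]
for a, b, d in candidate_joins(fragments, max_distance=0.5):
    print(a, b, round(d, 3))  # frag-A frag-B 0.141
```

Comparing all pairs like this is exactly the quadratic cost that makes the full Geniza so expensive; a real system would presumably need some way to prune candidate pairs before any detailed comparison.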

Dr. Roni Shweke, a Talmud scholar and computer scientist, and the son of Yaacov Choueka, was in charge of developing a unique software program to help with the task. The program measures the documents' physical properties, including their length and width, how many columns and lines they contain, margin size, and the density of the letters. It also identifies whether the page is complete, and determines whether it is missing corners or is cut or torn.
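The article lists the properties the program measures but not how it combines them. One plausible use, sketched here under assumed field names, units and tolerances, is a cheap pre-filter that discards obviously incompatible pairs before any expensive image comparison. Only properties that a torn piece still preserves (here, letter density) are compared unconditionally; page layout is compared only when both pages are complete.

```python
from dataclasses import dataclass

@dataclass
class PageMeasurements:
    """Physical properties of one fragment, as listed in the article.
    Field names and units are illustrative assumptions."""
    height_cm: float
    width_cm: float
    columns: int
    lines: int
    margin_cm: float
    letter_density: float  # letters per square centimetre (assumed unit)
    complete: bool         # True if the page is whole, not torn or cut

def plausible_pair(a: PageMeasurements, b: PageMeasurements,
                   density_tol: float = 0.15, line_tol: int = 2) -> bool:
    """Hypothetical pre-filter: could these two pieces share a source?"""
    # Writing density should be similar across one manuscript and is
    # still measurable on a torn piece.
    denser = max(a.letter_density, b.letter_density)
    if denser > 0 and abs(a.letter_density - b.letter_density) / denser > density_tol:
        return False
    # Layout (columns, lines per page) is only trustworthy on complete pages.
    if a.complete and b.complete:
        if a.columns != b.columns or abs(a.lines - b.lines) > line_tol:
            return False
    return True

page = PageMeasurements(20.0, 15.0, 2, 30, 2.0, 9.5, True)
piece = PageMeasurements(9.0, 14.5, 1, 14, 1.8, 10.0, False)
sparse = PageMeasurements(18.0, 12.0, 1, 22, 1.5, 6.0, True)
print(plausible_pair(page, piece))   # True: similar density, layout not compared
print(plausible_pair(page, sparse))  # False: letter density differs too much
```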

Meanwhile, the software developed by professors Wolf and Dershowitz uses face recognition technology, the kind used extensively by Facebook and Google's Picasa platform. Wolf explains: "There wasn't enough previous experience with manuscripts, but I had worked on face recognition in the past. The analogy is quite clear: A computer can identify the same person in different photos, even if he looks different in each one, and likewise can identify various fragments that once belonged to a single document, even if each one looks different today.

"This task is far greater than any person, and currently the computer can't do it alone either," explains Wolf. "That is why the Geniza scholar is constantly working with the computer, giving it feedback and thus improving its future recognition capability. This synergy, which has never existed on such a scale in the humanities, is very exciting."
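Wolf's analogy can be illustrated with a toy example, assuming each fragment has already been reduced to a numeric descriptor (the article does not describe how the project builds its descriptors, and real face-recognition systems learn them with trained models). Centering and normalizing each descriptor turns the comparison into a Pearson correlation, which ignores uniform changes such as overall fading, so two pieces that look different today can still match. The article also does not say how the scholar's feedback is used; re-fitting the match threshold to agree with as many of the scholar's verdicts as possible is one simple, hypothetical option.

```python
import math

def normalize(v: list[float]) -> list[float]:
    """Centre a descriptor on zero and scale it to unit length.
    This cancels uniform changes in brightness (offset) and contrast (scale)."""
    mean = sum(v) / len(v)
    centred = [x - mean for x in v]
    norm = math.sqrt(sum(x * x for x in centred))
    return [x / norm for x in centred] if norm else centred

def similarity(desc_a: list[float], desc_b: list[float]) -> float:
    """Pearson correlation of two descriptors, in [-1, 1]."""
    a, b = normalize(desc_a), normalize(desc_b)
    return sum(x * y for x, y in zip(a, b))

def same_source(desc_a: list[float], desc_b: list[float], threshold: float) -> bool:
    return similarity(desc_a, desc_b) >= threshold

def refit_threshold(feedback: list[tuple[float, bool]], default: float = 0.9) -> float:
    """Choose the cut-off that agrees with the most scholar verdicts so far.
    feedback holds (similarity score, scholar confirmed the join?) pairs."""
    if not feedback:
        return default
    def agreement(t: float) -> int:
        return sum((score >= t) == confirmed for score, confirmed in feedback)
    # Try each observed score as a cut-off; ties go to the lowest one.
    return max(sorted({score for score, _ in feedback}), key=agreement)

# Usage with made-up numbers:
original = [0.2, 0.8, 0.5, 0.9, 0.1]
faded = [0.5 * x + 0.3 for x in original]  # same fragment, uniformly faded
unrelated = [0.9, 0.1, 0.4, 0.2, 0.8]
print(same_source(original, faded, 0.9))      # True
print(same_source(original, unrelated, 0.9))  # False

verdicts = [(0.95, True), (0.92, True), (0.88, False), (0.91, False), (0.97, True)]
print(refit_threshold(verdicts))  # 0.92
```

Each confirmed or rejected suggestion adds one more (score, verdict) pair, so the cut-off keeps shifting toward what the scholar actually accepts.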

Dershowitz explains: "The computer, as opposed to a person, won't get bored after perusing thousands of biblical fragments, and will go through them far more quickly than any human being. But only the human scholar can read what the fragments say, and understand their context."

For now, at least. "In the future, the computer will be able to see everything a human can see," says Wolf. "The only question is when that future will arrive." He adds, "I have no doubt that within a decade, things will be totally different."

Current technology enables computers to read texts. But it doesn't do well with ancient manuscripts, which often contain writing styles that no longer exist, connected letters or smudges. "We're working on that too, in cooperation with scholars from the Hebrew University in Jerusalem," says Choueka. "The initial results are good. In the future the computer will also be able to 'read' the Geniza to some extent, and find fragments that scholars missed due to the tremendous volume."

"Experienced professors, who've already seen everything, were skeptical about using computers and wondered, 'What can the computer find that I haven't succeeded in finding?'" says Dershowitz. But even the greatest skeptics soon became enthusiastic. "They understood that the computer can save them a great deal of work and let them focus on content rather than searching for lost materials," he explains. Wolf adds, "They know that computer sciences don't replace the humanities; they give them new work tools."

The technology developed by Wolf and Dershowitz is likely to serve them in another project: computerizing the Dead Sea Scrolls. Last month a new website was launched, called "The Dead Sea Scrolls Digital Project," which presents high-quality images of five scrolls from the 3rd century BCE to the 1st century CE, discovered in the Judean Desert in the 1940s and 1950s.

The new site, a joint project of the Israel Museum in Jerusalem and Google, enables users to study the scrolls at high resolution and to search through them. "It's as if a child spilled 10 boxes of puzzles onto the floor and threw half the pieces into the garbage," says Dershowitz, trying to explain the problem that the scrolls' manuscripts present. "We're working on this in cooperation with a professor who specialized in using the computer to solve jigsaw puzzles."

Choueka says, "It's the same idea and the same type of material as the Cairo Geniza: many manuscript fragments, mishandled, scattered and studied by various scholars. Now, finally, they are being copied and made available to the entire world."

Along with computerizing the Cairo Geniza, Choueka is now beginning to work on a project that he says is bigger, more comprehensive and more important. The project, to be called "Hakhi Garsinan" - meaning "this is how the text should be read," the Aramaic phrase Rashi used to denote changes in versions of the Babylonian Talmud - will present every change that appears in any version of the Talmud based on all the manuscripts and printed editions in the world.

"It will be a modern website constructed in an unprecedented manner," said Choueka. "Anyone who studies a page of Gemara - a yeshiva boy, a student at home or a university researcher - would be able to see all the versions for every line based on all the existing manuscripts."

For about 150 years, Talmud scholars have been trying to launch a similar project. Now, thanks to his computers and technology, Choueka promises that it will take off within a few years.