Israeli scholars set out to compile the ultimate Hebrew dictionary
Coordinator of the Historical Dictionary of the Hebrew Language says humans, not computers, are needed, since they can reach 'dark and complicated corners.'
One morning last week Dorit Kedari-Davidov, an employee of the Academy of the Hebrew Language, found the word "hamsin" in the writings of Yosef Haim Brenner. In his 1911 novel "From Here and There," Brenner wrote (as loosely translated into English): "It was a Saturday at the end of the month of Tammuz with an unbearably harsh, hot air current known as hamsin."
A quick search of the academy's Historical Dictionary of the Hebrew Language, a work in progress, revealed that the citation is the first known recorded instance in which this Arabic word was used in Hebrew. Brenner spelled it with the Hebrew letter sin instead of the samekh used today, to indicate the morphological connection between the hot air current and the word for "five," signifying the 50 days a year when this unpleasant weather is said to come to the region.
The digitization of old texts has become fairly common - for instance, the Hebrew University of Jerusalem recently announced it was expanding the digital version of its Einstein archives - but, in something of a switch for the Internet era, the Historical Dictionary of the Hebrew Language is aiming for greater comprehensiveness than researchers say computers can yield.
"We want to reach 100 percent and not the 70 percent the computer can achieve, and for that we need humans, precisely for the dark and complicated corners the computer can't reach," said Dorit Lerer, coordinator of the dictionary project.
Though this massive historical dictionary could take another generation to complete, it will become accessible to the general public online within a few months if the language academy receives the necessary funding. Until now the database has been open only to researchers and subscribers.
Super-concordance of the Hebrew language
Every text entered into the databank is read by six different people before the entry is finalized. First the text is read and entered into the computer, then linguists analyze the text by examining the meaning of each word, the lexicographic root and the structure. Finally, the dictionary entries are written. The result is a kind of super-concordance of the Hebrew language.
"In this way you can take any word, know how old it is, how it has been transformed over the years and what meanings it takes on," said linguist Chaim Cohen, the current editor of the dictionary and the third since its inception. "The value of the dictionary is not only as a dictionary but rather as a basis for research on non-dictionary issues."
On the computer screen of one of the dictionary researchers is the Talmudic expression "Mai ekhpat lekha" - What difference does it make to you? One can't help but think of its current abbreviated incarnation, "ma khpatkha?" - Whadda you care? - in the Hebrew of today.
"If you take the Song of Deborah, which is considered one of the earliest chapters in the Bible, and the Haaretz newspaper from this morning," said Doron Rubinstein, the coordinator of the modern literature section, "I could enter both of them into the database and use the same coding system to analyze them."
"There is no new Hebrew without ancient Hebrew," said Cohen. "Our Hebrew is in fact founded on ancient Hebrew."
Another generation's worth of work
The dictionary already contains more than 20,000 entries and has been under construction for 58 years. But it could take another generation until the dictionary is complete, said the president of the language academy, Moshe Bar-Asher.
The labor-intensive project requires many hours spent reading, defining and breaking down word after word in an effort to create the most comprehensive map of the Hebrew language. There is something seemingly Sisyphean about the snail's pace of fastidiously compiling a massive database of Hebrew words.
The 7,919 texts that have been entered into the database include the Mishna and Talmud - but not the Bible, for which concordances already exist. Other ancient texts include the Dead Sea Scrolls, Gaonic literature from the 6th to 11th centuries and medieval poetry. The most recent writer is S.Y. Agnon, who died in 1970, and other modern writers whose work is going in the database include Chaim Nachman Bialik, Ahad Ha'am, Vladimir Jabotinsky and Mendele Mocher Sefarim. Haim Be'er, A.B. Yehoshua and Amos Oz could also make it in, according to Bar-Asher.
There is no debate over which texts written up to 1050 get into the database; other than the Bible, all of them do - including inscriptions on ancient coins. Nor are the texts confined to the most familiar editions of various forms of literature. Thus, for example, some of the linguists used photographs of the original Cairo Geniza parchments to figure out the meaning of distorted texts, and read ancient manuscripts of the Babylonian Talmud.
Yehiel Kara, one of the researchers working on the dictionary, succeeds in reconstructing words even in places where ordinary readers of Hebrew see only scattered spots of ink.
"Here it says 'Yerushalayim' [Jerusalem]," he explained to a novice, pointing to a letter written around 1055 that was found in the Cairo Geniza. "Can you see the three fragments of the letter shin? If you look hard, this apostrophe-like mark is the mem." The letter itself says: "It is better to eat an onion in Jerusalem than a cockerel in Egypt."
The problem starts after the year 1050. At that time the language began to expand significantly, and it is not possible to enter all the texts ever written in Hebrew into the database. Bar-Asher estimates that from then to today, 500 million to 1 billion words have been written in Hebrew. In order to decide whether to include a given text in the database, scholars assess the influence the text has had on the Hebrew writing that came later, or whether it affords a look at unique Hebrew words, as scientific literature from the Middle Ages does.
The dictionary established a modern literature section in 1969, for texts written from 1750 onward, and the researchers are making an effort to cover the years between 1050 and 1750 as well.
Thirty employees, most of them part-time, are working on the historical dictionary, which has an annual budget of about NIS 5 million.
Forty percent of the workers are not actually native speakers of Hebrew. Natalie Okun, for example, a native of Morocco whose mother tongue is French, specializes in reading and processing Hebrew texts that have been translated from French, like Eugene Sue's "Mysteries of Paris," translated into Hebrew by Kalman Schulman.