Preserving the Web: National Library to Archive Almost All Israeli Websites

First stage of project launched by National Library to include copying, saving some 250,000 websites.

Oded Yaron

Content from nearly all Israeli Internet sites is going to be stored in a new archive being launched by the National Library of Israel with the help of a gift of hundreds of thousands of shekels made by an unnamed philanthropist, the library announced yesterday.

“We deal with preserving Israeli culture, cultural memory, and social history, both Israeli and Jewish,” said library CEO Oren Weinberg. “Today, part of that memory is on the Internet. The quantity of texts available on news sites, blogs, poetry and research sites is growing, and we want to make sure these materials are preserved.”

With the project’s launch, Israel joins the British Library, the national library of the United Kingdom, which announced a similar project this year, and the Internet Archive, a California-based nonprofit that has been collecting digitized materials since 1996, and works together with the U.S. Library of Congress. The Israeli archive plans to get help from the Internet Archive in selecting and copying the Web content from the various sites. “They will essentially act as subcontractors for us, and we will make sure to save a copy of everything with us,” Weinberg explained.

During the first stage, the National Library will copy and save only Israeli Web pages, those with the “.il” suffix. According to Weinberg, this includes some 250,000 websites and some 150,000 sub-sites. Weinberg said the storage needed could easily come to dozens of terabytes annually (each terabyte is 1,024 gigabytes). The library will not save password-protected material, like personal account pages of service providers such as banks and health maintenance organizations.

Asked if the library will store pornographic sites, Weinberg said, “We are not selective, certainly not during the preliminary mapping. Later on we assume that we will not conduct a systematic collection but will be more selective, particularly regarding things that we think will need to be scanned more frequently. This is the first time that we’ll be making a copy of entire domains, and we will analyze the data from every collection and learn from the first time we go through the process.”

The National Library already has the legal right to scan every site with an .il suffix. Anyone who doesn’t want a site scanned can insert a line into the website code that asks that it not be included in the archive.

Complex legal challenges

Nevertheless, the project presents some complex legal challenges, explained Weinberg. A great deal of content is subject to privacy protection or libel legislation, and some material may be collected and only afterwards removed from the original site by court order or at the request of a person who felt harmed by it. “We will not violate a court order,” he said. “In the event that an original announcement is removed and a court determines [that publicizing it] would violate the law, we will store it but not allow access to it.”

A large and important part of Israel’s Internet activity takes place on social media sites that don’t have an .il suffix. Some of these, like Facebook, block access by search engines like Google. The National Library hopes to be able to get access to these sites at some later stage.

“Our goal is to create mechanisms that will allow us to copy content from social networks – content that we think is important or relevant to save from the perspective of public, social and political discourse,” said Weinberg. He said he hopes they will be able to map out important pages and get permission from those networks, or even from the page owners, to archive them. Ultimately, he added, the purpose is to make sure that large chunks of Israeli Internet history don’t get lost, which is what happened to a substantial portion of Israeli Internet content from the 1990s.

Surfing the 'Net in Israel. Credit: Eran Wolkowski