Is That So?

Bar-Ilan researchers have developed software that can assess the credibility of Wikipedia entries.

At first glance, the Homo habilis entry at the Hebrew Wikipedia site looks like a credible information source about the earliest hominid species. The entry starts with a general description and goes on to depict H. habilis' physical features and family structure. However, strange statements in the 'occupation' section regarding H. habilis' economic and sex life seem more appropriate for a "Planet of the Apes"-type science fiction movie than for an encyclopedia.

According to the entry, the ape-like creatures that lived about two million years ago "exchanged precious assets with their neighbors, such as sons and daughters; brothers and sisters. Every group aspired to retain a majority of people in its group because humans were so valuable, and it was possible to derive benefit from their economic, emotional and sexual services."

This assertion cites no source, and the entry ends with an astonishing theory that "the revulsion and anger triggered by incest reflected the dangers to which the group's members were exposed, due to the collapse of an exchange if there was a problem."

This is only one particularly glaring example of the downside of Wikipedia's editing system, which usually works efficiently and manages to identify critical mistakes.

Now that Wikipedia has become a major source of information - mainly for elementary and high school students - quite a few readers have probably accepted this theory as true, though it was apparently propounded by a surfer with a wild imagination. The problem is that because most of Wikipedia's entries are quite reliable, many surfers tend to trust this free encyclopedia blindly.

Eti Yaari of the Information Science department at Bar-Ilan University wrote her doctoral thesis on the possibility of automatically evaluating the veracity of Wikipedia entries, in order to provide an efficient scale for surfers to judge the quality of an entry.

Yaari cites studies indicating that most Internet users do not take the trouble to verify the information they find on the Web by cross-referencing with other sources.

"The best way to verify the authenticity of the entries is to check a few sources," says Yaari. "But it turns out that people think this is a waste of time. They are interested in finding information as quickly as possible."

Yaari's thesis, written under the guidance of Prof. Shifra Baruchson-Arbib, will be presented next week at the Second National Conference for Doctoral Students Who Study the Internet, hosted by the Netvision Institute for Internet Studies at the Faculty of Management, Tel Aviv University. Yaari says that many researchers are attempting to create mechanisms that can evaluate the quality of the information on the Internet.

"Universities around the world are developing automatic systems for evaluating [information] quality," says Yaari. "I was surprised at the number of researchers in this field. We are experiencing an information explosion - and finding information is not the problem, but rather finding quality information."

The idea behind Yaari's thesis is to base the quality evaluation mechanism on the insights of searchers. To this end, Yaari interviewed 64 Internet users, asking them to rate the quality of a few Wikipedia entries and explain these ratings. Yaari thus discovered a few characteristics that many of the study's participants considered effective in determining the credibility of an entry.

Among other things, surfers counted the number of external links added to an entry; its length; the development of the entry, i.e. the number of times it had been altered relative to how long it had been in existence; and the number of different people who had participated in writing the entry, as recorded on its "history" page.
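To make these four criteria concrete, here is a minimal Python sketch of how they might be computed for a single entry. It is an illustration only, not Yaari's software: the EntryHistory record, its field names and the credibility_features function are all hypothetical, and the sketch assumes the entry's text, links, creation date and per-revision authors have already been collected.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class EntryHistory:
    """Hypothetical snapshot of one entry; all field names are illustrative."""
    text: str                    # current text of the entry
    external_links: list[str]    # external links found in the entry
    created: date                # date the entry was first created
    revision_authors: list[str]  # one author name per revision, from the history page


def credibility_features(entry: EntryHistory, today: date) -> dict[str, float]:
    """Compute the four criteria described by the study's participants."""
    # Treat an entry created today as one day old to avoid dividing by zero.
    age_days = max((today - entry.created).days, 1)
    return {
        "external_links": len(entry.external_links),
        "length_chars": len(entry.text),
        # "Development": number of edits relative to how long the entry has existed.
        "edits_per_day": len(entry.revision_authors) / age_days,
        "distinct_editors": len(set(entry.revision_authors)),
    }
```

The sketch only counts; how the counts should be weighed against one another is a separate question that the article does not address.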

Yaari then examined 2,224 entries in the Hebrew Wikipedia and evaluated them based on the criteria provided by the study's participants. She found that the criteria indeed served to effectively identify quality entries, including some that were included in Wikipedia's "most recommended" category.

"The criteria had a high level of accuracy," says Yaari.

Now Yaari, in conjunction with Prof. Yehudit Bar-Ilan and Baruchson-Arbib, is trying to build a practical application that will automatically evaluate Wikipedia entries based on the quality criteria identified by her study.

"The goal is to build a tool you can use every time you visit Wikipedia," says Yaari. "A box will pop up, indicating the quality of the entry, for example with a rating of 1 to 10."
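How such a rating would be calculated has not been described. As a purely hypothetical Python sketch, suppose the tool first produced some quality score between 0 and 1; the function below, whose name and scoring scale are assumptions, shows one simple way to turn that score into the 1-to-10 rating Yaari mentions.

```python
def rating_from_score(score: float) -> int:
    """Map a hypothetical quality score in [0, 1] to an integer rating from 1 to 10."""
    # Clamp out-of-range scores so the rating always stays between 1 and 10.
    clamped = min(max(score, 0.0), 1.0)
    # Spread the clamped score across the nine steps between 1 and 10.
    return 1 + round(clamped * 9)
```

A score of 0 maps to a rating of 1 and a score of 1 maps to a rating of 10; everything in between is rounded to the nearest step.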

Yaari's research team recently received a grant from the Israel Internet Association to develop the evaluation tool.

Yaari hopes that Wikipedia users will soon not have to rely on their instincts concerning the reliability of a Wikipedia entry they are reading.

"I really love Wikipedia," says Yaari. "It is a wonderful [information] source. But the problem is that it is used indiscriminately. People say, 'I found material. It popped up first on Google, so that's enough for me.'

"Efforts have been made to provide users with tools to contend with the abundance of information," explains Yaari, "but now this is apparently not enough. If people do not check themselves, it is worth providing them with a 'robot' that will check the information for them, as far as possible."