Intel Israel Develops AI Bot to Educate Online Haters

Project seeks to use computer language processing to identify hate speech and warn the speakers or explain the problem with their statements

Photo: An Intel logo is seen at the company's offices in Petah Tikva. Credit: Nir Elias/Reuters

Researchers in Israel and California have designed a bot to educate people who express hateful views online.

The project, published by researchers at Intel Israel and the University of California, Santa Barbara, seeks to reduce the phenomenon of online racism and hatred via artificial intelligence.

The participants include five researchers from Intel’s artificial intelligence laboratory in Petah Tikva.

The project seeks to use computer language processing to identify hate speech on internet forums, and to produce automatic responses that warn the speakers or explain the problem with their statements.
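The article does not name the tools the researchers used, but the detection step it describes resembles standard text classification. A minimal sketch, assuming an off-the-shelf transformer classifier from the Hugging Face transformers library, might look like this (the checkpoint name and label scheme are placeholders, not the researchers' actual model):

```python
# Sketch of the detection step: classify forum posts as hateful or not
# using a pretrained transformer. "my-org/hate-speech-classifier" is a
# hypothetical checkpoint; the article does not say which model was used.
from transformers import pipeline

classifier = pipeline("text-classification", model="my-org/hate-speech-classifier")

posts = [
    "I hope everyone has a great weekend!",
    "People from that group should not be allowed here.",
]

for post in posts:
    result = classifier(post)[0]  # e.g. {"label": "HATE", "score": 0.93}
    # The "HATE" label and 0.8 threshold are illustrative assumptions.
    if result["label"] == "HATE" and result["score"] > 0.8:
        print(f"Flagged for an educational response: {post!r}")
```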

Oren Pereg, a senior artificial intelligence researcher at Intel Israel who was among the project’s leaders, said, “Past research shows that a good way to handle hate speech is not necessarily blocking and deleting, but rather responses that try to explain and improve, such as ‘I understand that you’re angry but I suggest you learn more about the Holocaust and racism.’”

The researchers collected posts from Reddit and Gab, online forums known for hateful speech and lax enforcement, and fed their bot thousands of hateful posts along with the responses written by administrators.
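In practice, training data of this kind is typically organized as pairs of a hateful post and the human reply it received. The sketch below illustrates one possible layout; the example records are invented placeholders, not data from the study:

```python
# Illustrative layout for the training data described above: each example
# pairs a collected post with the moderator response it received.
from dataclasses import dataclass

@dataclass
class TrainingExample:
    post: str       # hateful post collected from Reddit or Gab
    response: str   # reply written by a human administrator

examples = [
    TrainingExample(
        post="People like you don't belong in this country.",
        response="Attacking someone over their background adds nothing to the discussion.",
    ),
    # ... thousands of additional (post, response) pairs
]

print(f"{len(examples)} labeled pairs available for training")
```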

The research had two goals, said Pereg.

The first was correct identification of hate speech. On this front, they reached a relatively high identification rate of 80-90%. The second part was more complicated: automatically generating responses to hate speech.

“The system learns what hate speech looks like and what a good human response looks like,” said Pereg.

“After the testing phase we want to see that it can create its own answers from scratch.”
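The article does not specify the model architecture, but generating a reply from a learned mapping of posts to moderator responses is commonly done with a sequence-to-sequence model. A minimal sketch, assuming a hypothetical fine-tuned checkpoint ("my-org/counter-speech-t5" is a placeholder), could look like this:

```python
# Sketch of the response-generation step: a sequence-to-sequence model trained
# on (hateful post, moderator reply) pairs produces a counter-response for a
# new post. The checkpoint name is a placeholder, not the researchers' model.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("my-org/counter-speech-t5")
model = AutoModelForSeq2SeqLM.from_pretrained("my-org/counter-speech-t5")

hateful_post = "But this dude is a Jew, a dirty Jew"

inputs = tokenizer(hateful_post, return_tensors="pt", truncation=True)
output_ids = model.generate(**inputs, max_new_tokens=60, num_beams=4)
reply = tokenizer.decode(output_ids[0], skip_special_tokens=True)

print(reply)  # e.g. "The language used is highly offensive. ..."
```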

Illustration of the bot's automatic response. Credit: Intel AI

For instance, a post stating (in English), “But this dude is a Jew, a dirty Jew,” received an automatic response, “The language used is highly offensive. All ethnicities and social groups deserve tolerance.”

In comparison, a human moderator had responded, “Using race and religion disrespectfully to attempt to prove a point is ridiculous and offensive.”

However, the research was only partially successful.

In some cases, the bot formulated incoherent answers, such as, “If you don’t agree with you, there’s no reason to use insults.”

Human moderators were asked to review the bot’s responses and rate how effectively they thought each one would moderate hate speech. Only 30-40% of the bot’s responses were judged effective; human-written responses were judged 70% effective.

“There’s a gap, but here we have academic research that proves for the first time that this can be done. In order to improve, the most basic need is for a larger volume of data,” said Pereg.

The teams intend to continue the research, implement new language processing technologies, improve their models, and conduct live tests on small websites.

However, the idea is still in a very early stage.

The research has not yet addressed the question of whether automatic responses can actually reduce hate speech on such forums. Some believe that if users realize the responses are automatically generated, the responses’ power to change behavior will be limited, and they may even provoke the opposite reaction.