A man’s genome contains information that can help scientists guess his surname, American and Israeli researchers have found. The findings have serious implications for data privacy, and intelligence services around the world are bound to be interested.
In the study, published in the journal Science, the researchers developed an algorithm that can infer men’s surnames from the Y chromosome, the chromosome found only in males. Because the Y chromosome is passed from father to son, generation after generation, it tends to follow the family surname in societies where surnames are inherited from the father. It carries markers called short tandem repeats, which form a kind of genetic fingerprint.
The researchers, headed by Dr. Yaniv Erlich of the Whitehead Institute for Biomedical Research in Cambridge, Massachusetts, fed 40 markers into a computer and compared them with Y-chromosome profiles posted on websites. In the United States, there are companies that publish people’s genetic profiles, derived from saliva samples, which makes it possible to determine a person’s origins and locate relatives around the world.
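To make the matching step concrete, here is a minimal Python sketch of inferring a surname by comparing a query Y-chromosome profile with surname-tagged profiles in a database. It is not the researchers’ published method, which is considerably more sophisticated. The marker names are standard Y-chromosome short tandem repeat loci, but the surnames, repeat counts, the distance measure (sum of repeat-count differences) and the `max_distance` threshold are all assumptions chosen for illustration.

```python
from collections import Counter

# Hypothetical toy database: (surname, {marker: repeat count}) pairs.
# The repeat counts are invented for illustration, not real data.
DATABASE = [
    ("Smith", {"DYS19": 14, "DYS390": 24, "DYS391": 11, "DYS393": 13}),
    ("Smith", {"DYS19": 14, "DYS390": 24, "DYS391": 10, "DYS393": 13}),
    ("Jones", {"DYS19": 15, "DYS390": 22, "DYS391": 10, "DYS393": 14}),
    ("Brown", {"DYS19": 13, "DYS390": 23, "DYS391": 11, "DYS393": 12}),
]


def str_distance(a, b):
    """Sum of absolute repeat-count differences over the markers both profiles share."""
    shared = a.keys() & b.keys()
    if not shared:
        return float("inf")  # nothing to compare, so treat as unrelated
    return sum(abs(a[m] - b[m]) for m in shared)


def guess_surname(query, database, max_distance=1):
    """Return the most common surname among close matches, or None if nothing is close."""
    close = [surname for surname, profile in database
             if str_distance(query, profile) <= max_distance]
    if not close:
        return None  # no sufficiently similar profile in the database
    surname, _count = Counter(close).most_common(1)[0]
    return surname


if __name__ == "__main__":
    query = {"DYS19": 14, "DYS390": 24, "DYS391": 11, "DYS393": 13}
    print(guess_surname(query, DATABASE))  # Smith
```

A query with no close match returns `None`: the approach only works when profiles of paternal relatives are already in a database, which is one reason the surname is recovered in only a minority of cases.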
“Even though there isn’t a unified database of all genetic sequences on the Internet, comparing a subject’s Y chromosome with information from existing databases could help locate his family,” says Israeli team member Eran Halperin, a professor at Tel Aviv University’s Department of Molecular Microbiology and Biotechnology.
In addition to the Whitehead Institute, the research was carried out by experts at Harvard, MIT and the International Computer Science Institute in Berkeley, California.
Together with doctoral student David Golan of Tel Aviv University’s statistics department, the researchers developed the algorithm, which was tested on 911 men in the United States. Their genetic profiles were compared with Internet databases containing the genetic profiles of 135,000 men with the most common surnames in the United States, most of them of European origin. The algorithm identified the correct surname 12 percent of the time, a success rate the researchers later raised to 18 percent.
The researchers, for example, applied the algorithm to the genome sequence of American geneticist Craig Venter, the head of a research institute in San Diego who was one of the first to sequence the human genome.
In the U.S.-Israeli experiment, the algorithm identified the surname Venter, and by cross-referencing the data with other information, the researchers determined his age. Knowing that he lives in California, they showed that only one other person in the state matched that combination of surname and age.
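The narrowing step can be sketched the same way: once a surname has been inferred, each additional piece of public information shrinks the pool of candidates. In this minimal Python sketch, the record structure, its field names and the people listed are all hypothetical; a real search of this kind would draw on public records rather than a hard-coded list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PersonRecord:
    """Hypothetical public-record entry; field names are illustrative only."""
    full_name: str
    surname: str
    birth_year: int
    state: str


def narrow_candidates(records, surname, birth_year=None, state=None):
    """Keep records matching the inferred surname and any other known attributes."""
    matches = [r for r in records if r.surname == surname]
    if birth_year is not None:
        matches = [r for r in matches if r.birth_year == birth_year]
    if state is not None:
        matches = [r for r in matches if r.state == state]
    return matches


# Invented example records (not real people).
RECORDS = [
    PersonRecord("A. Examplename", "Examplename", 1950, "CA"),
    PersonRecord("B. Examplename", "Examplename", 1950, "CA"),
    PersonRecord("C. Examplename", "Examplename", 1950, "NY"),
    PersonRecord("D. Examplename", "Examplename", 1972, "CA"),
    PersonRecord("E. Othername", "Othername", 1950, "CA"),
]

if __name__ == "__main__":
    # Apply progressively more filters and report how many candidates remain.
    for known in [{}, {"birth_year": 1950}, {"birth_year": 1950, "state": "CA"}]:
        print(known, len(narrow_candidates(RECORDS, "Examplename", **known)))
```

Each filter can only shrink the list, which is why an inferred surname combined with a couple of public facts such as age and state of residence can be enough to single someone out.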
The researchers also analyzed the Y chromosomes of 10 Utah residents without knowing their surnames; the algorithm helped them identify the surnames of five of them, all of whom were Mormons.
“The identification technique could have a number of useful applications such as locating relatives and identifying corpses in natural disasters,” says Halperin. “But the research also reveals a fundamental problem: If a person publishes his genome on the Internet, even when this is done anonymously, his identity is pretty much exposed.”
Halperin adds that the ability to infer a surname relies on the Y chromosome alone, just one of the 46 human chromosomes. The study also raises questions about sharing genetic information from various sources.
“We take a positive view of sharing genetic information on public databases, with permission, of course,” says Halperin. “Sharing information is essential to science, and there are many advantages to users of these services. But it’s important that all organizations involved in the data sharing be aware of the possible exposure and weigh their decisions accordingly.”
Erlich adds: “The obvious conclusion from the study is that biometrics can produce unexpected situations. We believe legislators must proceed with great caution when they plan such databases.”