Big Data Solves Mystery: Why Humans Have No More Genes Than Worms

How can Man the Great have no more genes than a nematode, and fewer than a grape? A Haifa University analysis figures it out: We don't know what a 'gene' is.

Ruth Schuster
Guess who has more genes, the person or the fruit. Go on, guess. Now guess why. Credit: Ruth Schuster

The human genome has been sequenced, and to our mortification, we seem to have no more genes than a worm, and far fewer than a tomato. Surely Man the Great and Powerful should have a lot more genes than a lowly invertebrate or fruit? Now a Haifa University team of statisticians and geneticists has found evidence backing a theory of why Man and Nematode could have about 20,000 genes each, yet differ so vastly in complexity: the existence of genetic networks, whose effects are so weak as to be almost undetectable by the usual statistical means.

The team proposes a new approach to identifying genetic networks - that is, joint activity by genes that affect each other's expression in different ways under different circumstances.

Until recently, genetics dealt mainly with the identification and effects of single genes that express themselves powerfully enough to be noticed. The methods devised by statistician Pavel Goldstein, under the guidance of Dr. Anat Reiner Ben-Naim in collaboration with Prof. Abraham Korol of the Evolutionary & Environmental Biology Department, can identify these networks, and may also help solve one of the big mysteries in genetics – what all that "junk DNA" in our genomes is really doing.

Pavel Goldstein. Photo by Avigail Tsuper

Have letters, can't read book

The genome was first sequenced in the last millennium. Scientists are still working on understanding its unique language. Credit: Haaretz

How many genes humans have is still unknown, though we believe we know most of them. More importantly, we don't know what our genes are doing, explains Goldstein.

It's like knowing the words that make up a language, but not knowing the language. What progress has been made in deciphering the "book of our DNA" indicates that humans may have far fewer genes than assumed - somewhere between 20,000 and 25,000.

That number sounds embarrassingly small considering that nematode worms have about 20,000 too, grapes have about 30,000, and tomatoes have nearly 32,000.

But our obsession with "counting genes" may have misled us, if a "gene" is not a stand-alone thing that is either expressed (as a protein), partially expressed or not expressed. If genes interact and affect each other's expression, a given sequence of DNA may behave one way when "nudged" by another gene, and behave differently when "nudged" by a different gene, Goldstein explains to Haaretz.

And this may explain how a being (in our case, a human) can have so few "identified genes" and be so biologically complex.

The phenomenon of "cooperative" or interdependent genes is called epistasis. An epistatic gene's effect depends on activity by one or more "modifier genes".

That was the theory. Now, using big data techniques, the Haifa team of statisticians and geneticists managed to nail down significant evidence of epistasis in action, using the genome of the humble rockcress, a cousin of cabbage.

This isn't about how many angels can dance on the head of a pin. A great many diseases, not least auto-immune conditions, could boil down to epistatic relations between genes that have yet to be identified, Dr. Reiner points out.

Rockcress, we thank thee for thy contribution to genetic research. Photo by Brona, Wikimedia

One man's junk

So at this point, definitions such as "man has 20,000 genes and so does the cat" are pretty pointless. The same applies to baffling genomic sequencing results indicating that about 98% of man's DNA is junk. (At least some of that junk turned out to be regulatory sequences - turning genes "on" or "off" or moderating their expression according to biological signals. Other bits are apparently genes we just haven't identified.)

"The Genome Project began in 1990 with the ambitious goal of sequencing the human genome within 15 years. They tried to take each gene and map it, but reached the conclusion that they had 98% junk because they didn't factor in the assumption of possible interactions between sequences, and that a given sequence could behave differently under different circumstances," Goldstein explains. "They only identified the 2% that had a strong enough effect to be identified in isolation."

So we have great hulking molecules of DNA with an unknown number of genes that interact with one another in unknown ways under circumstances we don't know. Tricky, analyzing that: The number of ways genes could 'combine' and interact is almost infinite. "It had been very hard to look at that effectively," Goldstein remarks.
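To get a feel for the scale, here is a minimal Python sketch that counts candidate gene groupings. It assumes roughly 20,000 genes (the figure cited in this article) and counts only which genes take part, not how or under what circumstances they interact, so the real search space would be larger still. The function name is hypothetical.

```python
import math

def count_gene_combinations(n_genes, group_size):
    """Number of distinct unordered groups of `group_size` genes."""
    return math.comb(n_genes, group_size)

N_GENES = 20_000  # assumption: the approximate gene count cited in the article

pairs = count_gene_combinations(N_GENES, 2)
triples = count_gene_combinations(N_GENES, 3)

print(f"gene pairs:   {pairs:,}")    # 199,990,000
print(f"gene triples: {triples:,}")  # 1,333,133,340,000
```

Even restricted to pairs, that is about 200 million candidate interactions to test, which is why an exhaustive search is so costly.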

This is where statistical theory comes into play.

Using our friend the rockcress genome, the team illustrated how their approach works with real data, and developed an algorithm to detect epistasis (collaboration between genes), while minimizing the chance of false identifications – by ignoring single, isolated known traits (genes). Instead, they looked at clusters (groups) of similar traits, using data mining techniques, he explains.

They also used dependence between neighboring DNA regions, arising from tight linkage of molecular markers – and then applied a hierarchical search technique. In other words, they began with an initial, genome-wide screen for regions showing signs of epistasis, followed by a more focused search only in the areas with a high probability of epistasis.

That two-step approach enabled identification of very tiny epistatic effects that cannot be identified by other methods, Goldstein elaborates. The approach also dramatically reduces the number of statistical tests needed, considerably decreasing computation time.
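The saving in test count can be illustrated with a toy Python sketch. This is not the Haifa team's algorithm, only an illustration of the screen-then-refine idea under loud assumptions: markers are grouped into fixed-size regions, each region is represented by its middle marker in the first stage, and the score function, thresholds and planted interaction at markers 12 and 57 are all invented for the example.

```python
from itertools import combinations

def toy_score(a, b):
    """Hypothetical interaction score: peaks at markers (12, 57) and
    decays with distance, mimicking linkage between neighboring markers."""
    return max(0, 10 - (abs(a - 12) + abs(b - 57)))

def two_stage_search(n_markers, region_size, score,
                     region_threshold, marker_threshold):
    """Screen region pairs first, then test marker pairs only inside
    region pairs that pass. Returns (hits, number_of_tests_run).
    Simplification: pairs of markers within the same region are skipped."""
    regions = [range(start, min(start + region_size, n_markers))
               for start in range(0, n_markers, region_size)]
    hits, tests_run = [], 0
    for region_a, region_b in combinations(regions, 2):
        # Stage 1: one coarse test per region pair, using middle markers.
        tests_run += 1
        rep_a = region_a[len(region_a) // 2]
        rep_b = region_b[len(region_b) // 2]
        if score(rep_a, rep_b) < region_threshold:
            continue
        # Stage 2: fine-grained tests only in the promising region pair.
        for a in region_a:
            for b in region_b:
                tests_run += 1
                if score(a, b) >= marker_threshold:
                    hits.append((a, b))
    return hits, tests_run

def exhaustive_pair_count(n_markers):
    """Tests needed if every marker pair were checked directly."""
    return n_markers * (n_markers - 1) // 2

hits, tests_run = two_stage_search(100, 10, toy_score,
                                   region_threshold=3, marker_threshold=10)
print(hits)                        # [(12, 57)]
print(tests_run)                   # 145
print(exhaustive_pair_count(100))  # 4950
```

In this toy setting the two-stage search finds the planted pair with 145 tests instead of 4,950. The first stage only works because neighboring markers give similar scores, which is the role linkage plays in the description above.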

The result: the identified effects provide a broader picture of related networks of genes. So it seems science just hasn't figured out yet what all those "junk" sequences do when provoked the right way. And unbelievably complex interactions between genes could explain just why Man might, after all, have a third fewer genes than a grape, and still be a lot more complex.
