It’s been nearly ten years since the human genome was first decoded – and now scientists say they’ve finally mapped the underlying regulatory system that enables it to do its job.
The most striking discovery is that so-called ‘junk DNA’ – DNA which appeared to have no function – is actually nothing of the sort.
The international team has linked more than 80 percent of the human genome sequence to a specific biological function, as well as mapping more than four million regulatory regions where proteins specifically interact with the DNA.
“During the early debates about the Human Genome Project, researchers had predicted that only a few percent of the human genome sequence encoded proteins, the workhorses of the cell, and that the rest was junk. We now know that this conclusion was wrong,” says Eric D Green of the National Institutes of Health.
“ENCODE [the Encyclopedia of DNA Elements] has revealed that most of the human genome is involved in the complex molecular choreography required for converting genetic information into living cells and organisms.”
Hundreds of researchers across the US, the UK, Spain, Singapore and Japan performed more than 1,600 sets of experiments on 147 types of tissue, using next-generation DNA sequencing technologies. In total, the project has generated more than 15 trillion bytes of raw data, and analyzing it consumed the equivalent of more than 300 years of computer time.
“We’ve come a long way,” says Ewan Birney of the European Bioinformatics Institute. “By carefully piecing together a simply staggering variety of data, we’ve shown that the human genome is simply alive with switches, turning our genes on and off and controlling when and where proteins are produced. ENCODE has taken our knowledge of the genome to the next level, and all of that knowledge is being shared openly.”
Every cell in the human body carries the entire set of 21,000 protein-making genes, and these newly discovered switches control which ones a particular cell will activate. While most switches are close to the genes they control, others can lie a long way from them.
“We were surprised that disease-linked genetic variants are not in protein-coding regions,” says Mike Pazin, an NHGRI program director working on ENCODE.
“We expect to find that many genetic changes causing a disorder are within regulatory regions, or switches, that affect how much protein is produced or when the protein is produced, rather than affecting the structure of the protein itself. The medical condition will occur because the gene is aberrantly turned on or turned off or abnormal amounts of the protein are made. Far from being junk DNA, this regulatory DNA clearly makes important contributions to human health and disease.”
In the long run, the new findings offer real hope for medical researchers, as it appears that many diseases are caused by changes in gene switches. In their paper, published in Nature, the researchers cite multiple sclerosis, lupus, rheumatoid arthritis, Crohn’s disease and coeliac disease as examples. The same appears to be true for many types of cancer.