Now: The Rest of the Genome
Over the summer, Sonja Prohaska decided to try an experiment. She would spend a day without ever saying the word “gene.” Dr. Prohaska is a bioinformatician at the University of Leipzig in Germany. In other words, she spends most of her time gathering, organizing and analyzing information about genes. “It was like having someone tie your hand behind your back,” she said.
But Dr. Prohaska decided this awkward experiment was worth the trouble, because new large-scale studies of DNA are causing her and many of her colleagues to rethink the very nature of genes. They no longer conceive of a typical gene as a single chunk of DNA encoding a single protein. “It cannot work that way,” Dr. Prohaska said. There are simply too many exceptions to the conventional rules for genes.
It turns out, for example, that several different proteins may be produced from a single stretch of DNA. Most of the molecules produced from DNA may not even be proteins, but another chemical known as RNA. The familiar double helix of DNA no longer has a monopoly on heredity. Other molecules clinging to DNA can produce striking differences between two organisms with the same genes. And those molecules can be inherited along with DNA.
The gene, in other words, is in an identity crisis.
This crisis comes on the eve of the gene’s 100th birthday. The word was coined by the Danish geneticist Wilhelm Johannsen in 1909, to describe whatever it was that parents passed down to their offspring so that they developed the same traits. Johannsen, like other biologists of his generation, had no idea what that invisible factor was. But he thought it would be useful to have a way to describe it.
“The word ‘gene’ is completely free from any hypothesis,” Johannsen declared, calling it “a very applicable little word.”
Over the next six decades, scientists transformed that little word from an abstraction to concrete reality. They ran experiments on bread mold and bacteria, on fruit flies and corn. They discovered how to alter flowers and eyes and other traits by tinkering with molecules inside cells. They figured out that DNA was a pair of strands twisted around each other. And by the 1960s, they had a compelling definition of the gene.
A gene, they argued, was a specific stretch of DNA containing the instructions to make a protein molecule. To make a protein from a gene, a cell had to read it and build a single-stranded copy known as a transcript out of RNA. This RNA was then grabbed by a cluster of molecules called a ribosome, which used it as a template to build a protein.
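The classic picture described above, DNA copied into an RNA transcript that a ribosome then reads to build a protein, can be sketched as a toy program. This is only a minimal illustration under stated assumptions: the 15-letter "gene," the five-entry codon table (a small subset of the standard genetic code) and the function names are invented for the example, and reading starts at the first letter rather than at a located start signal, as a real cell would require.

```python
# Toy model: the DNA string, the tiny codon table and the function names
# are illustrative assumptions, not taken from the article.

# A small subset of the standard genetic code (RNA codon -> amino acid).
CODON_TABLE = {
    "AUG": "Met",  # also the usual start signal
    "UUU": "Phe",
    "GGC": "Gly",
    "GCA": "Ala",
    "UAA": None,   # stop codon
}


def transcribe(coding_strand: str) -> str:
    """Build the RNA transcript: the coding strand's letters, with U in place of T."""
    return coding_strand.upper().replace("T", "U")


def translate(transcript: str) -> list[str]:
    """Read the transcript three letters at a time, as a ribosome would."""
    protein = []
    for i in range(0, len(transcript) - 2, 3):
        amino_acid = CODON_TABLE[transcript[i:i + 3]]
        if amino_acid is None:  # a stop codon ends the protein
            break
        protein.append(amino_acid)
    return protein


gene = "ATGTTTGGCGCATAA"       # hypothetical stretch of DNA
rna = transcribe(gene)         # single-stranded RNA copy
protein = translate(rna)       # chain of amino acids
print(rna, protein)
```

One stretch of DNA, one transcript, one protein: the sketch is deliberately as simple as the 1960s definition it mirrors.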
A gene was also the fundamental unit of heredity. Every time a cell divided, it replicated its genes, and parents passed down some of their genes to their offspring. If you inherited red hair — or a predisposition for breast cancer — from your mother, chances were that you inherited a gene that helped produce that trait.
This definition of the gene worked spectacularly well — so well, in fact, that in 1968 the molecular biologist Gunther Stent declared that future generations of scientists would have to content themselves with “a few details to iron out.”
Stent and his contemporaries knew very well that some of those details were pretty important. They knew that genes could be shut off and switched on when proteins clamped onto nearby bits of DNA. They also knew that a few genes encoded RNA molecules that never became proteins. Instead, they had other jobs, like helping build proteins in the ribosome.
But these exceptions did not seem important enough to cause scientists to question their definitions. “The way biology works is different from mathematics,” said Mark Gerstein, a bioinformatician at Yale. “If you find one counterexample in mathematics, you go back and rethink the definitions. Biology is not like that. One or two counterexamples — people are willing to deal with that.”
More complications emerged in the 1980s and 1990s, though. Scientists discovered that when a cell produces an RNA transcript, it cuts out huge chunks and saves only a few small remnants. (The parts of a gene that survive in the finished transcript are called exons; the parts cast aside are introns.) Vast stretches of noncoding DNA also lie between these protein-coding regions. The 21,000 protein-coding genes in the human genome make up just 1.2 percent of that genome.
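The cutting and saving described above can be pictured as slicing a string. The following is a rough sketch, not a model of any real gene: the sequence, the exon coordinates and the `splice` helper are all hypothetical, and the introns here are far shorter relative to the exons than they typically are in human genes.

```python
# Illustrative only: the sequence and the exon coordinates are made up.

def splice(pre_mrna: str, exons: list[tuple[int, int]]) -> str:
    """Keep the exon slices (start inclusive, end exclusive); drop the introns between them."""
    return "".join(pre_mrna[start:end] for start, end in exons)


# Uppercase marks exons and lowercase marks introns, purely for readability.
pre_mrna = "AUGGCC" + "guaaguuuucag" + "UUUGGC" + "guaugacuuuag" + "UAA"
exons = [(0, 6), (18, 24), (36, 39)]

mature = splice(pre_mrna, exons)             # the transcript after cutting
kept_fraction = len(mature) / len(pre_mrna)  # share of the original that survives
print(mature, round(kept_fraction, 2))
```

Choosing a different list of exon coordinates for the same sequence yields a different finished transcript, which is one way a single stretch of DNA can give rise to several different proteins.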
In 2000, an international team of scientists finished the first rough draft of that genome — all of the genetic material in a human cell. They identified the location of many of the protein-coding genes, but they left the other 98.8 percent of the human genome largely unexplored.
Since then, scientists have begun to wade into that genomic jungle, mapping it in fine detail.
This article has been revised to reflect the following correction:
Correction: November 13, 2008
An article on Tuesday about new genetic research and new ideas of what a gene is misstated the number of nucleotide bases, or “letters” of DNA, that would constitute 1 percent of the human genome. One percent would be 30 million bases, or “letters,” not 3 million.