December 05, 2002
Mouse Genome Sequenced

It's official: the mouse genome has been sequenced, and this is a very good thing. You can read the official announcement on the NIH site (which has the best copy for clicking through to supporting documents), on Eurekalert, and on ScienceDaily. From the announcement:

The sequence shows the order of the DNA chemical bases A, T, C, and G along the 20 chromosomes of a female mouse of the "Black 6" strain - the most commonly used mouse in biomedical research. It includes more than 96 percent of the mouse genome with long, continuous stretches of DNA sequence and represents a seven-fold coverage of the genome. This means that the location of every base, or DNA letter, in the mouse genome was determined an average of seven times, a frequency that ensures a high degree of accuracy.
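To make the seven-fold coverage figure concrete, here is a minimal Python sketch. It takes the roughly 2.5 billion letter genome length given elsewhere in the announcement, and it assumes an idealized model in which reads land uniformly at random along the genome (a Poisson approximation in the spirit of Lander-Waterman). The real sequencing data certainly did not behave that neatly, and the function names are mine, for illustration only.

```python
import math

MOUSE_GENOME_BASES = 2_500_000_000  # genome length quoted in the announcement
COVERAGE = 7                        # "seven-fold coverage"


def total_bases_read(genome_bases: int, coverage: int) -> int:
    """Raw sequence needed so each base is read `coverage` times on average."""
    return genome_bases * coverage


def expected_unsampled_fraction(coverage: float) -> float:
    """Fraction of bases expected to be hit by zero reads.

    ASSUMPTION: reads are placed uniformly at random, so the number of reads
    covering a given base is approximately Poisson with mean `coverage`.
    """
    return math.exp(-coverage)


if __name__ == "__main__":
    raw = total_bases_read(MOUSE_GENOME_BASES, COVERAGE)
    missed = expected_unsampled_fraction(COVERAGE)
    print(f"Raw bases read at {COVERAGE}x: {raw:,}")
    print(f"Idealized fraction of bases never sampled: {missed:.6f}")
```

Under that idealized assumption only about 0.1 percent of bases would go unsampled at seven-fold depth. Real genomes contain repeats and regions that are hard to clone, so an actual assembly should be expected to fall short of this ideal.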

Earlier this year, the mouse consortium announced that it had assembled the draft sequence of the mouse and deposited it into public databases. The consortium's paper this week reports the initial description and analysis of this text and the first global look at the similarities and the differences in the genomic landscapes of the human and mouse. The analysis was led by the Mouse Genome Analysis Group. Below are some of the highlights.

  • Human Sequence: It's Bigger, But Is It Better? The mouse genome is 2.5 billion DNA letters long, about 14 percent shorter than the human genome, which is 2.9 billion letters long. But bigger doesn't always mean better, say scientists. The human genome is bigger because it is filled with more repeat sequences than the mouse genome. Repeat sequences are short stretches of DNA that have been hopping around the genome by copying and inserting themselves into new regions. They are not thought to have functional significance. The mouse genome, it seems, is more fastidious with its housecleaning than the human. Although it is actually accumulating repeat sequence at a greater rate than humans, it is losing them at an even greater rate.

  • Shuffling the Chapters of an Ancestral Book. The mouse and human genomes descended from a common ancestor some 75 million years ago. Since then there has been considerable shuffling of the DNA order both within and between chromosomes. Nonetheless, when scientists compared the human and mouse genomes, they discovered that more than 90 percent of the mouse genome could be lined up with a region on the human genome. That is because the gene order in the two genomes is often preserved over large stretches, called conserved synteny. In fact, the mouse genome could be parsed into some 350 segments, or chapters for which there is a corresponding chapter in the human genome. For example, chromosome 3 of the mouse has chapters from human chromosomes 1, 3, 4, 8 and 13, and chromosome 16 of the mouse has chapters from human chromosome 3, 21, 22 and 16.

  • Heavy Editing at the Level of Sentences. Although virtually all of the human and mouse sequence can be aligned at the level of large chapters, only 40 percent of the mouse and the human sequences can be lined up at the level of sentences and words. Even within this 40 percent, there has been considerable editing, as evolution relentlessly tinkers with the genome. The change is so great in most places that only with very sensitive tools can scientists discern the relationships.

  • Preserving the Gems. Despite the heavy editing, about 5 percent of the genome contains groups of DNA letters that are conserved between human and mouse. Because these DNA sequences have been preserved by evolution over tens of millions of years, scientists infer that they are functionally important and under some evolutionary selection. Interestingly, the proportion of the genome comprised by these functionally important parts is considerably higher than what scientists had expected. In particular, it is about three times as much as can be explained by protein-coding genes alone. This implies that the genome must contain many additional features (such as untranslated regions, regulatory elements, non-protein coding genes, and chromosomal structural elements) that are under selection for biological function. Discovering their meaning will be a major goal for biomedical research in the coming years.

  • The Gene Number. When the human genome consortium concluded last year that the human sequence contains only 30,000 to 40,000 protein-coding genes, the news elicited a collective international gasp. Humans, it seems, have only about twice as many genes as the worm or the fly, and fewer genes than rice. Many wondered how human complexity could be explained by such a paucity of genes. The prediction has since been the subject of debate with some researchers suggesting much higher gene counts. The human-mouse comparison will likely put the yearlong speculation to rest, indicating that if anything, the gene numbers may be at the low end of the range. Today's paper suggests that the mouse and the human genomes each seem to contain in the neighborhood of 30,000 protein coding genes.

  • Sex, Smell and Infectious Disease. Although the mouse and the human contain virtually the same set of genes, it seems that some families of genes have undergone expansion - or multiplied - in the mouse lineage. These involve genes related to reproduction, immunity and olfaction, suggesting that these physiological systems have been the focus of extensive innovation in rodents. It seems that sex, smell, and pathogens are most on the mouse's evolutionary mind. Scientists do not yet know the reasons for this, but they speculate that a shorter generation time, changes in living environment, lack of verbal and visual cues, and differences in reproduction may account for this.

  • Uneven Landscape of the Genomes. Since the two species diverged, the ancestral text has changed considerably, with substitutions occurring in both species. Twice as many of these substitutions have occurred in the mouse compared with the human lineage. A great surprise is that mutation rates seem to vary across the genome in ways that cannot be explained by any of the usual features of DNA.

  • Empowering Mouse as a Disease Model. The laboratory mouse has long been used to study human diseases. There are more than a hundred mouse models of Mendelian disorders, where a mutation in mouse counterparts of human disease genes results in a constellation of symptoms highly reminiscent of the human disorder. But there are many more such models to be found, and the availability of the mouse genome sequence will make their discovery only a few "mouse" clicks away. Furthermore, hundreds of additional mouse models of non-Mendelian diseases such as epilepsy, asthma, obesity, colon cancer, hypertension, and diabetes, which have been more difficult to pin down, will now be much more accessible to the tools of the molecular geneticist.

  • Understanding the Mouse. The mouse genome sequence will also open new paths of scientific endeavor aimed at understanding how the mouse genome directs the biology of this mammal. Scientists will no longer be working on genes in isolation, but will view individual genes in the context of all other related genes and in the context of a whole organism. They will be able to study many, even all, genes simultaneously, speeding the understanding of the mouse in molecular terms. Scientists say such molecular understanding of the mouse will be essential to realize the full benefits of the human genome sequence.

The sequence information from the mouse consortium has been immediately and freely released to the world, without restrictions on its use or redistribution. The information is scanned daily by scientists in academia and industry, as well as by commercial database companies, providing key information services to biotechnologists.

The work reported in this paper will serve as a basis for research and discovery in the coming decades. Such research will have profound long-term consequences for medicine. It will help elucidate the underlying molecular mechanisms of disease. This in turn will allow researchers to design better drugs and therapies for many illnesses.

"The mouse genome is a great resource for basic and applied medical research, meaning that much of what was done in a lab can now be done through the Web. Researchers can access this information through, where all the information is provided with no restriction," says Ewan Birney, Ph.D., Ensembl coordinator at the European Bioinformatics Institute.

The Washington Post write-up emphasises the importance of the discovery of more conserved sequence sections than expected.

The big surprise in the research, however, was that about 5 percent of the genetic material of mice and people is highly conserved, and matching genes alone can account for only about 2 percent of it. That means as much as 3 percent of the genetic material is playing a critical but mysterious role--one so important nature has kept that genetic information largely intact for 75 million years.

It's only speculation now, but most scientists think those stretches of DNA will prove to be regulatory regions--instructional segments that somehow govern the behavior of genes. More and more, to cite one example, it looks as though mice and people will turn out to have very different brains not because the genes encoding their brain cells are so different, but because the instructions that regulate how many times those cells reproduce during development are different--producing a far bigger brain in a human than in a mouse.
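The percentages in that excerpt are easier to appreciate as absolute amounts of DNA. Here is a small Python sketch of the arithmetic, assuming the 2.5 billion letter mouse genome length from the NIH announcement and the rounded 5 percent and 2 percent figures from the Washington Post piece. The helper name is mine, and the results are only as precise as those rounded inputs.

```python
MOUSE_GENOME_BASES = 2_500_000_000  # from the NIH announcement
CONSERVED_PERCENT = 5               # highly conserved between mouse and human
GENE_PERCENT = 2                    # portion of that explained by matching genes


def bases_for_percent(genome_bases: int, percent: int) -> int:
    """Convert a whole-number percentage of the genome into a base count."""
    return genome_bases * percent // 100


conserved = bases_for_percent(MOUSE_GENOME_BASES, CONSERVED_PERCENT)
in_genes = bases_for_percent(MOUSE_GENOME_BASES, GENE_PERCENT)
unexplained = conserved - in_genes  # conserved DNA not accounted for by genes

if __name__ == "__main__":
    print(f"Conserved:            {conserved:,} bases")
    print(f"Explained by genes:   {in_genes:,} bases")
    print(f"Conserved, non-gene:  {unexplained:,} bases")
```

With these rounded inputs, the mysterious 3 percent works out to something on the order of 75 million letters of DNA.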

The discovery of larger-than-expected conserved areas is the most important thing to come out of the mouse DNA sequencing so far. Another interesting discovery is that mice have about 300 genes that humans lack:

But the comparison has also revealed genetic differences too. Mice have around 300 genes humans do not and vice versa. The biggest disparities are linked to sex, smell, immunity and detoxification.

All are genes which help animals adapt to new environments, infections and threats. "All the fast things that happen in evolution are down to life-or-death conflicts, either with other organisms, or within species for mate selection," says Chris Ponting, head of a team at the MRC Functional Genetics Unit in Oxford, UK.

I will be very curious to see whether some scientists eventually track some of those genes to viruses. It is quite possible that viral infections left genes behind at some point and that those genes turned out to do useful things for mice.

These results suggest a much bigger role for RNA that does not code for peptides. Large amounts of the DNA that was unexpectedly found to be conserved (not changed by the accumulation of random mutations) between humans and mice, and that does not code for proteins, may instead code for regulatory RNA molecules.

RNA, a more ancient chemical version of DNA, performs many basic tasks in a cell, one of which is to form a copy or transcript of a gene and direct the synthesis of the gene's protein. Recently, some of these RNA transcripts have been found to have executive roles all their own, without making any protein. An RNA gene is responsible for the vital task of shutting all the genes on one of the two X chromosomes in each female cell, ensuring that women get the same dose of X-based genes as men, who have just one X chromosome.

The mouse genome sequencing results have provided an immediate benefit for understanding the human genome by helping to identify an additional 1200 human genes that had gone unrecognized.

More than 2,000 of the shared regions identified in this study (out of 3,500) do not contain genes. What precisely these non-gene regions, sometimes called 'junk DNA', are doing in the genome is not yet known.

The consortium researchers discovered about 9,000 previously unknown mouse genes and about 1,200 previously unknown human genes. The mouse genome is 14 percent smaller than the human genome and contains about 2.5 billion letters of DNA.

The genetic differences between humans and mice turn out to be greater than expected:

In the Dec. 5 issue of the journal Nature, Pevzner and other scientists in the 31-institution Mouse Genome Sequencing Consortium published a near-final genetic blueprint of a mouse, together with the first comparative analysis of the mouse and human genomes. (Read NIH news release at In a companion paper published in today's Genome Research journal, Pevzner and Tesler (in collaboration with Michael Kamal and Eric Lander at the Whitehead/MIT Center for Genome Research) analyze human-mouse genome rearrangements for insights about the evolution of mammals, and outline their development of a new algorithm to differentiate macro- and micro-level genome rearrangements.

Their conclusion: although the mouse and human genomes are very similar, genome rearrangements occurred more commonly than previously believed, accounting for the evolutionary distance between human and mouse from a common ancestor 75 million years ago. "The human and mouse genome sequences can be viewed as two decks of cards obtained by re-shuffling from a master deck--an ancestral mammalian genome," said Pevzner. "And in addition to the major rearrangements that shuffle large chunks of the gene pool, our research confirmed another process that shuffles only small chunks." "We now estimate over 245 major rearrangements that represent dramatic evolutionary events," added Tesler. "In addition, many of those segments reveal multiple micro-rearrangements, over 3,000 within these major blocks--a much higher figure than previously thought (even though some of them may be caused by inaccuracies in the draft sequences)."
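The deck-of-cards picture can be illustrated with a toy Python sketch. To be clear, this is not the algorithm Pevzner and Tesler describe. It is the textbook idea of counting "breakpoints" in a signed ordering of blocks on a single chromosome, under my own simplifying assumption that the only rearrangement is a reversal (a segment gets flipped end to end). Each reversal can repair at most two breakpoints, so half the breakpoint count gives a lower bound on the number of reversals. All names here are hypothetical.

```python
from typing import List


def reverse_segment(order: List[int], start: int, end: int) -> List[int]:
    """Apply one reversal: flip blocks start..end (inclusive) and their signs."""
    flipped = [-block for block in reversed(order[start:end + 1])]
    return order[:start] + flipped + order[end + 1:]


def count_breakpoints(order: List[int]) -> int:
    """Count adjacent pairs that are NOT consecutive in the ancestral order.

    The ancestral order is taken to be +1, +2, ..., +n. The list is framed
    with 0 and n + 1 so the chromosome ends are checked too. A pair (a, b)
    is intact when b - a == 1, which also covers flipped runs like -3, -2.
    """
    n = len(order)
    if sorted(abs(block) for block in order) != list(range(1, n + 1)):
        raise ValueError("order must be a signed permutation of 1..n")
    framed = [0] + list(order) + [n + 1]
    return sum(1 for a, b in zip(framed, framed[1:]) if b - a != 1)


def min_reversals_lower_bound(order: List[int]) -> int:
    """One reversal fixes at most two breakpoints, so need >= ceil(b / 2)."""
    return (count_breakpoints(order) + 1) // 2


if __name__ == "__main__":
    ancestral = [1, 2, 3, 4, 5]
    shuffled = reverse_segment(ancestral, 1, 3)
    print(shuffled)
    print(count_breakpoints(shuffled))
    print(min_reversals_lower_bound(shuffled))
```

Real genome comparisons also have to deal with multiple chromosomes, translocations, and noisy draft sequence, which is where the hard algorithmic work comes in.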

To go along with the announcement of the mouse genome sequencing, Nature has a collection of articles on the importance of mice in biomedical research. I don't like the funky page design, in which each choice on the page brings up a pop-up that you then have to click through to reach the articles. But some of the articles are quite interesting. For instance, in this article various scientists describe how the mouse genome sequence data speeds up their work.

While Jenkins and Copeland look back fondly on those early days, the mouse genome sequence (see page 520) is accelerating their research in ways that make their past achievements seem pedestrian. Back in the 1980s, if Jenkins and Copeland were interested in investigating a spontaneous mutation presented at the Jackson Lab's weekly 'Mutant Mouse Circus', it was a laborious process. Identifying the gene involved meant crossing about 1,000 mice to map it to a stretch of chromosome bearing about 20 candidate genes. From there, a postdoc would have to sequence all of them in both normal and mutant mice to find out which was mutated.

"It used to be one postdoc project per mutation...," says Jenkins, "...and it was like looking for a needle in a haystack," adds Copeland. But since the mouse genome sequence became available in May (at, researchers can simply go to the database after the initial breeding experiments and look up all the genes in the relevant chromosomal region. By knowing from their sequences what types of proteins most of them encode, they can choose one or two that look most promising to search for the mutation.

"It took us 15 years to get 10 possible cancer genes before we had the sequence," says Copeland. "And it took us a few months to get 130 genes once we had the sequence." What's more, Jenkins points out, going back and forth between the mouse and human genomes will help to target related human genes that could be candidates for drug development.

This Nature article is especially interesting because it gives a sense of the sheer size of the job of figuring out how mouse cells function and of the methods that may help make the problem more tractable.

One experimental approach in which thousands of genes can be analysed in parallel is to isolate messenger RNA and to display the gene-expression profile on a chip. When this technique is applied to tissues, data are lost because aspects of the three-dimensional structures of multiple cell types are destroyed in the biochemical extraction. Data from in situ analyses contain more detailed information about each gene, but the generation of these data is serial and significantly slower.

Gene expression is being systematically examined at the transcriptional level by several groups, for instance in the 9.5-day-old mouse embryo and in adult tissues (see Box). Two other papers in this issue [3, 4] report large-scale analyses of gene expression in embryonic and adult stages, but so far have examined just 0.5% of the genes in the genome, the homologues of the genes on chromosome 21. Transcription studies in situ have relatively limited resolution, and the tissues constituting a multicellular organism are complex mixtures of different cell types. Unless each cell is individually visualized for gene expression in combination with histological criteria, important information relating to biological function is lost, for instance the subcellular compartment(s) occupied by a protein.

The Sanger Institute's Atlas project is being established to systematically examine the expression pattern of every gene product at tissue-, cellular- and subcellular-level resolution, to provide a permanent, definitive and accessible record of the molecular architecture of normal tissues and cells. The ultimate goal is to define protein expression patterns for all 30,000 mouse genes in hundreds of different tissues, all gathered in archival data sets to support research projects worldwide. Data will be collected electronically and archived with a vocabulary allowing complex queries.

A recent report that is quite independent of the mouse genome sequencing effort demonstrates that mice are viewed as such a useful tool that scientists will transfer human genes into mice in order to study those genes more easily.

Philadelphia, PA –Researchers at the University of Pennsylvania School of Medicine have bred a mouse to model human L1 retrotransposons, the so-called "jumping genes." Retrotransposons are small stretches of DNA that are copied from one location in the genome and inserted elsewhere, typically during the genesis of sperm and egg cells. The L1 variety of retrotransposons, in particular, are responsible for about one third of the human genome.

The mouse model of L1 retrotransposition is expected to increase our understanding of the nature of jumping genes and their implication in disease. According to the Penn researchers, the mouse model may also prove to be a useful tool for studying how a gene functions by knocking it out through L1 insertion. Their report is in the December issue of Nature Genetics and currently available online (see below for URL).

"There are about a half million L1 sequences in the human genome, of which 80 to 100 remain an active source of mutation," said Haig H. Kazazian, Jr., MD, Chair of Penn's Department of Genetics and senior author in the study. "This animal model will help us better understand how this happens, as well as provide a useful tool for discovering the function of known genes."

In humans, retrotransposons cause mutations in germ line cells, such as sperm, which continually divide and multiply. Like an errant bit of computer code that gets reproduced and spread online, retrotransposons are adept at being copied from one location and placed elsewhere in the chromosomes. When retrotransposons are inserted into important genes, they can cause disease, such as hemophilia and muscular dystrophy. On the other hand, retrotransposons have been around for 500 to 600 million years, and have contributed a lot to evolutionary change.

It's worth noting, in connection with this latest report, that according to the scientists of the mouse and human DNA sequencing projects, humans have more junk DNA than mice do, and mice may have just as much functional DNA as humans even though the human genome is bigger in total size. The human transposons mentioned in this report may have something to do with this state of affairs. Humans may have been under less selective pressure to keep down the amount of genetic waste that builds up (really, probably parasitic DNA), or the human transposons might serve a more useful purpose than mouse transposons do. It will be interesting to see how the work in this area progresses.

The availability of the mouse genome sequence is already accelerating efforts to understand the human genome. Also, the sequence data is going to be very helpful for scientists who are using mice to understand general phenomena in mammalian metabolism and cellular genetic regulation. Efforts to create genetically engineered mouse equivalents of human illnesses will be greatly helped by the identification of mouse equivalents of human genes. Even so, most of the hard work remains to be done. It is much easier to figure out the primary sequence of a genome than it is to figure out how the expression of all genes is controlled or how all proteins function and interact with each other. Many more advances are needed in laboratory techniques, instrumentation, and computer modelling in order to fully understand how a single cell functions in all its complexity.

Update: "We even have the genes that could make a tail."

Yesterday's release also continues a pattern of humbling genetic revelations. Earlier research showed that humans had scarcely more genes than the lowly roundworm. Now there's proof that people are closely related to tiny, furry rodents.

"We even have the genes that could make a tail," said Dr. Jane Rogers, of the Wellcome Trust Sanger Institute in Cambridge, England.

Randall Parker, 2002 December 05 05:41 PM  Biotech Advance Rates
