January 22, 2006
Sanger DNA Database Doubles Every 10 Months

One reason I am optimistic that rejuvenation therapies using Strategies for Engineered Negligible Senescence can reverse the aging process within the lifetimes of most people reading this is that the rate of advance of biotechnology increasingly resembles the rate of advance of electronic technology. The amount of DNA sequence information in one public database is doubling every 10 months.

The Archive is 22 Terabytes in size and doubling every ten months - perhaps the largest single scientific database in Europe, if not the world.

The database is large even compared to major non-DNA computer databases.

Martin Widlake, Database Services Manager at the Wellcome Trust Sanger Institute said: "At 22 000 GB the Trace Archive is in the Top Ten UNIX databases in the world. That's not bad for a research organisation of 850 employees in the countryside just outside Cambridge."

"It is possibly the biggest single (acknowledged) scientific RDBMS database in Europe, if not the world."

All the data are freely available to the world scientific community (http://trace.ensembl.org/), as a resource to geneticists all over the globe. When a researcher is studying a disease or gene, they can download the genetic information known about the area they are studying.

Luckily, computer speeds and hard disk capacities are undergoing their own rapid doublings, so computers will probably keep up with the storage and processing needs of projects to reverse engineer and understand the genomes of humans and other species.
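To put the 10-month doubling in concrete terms, here is a minimal sketch of the arithmetic. It assumes the doubling time stays constant, which the Sanger announcement does not promise, and the function names are illustrative rather than taken from any Sanger tool.

```python
import math


def projected_size_tb(initial_tb: float, doubling_months: float,
                      elapsed_months: float) -> float:
    """Projected size in TB, assuming steady exponential growth."""
    return initial_tb * 2 ** (elapsed_months / doubling_months)


def annual_growth_factor(doubling_months: float) -> float:
    """How many times larger the archive gets in 12 months."""
    return 2 ** (12 / doubling_months)


def months_to_reach(initial_tb: float, target_tb: float,
                    doubling_months: float) -> float:
    """Months needed to grow from initial_tb to target_tb."""
    return doubling_months * math.log2(target_tb / initial_tb)


if __name__ == "__main__":
    start_tb = 22.0        # archive size quoted in the post
    doubling = 10.0        # doubling time in months, quoted in the post
    for months in (0, 10, 20, 60, 120):
        size = projected_size_tb(start_tb, doubling, months)
        print(f"after {months:3d} months: {size:,.0f} TB")
    print(f"growth per year: {annual_growth_factor(doubling):.2f}x")
```

Under that assumption the archive reaches 44 TB after 10 months, grows by a factor of roughly 2.3 each year, and would need about 100 months to become a thousand times its current size.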

Ray Kurzweil's argument that technological advance is eventually going to accelerate to a rate we can't even comprehend (see The Singularity Is Near: When Humans Transcend Biology) seems plausible to me because of the doubling rates we see in computer speed, storage capacities, and fiber optic information transmission rates. On top of that we now have biotechnology advancing at rates that are highly analogous to the rates we've been watching in semiconductor technology for decades. Our perception of the rate of change from year to year or decade to decade up to this point does not tell us much about the rate of change 20 years from now because the rate of change is accelerating.

By Randall Parker at 2006 January 22 10:21 PM  Biotech Advance Rates

Comments
Anthony Kendall said at January 23, 2006 08:03 AM:

Rates of doubling of things like bits of information and their transfer capacity do not necessarily mean a doubling of useful information. Does a DNA database with 44 TB of data truly contain twice as much information as one with 22 TB?

One of the remarkable abilities of the human brain is to chunk large quantities of information and make sense of bits and bytes at a higher level. Just think of all of the information about the trajectory of a baseball that is lumped into our understanding of where and when to position ourselves to catch it.
So, as we accumulate primary data, we needn't look at it as an exponentially-increasing mass of uninterpretable facts. Instead, we lump increasingly larger chunks of data to gain a higher understanding of the world and its systems.

The internet is a wonderful set of tools for this process. Blogs, Wikipedia, Google, RSS, and many other services and standards give us the ability to distill much of this information down into a form that we can use and understand effectively. Using these tools, and many many others, humans may just be able to stay ahead of that curve indefinitely.

Kurt said at January 23, 2006 09:31 AM:

DNA databases are nice, but I will be much more impressed when microfluidics technology gets developed. Microfluidics will do to biotech and biomedicine what the IC did to electronics. The current microfluidic devices are very limited in what they can do and are not easily scalable. This will change as the technology improves.

All of the analytical instrumentation and synthesis tools are shrunk down and integrated into devices such that an entire lab can be made into a single device. Later, these devices can be scaled to do more complicated tasks. Leroy Hood's "nanolab" chip is one attempt to produce such an integrated laboratory chip. These chips, with their structures, on-board instrumentation, and synthesis components, are manufactured using micro and nano fabrication techniques similar to those used to make ICs and other MEMS devices. As we know well, this manufacturing technology is what is driving the "Moore's Law" improvement in semiconductor devices. Likewise, it should drive a similar improvement in laboratory instrumentation and bio-synthesis for biotech. This is when biotechnology should undergo its own "Moore's Law".

jimcrack said at January 23, 2006 12:42 PM:

The genetic information issue is similar to the one at Homeland Security. We have vastly more communication intercepts than we can translate, or even analyze through global word searches. So how do we make sense of the entire genetic database of all life on earth?

Prevailing consensus suggests that there is a great deal of genetic garbage, representing false starts in evolution: a genomic strand is simply repeated again and again until the genome gets it right. What we need is a systemic breakthrough, an explanation of what basic physical and chemical processes drive this kind of evolution (propagation, to borrow from chemists' terminology), and how this evolution arrives at a logical conclusion or purpose (termination). This would not only be a breakthrough for mathematics and information science, but it would finally answer deep factual and philosophical questions concerning Darwinism. I haven't read Kurzweil, but he probably anticipates what I said.

I hope genetic research in this country isn't run like Homeland Security. That scandal in Korea is no comfort.

Fly said at January 23, 2006 02:48 PM:

“So how do we make sense of the entire genetic database of all life on earth?”

Biological processes are highly conserved from bacteria, to yeast, to fruit flies, to chimpanzees, to humans. Understanding the basic biology of yeast will significantly advance understanding all life. There are currently many projects underway to model how networks of genes and proteins work together. For example:

The first genome-wide screen for protein complexes is completed.

http://www.eurekalert.org/pub_releases/2006-01/embl-tcl011906.php

“"To carry out their tasks, most proteins work in dynamic complexes that may contain dozens of molecules," says Giulio Superti-Furga, who launched the large-scale project at Cellzome four years ago. "If you think of the cell as a factory floor, up to now, we've known some of the components of a fraction of the machines. That has seriously limited what we know about how cells work. This study gives us a nearly complete parts list of all the machines, and it goes beyond that to tell us how they populate the cell and partition tasks among themselves."
The study combined a method of extracting complete protein complexes from cells (tandem affinity purification, developed in 2001 by Bertrand Séraphin at EMBL), mass spectrometry and bioinformatics to investigate the entire protein household of yeast, turning up 257 machines that had never been observed. It also revealed new components of nearly every complex already known.”

“Patrick Aloy and Rob Russell at EMBL used sophisticated computer techniques to reveal the modular organisation of these cellular machines. "This is the most complete set of protein complexes available and probably the set with the highest quality," Aloy says. "Most proteomics studies in the past have shown whether molecules interact or not, in a 'yes/no' way. The completeness of this data lets us see how likely any particular molecule is to bind to another. By combining such measurements for all the proteins in the cell, we discovered new complexes and revealed their modular nature."

"Investigating protein complexes has always posed a tricky problem – they're too small to be studied by microscopes, and generally too large to be studied by techniques like X-ray crystallography," says Russell. "But they play such a crucial role in the cell that we need to fill in this gap. There's still a huge amount to be learned from this data and from the methods we are developing to combine computational and biochemical investigations of the cell."”

jimcrack said at January 23, 2006 03:29 PM:

Fly:

That was very informative, but Do Not Tempt The Lord Your God. The "sense" I was referring to covers two unanswerable extremes. The first is how, or for what reason life originates at all. The second is how intelligent life exists, at least to the extent that "intelligence" exists to somehow apprehend a physical environment, and adapt to it.

I do not endorse the "intelligent design" nonsense that passes for science. But there is a tendency of modern scientists to reduce everything to "information science". That is satisfactory in some ways, such as Craig Venter's efforts to simplify and cut to the chase the research into the human genome. But the understanding of DNA is not ultimately based on a "library of genes". The concept of the gene itself is based on solid observations, and on what was, 50 years ago, unimaginable empiricism (i.e., the fit of nucleotides into a spiral, the covalent v. non-covalent nature of strand binding, the acceptance of RNA migration between the nucleus and mitochondria, etc.).

The 'protein complex' approach to biodynamics is very promising. But DNA is not a protein. So why do genomes exist at all? That was roughly the question when it was believed, before Watson and Crick, that RNA was the blueprint of life.

Randall Parker said at January 23, 2006 04:18 PM:

Anthony Kendall,

Well, one can certainly make the argument that the amount of useful information isn't increasing as fast as the data increases. Okay, suppose that is true. Still, there must be some sort of useful information doubling rate as well. I wonder what it is.

I suspect the useful information doubling rate is not constant. One has to collect many pieces of information to solve an entire puzzle. The entire puzzle solution gives you the useful result. I expect we'll see huge turning points in biotechnology where, for example, we tried and tried and tried to cure cancer and suddenly, wham, it is totally curable. Ditto for growing replacement organs. We won't, we won't, we won't be able to do it. Then, wham, we will.

A lot of pieces of information have synergistic value. What the continued data doubling rate for DNA tells me is that we are going to be able to collect large numbers of useful pieces of information which we'll be able to use in synergistic fashion to solve a great many problems. The DNA doubling rate is a visible indicator of a larger phenomenon: The rapid automation of lab research. DNA isn't the only biochemical compound that is amenable to this sort of automation. So surely our ability to take apart and measure cells is advancing rapidly on a wide front.

K said at January 24, 2006 07:06 PM:

jimcrack: one of the most valuable uses of communication intercepts is almost never mentioned. Backtracking.

At some time a terrorist is caught or otherwise identified in the field. The records of the phone he was carrying, or is otherwise known to have used, can then be backtracked. They can reveal when and where he phoned and perhaps whom. The times can be tested for a pattern indicating a prearranged meeting. The phones he called can be analyzed to see who they called or were called by, etc.

As an extreme example a plan might be prepared to paralyze an entire network of terrorist phones at a critical time.

So electronic intercepts may be useful much later, even if the conversations reveal nothing.

Of course backtracking can be defeated by discarding cell phones and changing patterns, etc. But that is expensive and complicated for the terrorists. And it is hard for them to know what has been compromised.

So the mere belief that your opponent is permitted to intercept communications can hamper you.

jimcrack said at January 25, 2006 12:25 PM:

K:

So I hope you're not saying that after all this genetic research, the ultimate objective is to unplug all life on earth, or that the knowledge of life will be used to cripple and lower the confidence of the advancement of life (sort of like the caterpillar who, when asked how he moved so many legs at once, got confused and tangled up when he tried to demonstrate his own technique).

K said at January 28, 2006 06:13 PM:

jimcrack: your last puzzles me. Earlier you had remarked about Homeland Security having more information than they can use. That is probably true. I just commented that information you don't use now may prove most useful later when an event gives you the starting clue.

Almost every article and remark I ever see objecting to electronic intercepts is based on the premise that a foreign language will be used, we can't translate everything, guarded or coded conversations will prevent interpretation, it will be too late, etc. All these are true but the pattern, numbers, and location can still be very useful later.

Whether it should be done is a different question.

I mentioned nothing remotely connected to genetic research. Mine was just an aside to your lead sentence of 12:42pm Jan 23.

As far as I can read minds, tdean may be the one who thinks my ultimate objective is to unplug all life on earth (by using nuclear powerplants).
