Been meaning to comment on this report of a couple of weeks ago: We don't carry around anywhere near as much junk DNA as previously claimed. Not too many slacker letters in the genome going along for the ride.
"Our genome is simply alive with switches: millions of places that determine whether a gene is switched on or off," says Ewan Birney of EMBL-EBI, lead analysis coordinator for ENCODE. "The Human Genome Project showed that only 2% of the genome contains genes, the instructions to make proteins. With ENCODE, we can see that around 80% of the genome is actively doing something. We found that a much bigger part of the genome – a surprising amount, in fact – is involved in controlling when and where proteins are produced, than in simply manufacturing the building blocks."
This report has a number of important implications:
If we had had cheaper genetic sequencing equipment 30 years ago we would not have made very good use of the data, because of the cost of computer memory, disk, and CPU at the time. The amount of data we need to process to tease out what genes do is huge. We each have about 3 billion genetic letters. But a genome has to be sequenced many times over to identify errors and to fit together overlapping pieces. So sequencing each genome requires processing tens of billions of genetic letters, with lots of comparisons and building up of data structures to gradually connect all the pieces together.
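To make the "fit together overlapping pieces" idea concrete, here is a toy sketch in Python. It is not how real assemblers work at scale; it is a simple greedy merge on a handful of made-up reads, and the names (`overlap`, `greedy_assemble`, `letters_to_process`) and the 30x coverage figure are assumptions picked for illustration only.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that matches a prefix of `b`.

    Returns 0 if no overlap of at least `min_len` letters exists.
    """
    max_len = min(len(a), len(b))
    for n in range(max_len, min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of reads with the largest overlap.

    Every round compares every read against every other read, which is
    exactly the kind of work that gets expensive as the read count grows.
    """
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_n, best_i, best_j = 0, -1, -1
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            return reads  # nothing left to merge
        merged = reads[best_i] + reads[best_j][best_n:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)]
        reads.append(merged)


def letters_to_process(genome_length: int, coverage: int) -> int:
    """Raw letters read when every position is sequenced `coverage` times."""
    return genome_length * coverage


# Made-up reads, each overlapping the next by three letters.
contigs = greedy_assemble(["ATGCGT", "CGTACC", "ACCTGA"])

# Assumed 30x coverage of a 3-billion-letter genome.
total = letters_to_process(3_000_000_000, 30)
```

With three short reads the all-against-all comparison is trivial. With the tens of billions of letters a real genome produces, it is the reason memory, disk, and CPU costs matter so much.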
Once each genome's sequence is known, comparing it against other genomes requires even more computing power. The differences in those letters need to be compared with many attributes of each of us, across tens or hundreds of millions of people, in order to discover all their effects.
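The first step of that comparison, finding where two sequences differ, can be sketched in a few lines of Python. This assumes the two sequences are already aligned and the same length, which real genomes are not; `letter_differences` is a hypothetical helper name and the sequences are made up.

```python
def letter_differences(reference: str, sample: str) -> list[tuple[int, str, str]]:
    """List (position, reference_letter, sample_letter) wherever the two differ.

    Assumes both sequences are pre-aligned and of equal length.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (i, r, s)
        for i, (r, s) in enumerate(zip(reference, sample))
        if r != s
    ]


# Two made-up nine-letter sequences that differ at two positions.
diffs = letter_differences("ATGCGTACC", "ATGAGTACT")
```

Finding the differences is the cheap part. Each difference then has to be checked against traits across a very large number of people, and that is where the computing bill really grows.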