January 15, 2011
Time For Million Genomes Sequencing Project
Razib points to a debate about how fast and how far DNA sequencing costs will drop. John Hawks expects $50 for full genome sequencing in less than 5 years.
The inevitability of the $1000 genome has already made it irrelevant. We should expect a $1000 genome announcement this year. This will be hype, because the real $1000 genomes won't be here until...next year! Before the end of 2014, whole genome sequences at 4x coverage will cross the $100 mark. I think there's a good chance they will be less than $50 at that time.
Based on numbers I've seen, those numbers are around six months optimistic. Geneticists are already planning projects anticipating $100 genomes -- some suggest that the next big project should be a "Million Genomes", because there isn't any sense bothering with a hundred thousand.
My take: once sequencing prices are really cheap, scientists should stop trying to get funding for the sequencing. Just ask people to get their own genomes sequenced and submit the results to the scientists for their research. People who submit a sample for genetic sequencing should be able to check boxes on a web page to specify which research projects can get their genetic sequencing info (and this could be done for any medical tests we order for ourselves).
Bottom line: It is time for massive medical research projects that are organized virtually.
An enterprising private medical research foundation (the Howard Hughes Medical Institute comes to mind) could fund the development of a web site where people could upload their genetic testing and sequencing results along with lots of measurement and medical history data about them. People could enroll themselves in a massive web-based research project on genetics, diet, lifestyle, drugs, diseases, health problems, physical accomplishments, intellectual achievements, and other aspects of their lives. For example, people could use digital cameras to upload pictures of their eyes, face and body at different angles, the inside of their mouth, and other views. People could enter in basic info such as sex, birth date, weight, and other info as well as go thru forms that ask lots of health history questions.
People could also enroll in such a research project thru their doctor. They could elect to have all their medical tests, diagnoses, and drug prescriptions provided to the research project. Drug store chains could provide the option of having drug purchase histories uploaded to the massive medical history database built from all the data that flows into it from the web, doctors' offices, medical testing labs, and other data sources.
Randall, sometimes your ingenuousness astounds me. We already live in something approaching a police state here in our purported constitutional representative republic. The house of a Maryland town's mayor is mistakenly broken into in a no-knock raid and his dogs are killed with impunity. Ad infinitum.
People should get universal ID cards, embedded RFID chips or externally visible barcodes so that all info about them can be tracked 24/7.
People should not be upset at government or corporate intrusion into their private lives.
People should recognize that their superior hierarchical organizations are there to improve their lives and take good care of them.
People could come to love their totalitarian police state which controls their every decision in life from conception to death.
Sorry, Randall. I will NOT be getting my genes sequenced, or doing ANY of those jolly things you suggest.
"But, but.. (I'm channeling a researcher here.) if the people we get sequences on are self-chosen, they won't be a random selection of sequences!"
Biobob, I can imagine bad uses of this data. OTOH, if medical progress doesn't pick up the pace a LOT, I'm not going to have to worry about those bad uses, 'cause I'll be dead.
I can tell you for sure - If electronic information CAN POSSIBLY be abused, it WILL be abused. The more useful such information is, the more abuse it will see. Period. Warrantless searches are the rule these days, not the exception - and that's just the legal ones. Just forget about the illegal ones or information disclosure thru incompetence; those are merely common.
I think the opposite of Brett's point applies. Doubt is being cast on a lot of the "gene X involved in causing Y" results because the statistics are very weak (the smaller your set, in this case of Y sufferers, the greater the chance that they'll have SOMETHING in common purely by chance), and this is with researchers using support groups for Y in order to mailshot candidates for genetic testing. Given that at this time there's little personal advantage to genetic sequencing and there are certain downsides (such as having to reveal the results to insurers), I suspect a "we got ourselves genetically tested" dataset is simply going to be far too diffuse to do meaningful statistics, and hence science. I certainly won't be going out of my way to get tested until there's some substantial advance in effective use of genetic sequence information.
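One way to put rough numbers on the "something in common purely by chance" point is the toy calculation below. It is only a sketch under loud assumptions: every variant is treated as independent, every variant is carried by the same fraction of the population, and the figures used (one million variants, 30% carrier frequency) are hypothetical and not taken from any study. The function name is made up for illustration.

```python
def prob_some_variant_shared(n_cases, n_variants, carrier_freq):
    """Probability that at least one variant is carried by ALL cases by chance.

    Assumes (hypothetically) that variants are independent and that each
    is carried by the same fraction `carrier_freq` of the population.
    """
    # Chance that one given variant shows up in every single case.
    p_all_cases = carrier_freq ** n_cases
    # Chance that at least one of the variants does so.
    return 1.0 - (1.0 - p_all_cases) ** n_variants


# Hypothetical inputs: 1,000,000 variants, each carried by 30% of people.
for n_cases in (5, 10, 20, 50):
    p = prob_some_variant_shared(n_cases, 1_000_000, 0.3)
    print(f"{n_cases:3d} cases -> P(chance shared variant) = {p:.6f}")
```

Under these assumptions the probability falls steeply as the number of cases grows, which is the commenter's point in reverse: small patient sets almost guarantee a spurious shared marker.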
Given genetic sequences from a few million people, along with lots of details about their health history, assorted cognitive tests, personality tests, blood tests, and lots of other measurements, I expect the influences of a great many genes on a great many phenotypic expressions will be identified.
Effective use of genetic sequence information goes up with the number of people sequenced.
The point is that, say, 1 million semi-randomly selected people being sequenced per year isn't likely to throw up that many examples of any one condition, particularly for difficult-to-categorise measurements like personality tests. Trying to find some stats, prostate cancer is apparently one of the most common cancers in men in the UK, and apparently there are 97 new cases per 100000 people. So ignoring stats on women and prostate cancer, assuming a 50:50 sequencing split, and assuming no relationship between any SPECIFIC condition and being sequenced, you'd expect to get roughly 500000*97/100000=485 new DNA sequences of those with prostate cancer. (There's clearly going to be a correlation between being ill with SOMETHING and being sequenced, so this is probably a slight under-estimate.) I'd assume that for diseases with multiple genetic factors (ie, not everyone will have the same deleterious genetic markers) that's on the borderline of what will give convincing statistics. (See above comment about how many previous news stories are currently under doubt because of sample sizes.) Now this is for a relatively binary diagnosis of a very common problem: try inferring more nebulous quantities from the small datasets you get by multiplying the frequency of a "condition" by 1 million, and I'm very unsure RELIABLE data mining can be done. If you were talking about, say, a billion or more people then I'd say it would work. But the time to reach a billion generic people might be, even with accelerating sampling, 20 years, and this may mean targeted sampling is still the best way to make progress for the next 10 years or so.
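The back-of-envelope figure above can be reproduced, and rerun for other cohort sizes, with a few lines of Python. This is just a sketch of the commenter's arithmetic: the function name and parameters are made up for illustration, and it keeps the same simplifying assumptions (a 50:50 sex split and no link between having the condition and being sequenced).

```python
def expected_new_cases(people_sequenced, male_fraction, incidence_per_100k):
    """Expected new cases per year among the men in a sequenced cohort.

    Assumes, as in the comment, that being sequenced is unrelated to
    having the condition.
    """
    men = people_sequenced * male_fraction
    return men * incidence_per_100k / 100_000


# The comment's inputs: 1 million sequenced, half men, 97 cases per 100000.
million = expected_new_cases(1_000_000, 0.5, 97)
# The same rate applied to the billion-person cohort mentioned above.
billion = expected_new_cases(1_000_000_000, 0.5, 97)

print(f"1 million sequenced: about {million:.0f} cases")
print(f"1 billion sequenced: about {billion:.0f} cases")
```

Swapping in the incidence of a rarer condition shows how quickly the case count per million sequenced drops below anything usable.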
Of course I'm not a bio-informatician, just someone who uses statistics, so I may be missing something that makes the calculations more favourable.
A quick web search throws up, for example
which despite using 17000 cases (and 39369 controls, but finding enough controls isn't going to be a problem) didn't find a statistically significant confirmation of a previously reported gene marker.
I think it's important for research like this to actually work the numbers rather than go for the "heuristic overview", which is certainly an attractive prospect.
Well said, embryonic. I agree that the largest problem with most of these genetic pie-in-the-sky hypotheses (e.g. DNA aging, DNA/mDNA evolutionary relationships with various fingerbones, etc.) is the absurd sample size and variance assumptions, considering the conclusions drawn.
$50 genomes are not going to happen. Ever. The basic reagents alone needed to make sequencing-ready DNA cost about twice that price. Then there are time, personnel and equipment costs. Think of tomography imaging as an analogy. It still costs $$$.
Tests like CT and MRI have high prices, but low costs. How much labor can there be for a test that takes 5 minutes?