STANFORD, Calif. – In their exuberance over cracking the genetic code, scientists have paid too little attention to privacy issues, say researchers at the Stanford University School of Medicine. Their findings, published in the July 9 issue of Science, suggest that traditional means of ensuring confidentiality do not apply to genetic data and that additional safeguards are needed to protect patients from potential abuses.
“I am surprised that no one has looked at this problem before and asked, ‘Can we really release genome-wide information about individuals to the public,’” said Zhen Lin, a genetics graduate student who led the study. “Nobody did a careful calculation to find whether ‘anonymous’ patients could be identified from this data.”
Supposedly, stripping out the non-DNA identifying information is enough to protect a patient's privacy.
A 1996 federal law that governs medical privacy requires that research data be stripped of identifying information such as names, addresses and even the last three digits of a patient’s ZIP code before it can be shared. But the law is essentially silent on the issue of DNA, and most researchers have interpreted this to mean that sharing sequence data linked to information from a patient’s medical history is safe.
Sift through a big enough chunk of DNA sequence from a published research paper and you can pull out enough variable sites to compare against a sample of a person's DNA obtained some other way and get a match.
“Traditionally people believe that if there is no identifier attached, then the sample is anonymous,” Lin said. “We found that’s really not true because the DNA code itself is an identifier.” To demonstrate this, the researchers looked at specific sites in DNA that commonly vary from person to person, accounting for many genetic differences. Each person has about 5 million such sites in their DNA. Using a statistical model, the researchers found that matching 100 of these sites would identify an individual to a high degree of certainty.
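To get a feel for why a modest number of sites goes such a long way, here is a back-of-the-envelope sketch in Python. It is not the statistical model from the Science paper, which the press release does not describe. It assumes each site has two alleles, that genotypes follow Hardy-Weinberg proportions, and that sites are statistically independent of each other. The function names, the allele frequency of 0.5, and the population figure of 6 billion are all illustrative choices of mine.

```python
import math


def genotype_match_probability(p: float) -> float:
    """Chance that two unrelated people share the same genotype at one
    two-allele site whose allele frequencies are p and 1 - p."""
    q = 1.0 - p
    # Hardy-Weinberg genotype frequencies (an assumption of this sketch).
    genotype_freqs = (p * p, 2.0 * p * q, q * q)
    # Two people match if both carry the same one of the three genotypes.
    return sum(f * f for f in genotype_freqs)


def random_match_probability(allele_freqs) -> float:
    """Chance that an unrelated person matches at every listed site,
    treating the sites as independent (also an assumption)."""
    return math.prod(genotype_match_probability(p) for p in allele_freqs)


def sites_needed(population: float, max_expected_matches: float, p: float) -> int:
    """Smallest number of sites, all with allele frequency p, for which the
    expected number of coincidental matches in the population stays at or
    below max_expected_matches."""
    per_site = genotype_match_probability(p)
    return math.ceil(math.log(max_expected_matches / population) / math.log(per_site))


if __name__ == "__main__":
    # Hypothetical inputs: 100 sites, each with allele frequency 0.5.
    print("Per-site match chance:", genotype_match_probability(0.5))
    print("100-site match chance:", random_match_probability([0.5] * 100))
    print("Sites needed for 6 billion people:", sites_needed(6e9, 1.0, 0.5))
```

Under these simplified assumptions the chance of a coincidental match shrinks geometrically with each added site, which is why 100 sites leaves a wide margin. Real sites are less informative than this: many have lopsided allele frequencies, and sites that sit close together on a chromosome tend to be inherited together, so they are not independent.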
In theory, if a person collected a small amount of genetic information about a former research subject, he could match it to database material in the future to get personal medical information about the subject.
Very few people are research subjects on projects that publish a lot of DNA sequences. Most research projects that use DNA variations look at only a fairly small number of sections of DNA to find variations in particular genes thought to be involved in some disease or perhaps in producing differences in athletic performance or cognitive function. So this concern about published DNA sequences is not important (at least not yet) for most research trial subjects who get their DNA tested in some manner.
For research efforts that sequence larger chunks of DNA per test subject, more safeguards may be needed for handling the data. However, at this point the cost of personal DNA sequencing is so high that few people are getting enough of their genome sequenced by researchers to allow each person to be identified. It is not enough simply to know 100 of the 5 million sites that vary between people. Those sites have to be scattered across enough different chromosomes to provide coverage of all (or nearly all) of the chromosomes. My guess is that most research projects that sequence large sections do so on a limited number of chromosomes and so do not provide the kind of data needed to uniquely identify each research subject.
In the long run I believe genetic privacy is going to become impossible to protect. So I'm pretty lackadaisical about this whole subject anyhow.
Randall Parker, 2004 July 31 10:35 PM Biotech Privacy