Data Sharing vs. Privacy: Cutting the Gordian Knot (Re-Identification Symposium)

PGP participants and staff at the 2013 GET Conference. Photo credit: PersonalGenomes.org, license CC-BY

This post is part of Bill of Health‘s symposium on the Law, Ethics, and Science of Re-Identification Demonstrations. You can call up all of the symposium contributions here. Please note that Bill of Health continues to have problems receiving some comments. If you post a comment to any symposium piece and do not see it within half an hour or so, please email your comment to me at mmeyer @ law.harvard.edu and I will post it. —MM

By Madeleine Ball

Scientists should share. Methods, samples, and data — sharing these is a foundational aspect of the scientific method. Sharing enables researchers to replicate, validate, and build upon the work of colleagues. As Isaac Newton famously wrote: “If I have seen further it is by standing on the shoulders of giants.”

When scientists study humans, however, this impulse to share runs into another motivating force — respect for individual privacy. Clinical research has traditionally been conducted using de-identified data, and participants have been assured privacy. As digital information and computational methods have increased the ability to re-identify participants, researchers have become correspondingly more restrictive with sharing. Solutions are proposed in an attempt to maximize research value while protecting privacy, but these can fail — and, as Gymrek et al. have recently confirmed, biological materials themselves contain highly identifying information through their genetic material alone.

When George Church proposed the Personal Genome Project in 2005, he recognized this inherent tension between privacy and data sharing. He proposed an extreme solution: cutting the Gordian knot by removing assurances of privacy:

If the study subjects are consented with the promise of permanent confidentiality of their records, then the exposure of their data could result in psychological trauma to the participants and loss of public trust in the project. On the other hand, if subjects are recruited and consented based on expectation of full public data release, then the above risks to the subjects and the project can be avoided.

Church, GM “The Personal Genome Project” Molecular Systems Biology (2005)

Thus, the first ten PGP participants — the PGP-10 — identified themselves publicly.

Breaking Good: A Short Ethical Manifesto for the Privacy Researcher

This post is part of Bill of Health‘s symposium on the Law, Ethics, and Science of Re-Identification Demonstrations. We’ll have more contributions throughout the week, and extending at least into early next week. Background on the symposium is here. You can call up all of the symposium contributions here (or by clicking on the “Re-Identification Symposium” category link at the bottom of any symposium post).

By Yaniv Erlich

1. Increase the general knowledge – Like any other scientific discipline, privacy research strives to increase our knowledge about the world. You are breaking bad if your actions are aimed at revealing intimate details of people or, worse, at exploiting these details for your own benefit. This is not science. This is just ugly behavior. Ethical privacy research aims to deduce technical commonalities about vulnerabilities in systems, not about the individuals in these systems. This should be your internal compass.

This rule immediately asserts that your published findings should communicate only relevant information to deduce general rules. Any shocking/juicy/intimate detail that was revealed during your study is not relevant and should not be included in your publication.

Some people might gently (or aggressively) suggest that you should not publish your findings at all. Do not let that make you too nervous. Simply remind them that the ethical ground of your actions is increasing the general knowledge. Therefore, communicating your algorithms, hacks, and recipes is an ethical obligation, and without that your actions cannot be truly regarded as research. “There is no ignorabimus … whatever in natural science. We must know — we will know!” the great mathematician David Hilbert once said. His statement applies also to privacy research.

Re-Identification Is Not the Problem. The Delusion of De-Identification Is. (Re-Identification Symposium)

By Michelle Meyer

This is the second post in Bill of Health‘s symposium on the Law, Ethics, and Science of Re-Identification Demonstrations. We’ll have more contributions throughout the week, and extending at least into early next week. Background on the symposium is here. You can call up all of the symposium contributions by clicking here (or by clicking on the “Re-Identification Symposium” category link at the bottom of any symposium post).

By Jen Wagner, J.D., Ph.D.

Before I actually discuss my thoughts on the re-identification demonstrations, I think it would be useful to provide a brief background on my perspective.

Identification≠identity

My genome is an identifier. It can be used in lieu of my name, my visible appearance, or my fingerprints to describe me sufficiently for legal purposes (e.g. a “Jane Doe” search or arrest warrant specifying my genomic sequence). Nevertheless, my genome is not me. It is not the gist of who I am – past, present or future. In other words, I do not believe in genetic essentialism.

My genome is not my identity, though it contributes to my identity in varying ways (directly and indirectly; consciously and subconsciously; discretely and continuously). Not every individual defines his/her self the way I do. There are genomophobes who may shape their identity in the absence of their genomic information and even in denial of and/or contradiction to their genomic information. Likewise, there are genomophiles who may shape their identity with considerable emphasis on their genomic information, in the absence of non-genetic information and even in denial of and/or contradiction to their non-genetic information (such as genealogies and origin beliefs).

My genome can tell you probabilistic information about me, such as my superficial appearance, health conditions, and ancestry. But it won’t tell you how my phenotypes have developed over my lifetime or how they may have been altered (e.g. the health benefits I noticed when I became vegetarian, the scar I earned when I was a kid, or the dyes used to hide the grey hairs that seem proportional to time spent on the academic job market). I do not believe in genetic determinism. My genomic data is of little research value without me (i.e. a willing, able, and honest participant), my phenotypic information (e.g. anthropometric data and health status), and my environmental information (e.g. data about my residence, community, life exposures, etc.). Quite simply, I make my genomic data valuable.

As a PGP participant, I did not detach my name from the genetic data I uploaded into my profile. In many ways, I feel that the value of my data is maximized and the integrity of my data is better ensured when my data is humanized.

Kudos to This American Life

By Michelle Meyer

A few weeks ago, I blogged about a recent episode of This American Life, “Dr. Gilmer and Mr. Hyde,” about the quest of one Dr. Gilmer (Benjamin) to understand why another, beloved Dr. Gilmer (Vince), had brutally murdered his own father after hearing voices that compelled him to do so. The episode ends (spoiler alert) with the revelation that Vince suffers from Huntington’s, a rare neurodegenerative disease that causes progressive physical, cognitive, and psychological deterioration.

Listeners, it seemed to me, could naturally conclude from the episode that it was Vince’s Huntington’s that had caused him to murder his father. That might or might not be true in this particular case. Huntington’s can cause behavioral and mood changes, including irritability, aggression and belligerence. It can also cause (less often) psychosis. But even if Huntington’s caused Vince to murder his father, or somehow contributed to the murder, the extreme violence that Vince displayed — strangling his father, then sawing off his father’s fingertips to preclude identification — is in no way typical of the Huntington’s population as a whole. And so what most troubled me about the episode was its failure to note just how rare this kind of extreme violence is among those with Huntington’s, just as it is very rare among human beings generally. And so I wrote to TAL, requesting a clarification.

I’m happy to report that the TAL producer for the episode, Sarah Koenig — who had not intended to suggest any causal link between Vince’s murder of his father and his Huntington’s, much less between murder and Huntington’s more generally — has issued a clarification on the show’s blog, and promises to make a similar clarification in the episode itself, should they ever re-air it. Kudos to TAL, and many thanks to Sarah for being incredibly gracious in our exchanges.

One clarification deserves another. In my earlier blog post, I also worried that some listeners might conclude that Vince’s father was similarly driven to commit horrific acts of sexual abuse on Vince and his sister because he, too, was (presumably) suffering from Huntington’s (an autosomal dominant genetic disease). Although I think that a listener who didn’t know better could reasonably conclude that Huntington’s causes people to become sexual predators almost as easily as they could conclude from the episode that Huntington’s causes people to become murderers, nothing in the episode suggests that Sarah, Benjamin Gilmer, or anyone else at TAL believes that Huntington’s causes sexual abuse, or that they intended for listeners to reach that conclusion. I regret anything in my earlier post that suggested otherwise.

Again, I’m very grateful to Sarah and everyone else at TAL for hearing me (and other listeners) out and for agreeing to make the clarification — and just in time for HD Awareness Month!

Online Symposium on the Law, Ethics & Science of Re-identification Demonstrations

By Michelle Meyer

Over the course of the last fifteen or so years, the belief that “de-identification” of personally identifiable information preserves the anonymity of those individuals has been repeatedly brought up short by scholars and journalists. It would be difficult to overstate the importance, for privacy law and policy, of the early work of “re-identification scholars,” as I’ll call them. In the mid-1990s, the Massachusetts Group Insurance Commission (GIC) released data on individual hospital visits by state employees in order to aid important research. As Massachusetts Governor Bill Weld assured employees, their data had been “anonymized,” with all obvious identifiers, such as name, address, and Social Security number, removed. But Latanya Sweeney, then an MIT graduate student, wasn’t buying it. When, in 1996, Weld collapsed at a local event and was admitted to the hospital, she set out to show that she could re-identify his GIC entry. For twenty dollars, she purchased the full roll of Cambridge voter-registration records, and by linking the two data sets, which individually were innocuous enough, she did just that. As privacy law scholar Paul Ohm put it, “In a theatrical flourish, Dr. Sweeney sent the Governor’s health records (which included diagnoses and prescriptions) to his office.”
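For readers who want to see the mechanics, the linkage described above can be sketched in a few lines of Python. This is only a toy illustration, not Sweeney's actual method or data: every record below is invented, the names `hospital_records`, `voter_roll`, and `link` are hypothetical, and the choice of ZIP code, birth date, and sex as the shared fields is an assumption (these are the fields commonly cited for this demonstration; this post does not list them).

```python
# Toy illustration of a linkage attack. All records are invented.
from collections import defaultdict

# "De-identified" hospital records: names removed, quasi-identifiers kept.
hospital_records = [
    {"zip": "02138", "birth_date": "1950-01-15", "sex": "M", "diagnosis": "condition A"},
    {"zip": "02138", "birth_date": "1962-07-04", "sex": "F", "diagnosis": "condition B"},
    {"zip": "02139", "birth_date": "1950-01-15", "sex": "M", "diagnosis": "condition C"},
]

# Public voter roll: names present, same quasi-identifiers.
voter_roll = [
    {"name": "Voter One", "zip": "02138", "birth_date": "1950-01-15", "sex": "M"},
    {"name": "Voter Two", "zip": "02138", "birth_date": "1962-07-04", "sex": "F"},
    {"name": "Voter Three", "zip": "02138", "birth_date": "1962-07-04", "sex": "F"},
]

# Assumed shared fields; neither field alone identifies anyone.
QUASI_IDENTIFIERS = ("zip", "birth_date", "sex")


def link(records, roll, keys=QUASI_IDENTIFIERS):
    """Return (name, record) pairs where exactly one voter matches a record."""
    by_key = defaultdict(list)
    for voter in roll:
        by_key[tuple(voter[k] for k in keys)].append(voter["name"])

    matches = []
    for record in records:
        names = by_key.get(tuple(record[k] for k in keys), [])
        if len(names) == 1:  # a unique match is a re-identification
            matches.append((names[0], record))
    return matches


reidentified = link(hospital_records, voter_roll)
for name, record in reidentified:
    print(name, "->", record["diagnosis"])
```

In this toy data only the first hospital record is re-identified: the second matches two voters and so stays ambiguous, and the third matches no one on the roll. The point of the sketch is that neither data set contains anything alarming on its own; the exposure comes from the join.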

Sweeney’s demonstration led to important changes in privacy law, especially under HIPAA. But that demonstration was just the beginning. In 2006, the New York Times was able to re-identify one individual (and only one individual) in a publicly available research dataset of the three-month AOL search history of over 600,000 users. The Times demonstration led to a class-action lawsuit (which settled out of court), an FTC complaint, and soul-searching in Congress. That same year, Netflix began a three-year contest, offering a $1 million prize to whoever could most improve the algorithm by which the company predicts how much a particular user will enjoy a particular movie. To enable the contest, Netflix made publicly available a dataset of the movie ratings of 500,000 of its customers, whose names it replaced with numerical identifiers. In a 2008 paper, Arvind Narayanan, then a graduate student at UT-Austin, along with his advisor, showed that by linking the “anonymized” Netflix prize dataset to the Internet Movie Database (IMDb), in which viewers review movies, often under their own names, many Netflix users could be re-identified, revealing information that was suggestive of their political preferences and other potentially sensitive information. (Remarkably, notwithstanding the re-identification demonstration, after awarding the prize in 2009 to a team from AT&T, in 2010, Netflix announced plans for a second contest, which it cancelled only after tussling with a class-action lawsuit (again, settled out of court) and the FTC.) Earlier this year, Yaniv Erlich and colleagues, using a novel technique involving surnames and the Y chromosome, re-identified five men who had participated both in a study of Mormon families in Utah and in the 1000 Genomes Project, an international consortium to place, in an open online database, the sequenced genomes of (as it turns out, 2500) “unidentified” people.

Most recently, Sweeney and colleagues re-identified participants in Harvard’s Personal Genome Project (PGP), who are warned of this risk, using the same technique she used to re-identify Weld in 1997. As a scholar of research ethics and regulation, and also a PGP participant, I found that this latest demonstration piqued my interest. Although much has been said about the appropriate legal and policy responses to these demonstrations (my own thoughts are here), there has been very little discussion about the legal and ethical aspects of the demonstrations themselves. As a modest step toward filling that gap, I’m pleased to announce an online symposium, to take place here at Bill of Health the week of May 20th, that will address both the scientific and policy value of these demonstrations and the legal and ethical issues they raise. Participants fill diverse stakeholder roles (data holder; data provider, i.e., research participant; re-identification researcher; privacy scholar; research ethicist) and will, I expect, have a range of perspectives on these questions:

Misha Angrist

Madeleine Ball

Daniel Barth-Jones

Yaniv Erlich

Beau Gunderson

Stephen Wilson

Michelle Meyer

Arvind Narayanan

Paul Ohm

Latanya Sweeney

Jennifer Wagner

I hope readers will join us on May 20.

UPDATE: You can call up all of the symposium contributions, in reverse chronological order, by clicking here.

Privacy and Progress and the Deidentification of Whole Genome Sequence Data

[Posted on behalf of Elizabeth Pike and Kayte Spector-Bagdady from the Presidential Commission for the Study of Bioethical Issues – and cross-posted here.]

In the most recent issue of the Hastings Center Report, Drs. Amy Gutmann and James Wagner of the Presidential Commission for the Study of Bioethical Issues (the Bioethics Commission) contributed to the lively debate surrounding the identifiability of genetic data. In Found Your DNA on the Web: Reconciling Privacy and Progress, Gutmann and Wagner, Chair and Vice-chair respectively, argue that the paradigm of identifiability has become less relevant to individual privacy protections than restrictions on access and use.

In their commentary, Gutmann and Wagner continue the public deliberation of the Bioethics Commission’s report, Privacy and Progress in Whole Genome Sequencing, in which the Bioethics Commission took a forward-looking approach to the privacy concerns raised by whole genome sequencing—issues that have come to the forefront of this important science.

Under current law, health information that is deidentified—information for which there is “no reasonable basis” to believe it can identify an individual or that has been stripped of traditional identifiers—is afforded different legal protections than identifiable health information. However, whole genome sequence data are unique to only one person, making them more vulnerable to reidentification.

Recent articles have cast doubt on the extent to which whole genome sequence data can be deidentified. For example, in Identifying Personal Genomes by Surname Inference, published in Science in January, Melissa Gymrek et al. successfully uncovered the full identities of 50 individuals.

Live Blogging from FDA in the 21st Century Conference, Plenary 2: Alta Charo on Integrating Speed and Safety

By Michelle Meyer

[This is off-the-cuff live blogging, so apologies for any errors, typos, etc.]

Day two of PFC’s FDA in the 21st Century conference begins with a morning plenary by the very fabulous Alta Charo, of the University of Wisconsin Law School, who is speaking on “Integrating Speed and Safety.”

Today Alta is presenting what she calls “more of an initial idea than an actual proposal,” and she notes that she’s very interested to hear responses to it, so comment away or contact her offline. She wants to integrate into the usual and longstanding “FDA speed versus safety” debate some concerns that should be of interest to industry. “In other words,” she said, “I’d like to be nice to the drug people.”

Alta begins with a brief history of the speed versus safety debate, which turns out to be quite cyclical. Before 1906, she asks us to recall, we had true snake oil: products with high toxicity and little or no efficacy. Often these products were nevertheless perceived as effective because they contained alcohol or other drugs, and so made you feel better at least, but of course that’s part of what made these products so dangerous, especially for children.

And so with the Federal Food and Drugs Act of 1906, we get post-market remedies for misbranding, although they require proof of intent. And then in 1937 over 100 children die from elixir of sulfanilamide. And the following year we get the Food, Drug, and Cosmetic Act. But the FDCA targets only safety. (Although Alta rightly notes that it’s hard to see how regulators were truly only looking at safety and not also at some form of efficacy, since there is no such thing as safety in the abstract, only safety relative to the purpose for which someone is taking the drug.)

Why North Dakota’s Ban on Genetic Selection Matters (Online Abortion and Reproductive Technology Symposium)

[Ed Note: Posted on behalf of Jaime King]

On March 26, 2013, North Dakota Governor Jack Dalrymple signed into law two of the nation’s most restrictive abortion bills. The first, HB 1456, prohibits providers from performing an abortion once a fetal heartbeat can be detected, which can be as early as six weeks’ gestation (Fetal Heartbeat Ban). The second, HB 1305, prohibits providers from knowingly performing abortions sought solely because of the sex of the fetus or because the fetus has been diagnosed with a genetic abnormality or the potential for a genetic abnormality (Sex and Genetic Selection Ban).

Much of the press coverage and discussion of these unprecedented laws has focused on the Fetal Heartbeat Ban. This is largely because the prohibition eliminates nearly all access to abortion in the state and poses a direct challenge to a woman’s right to choose to have a pre-viability abortion free from undue state interference, as delineated in Planned Parenthood v. Casey. Viability has typically been established around 24 weeks’ gestation, which is generally considered the end of the second trimester. The sweeping nature of this prohibition essentially negates the impact of a prohibition on sex or genetic selective abortions, as testing for those conditions, even with non-invasive prenatal testing techniques, cannot be performed reliably prior to nine or ten weeks’ gestation. By that point, the Fetal Heartbeat Ban would already prohibit any form of selective abortion.

But we should not ignore the Sex and Genetic Selection Ban, as it is the more insidious of the two. As a direct threat to abortion access for all women, the Fetal Heartbeat Ban is very likely to be found unconstitutional, short of a complete overturning of Roe v. Wade. The Sex and Genetic Selection Ban, however, is subject to more debate. Since Roe, we have largely assumed that women can have an abortion for any reason prior to viability, but the courts have never directly addressed the issue. Recent polls have found that over 3/4 of Americans would support bans on sex selective abortions,[1] and five states have already passed sex selection bans.[2] The question of whether a woman’s reason matters is upon us.

Opening the door to permit states to invade and assess a woman’s private thoughts regarding her reasons for having an abortion strikes directly at the heart of the reproductive liberties protected by the Fourteenth Amendment. If states can regulate access to abortion based on a woman’s reasons for having it, they can significantly limit access in a piecemeal fashion – slowly and deliberately circling in on the right.

Impact of the “Lander Brief” in the Myriad Case – and an answer to Justice Alito’s Question

[Cross posted at Prawfsblawg.com]

The Supreme Court heard oral arguments on April 15 in Association for Molecular Pathology et al. v. Myriad, concerning whether human genes are patent-eligible subject matter. The case focused on Myriad’s patents on two genes, BRCA1 and BRCA2, involved in early-onset breast cancer.

Surprisingly, many of the Court’s questions for Myriad’s counsel focused on what Justice Breyer dubbed the “Lander Brief” – an amicus filed on behalf of neither party by one of the country’s leading scientists, Dr. Eric Lander. (Lander was one of the leaders of the Human Genome Project and co-chairs the President’s Council of Advisors on Science and Technology.) [Full Disclosure: I am one of the authors of this brief.] Justices Breyer, Ginsburg and Alito referred to the brief by name, and several other Justices were clearly influenced by the information in the brief.

I believe that the “Lander Brief” was a hot topic of conversation because the Justices realized that it was central to applying the Court’s product-of-nature doctrine to DNA. Importantly, the brief demolished the scientific foundation of the Federal Circuit decision on appeal. The Federal Circuit panel held that human chromosomes are not patent-eligible because they are products of nature, but a majority found that “isolated DNA” fragments of human chromosomes (such as pieces of the breast cancer genes) are patent-eligible. The Federal Circuit’s distinction rested on its assumption that (unlike whole chromosomes) isolated DNA fragments do not themselves occur in nature, but instead only exist by virtue of the hand of man.

Will Your Law Firm (or Other Employer) Pay for Your Egg Freezing? Should It? (Online Abortion and Reproductive Technology Symposium)

As John Robertson mentioned in his post earlier this week, in order to avoid age-related infertility many women are considering or will soon consider using egg freezing, as the technology has dramatically improved. As compared to freezing preembryos, for example, this is an attractive option since many of these women (heterosexual or otherwise) may not yet have chosen a reproductive partner, and also may want to hedge their bets to have options should they divorce. Still, the technology is not cheap.

At least one participant at the bricks-and-mortar symposium reported to me that they knew of one Am Law 100 firm that will cover egg freezing for its lawyers. I would be grateful if folks in the comments section could indicate whether their firm covers it as well. [Ed. Note: If you have any trouble with the comment function on the blog, which is still giving us trouble, send a note and we’ll get it posted for you through the admin account.] My own impression is that this is not yet widespread, but that might change as the practice becomes more common and thus the market converges (perhaps with a push from Above the Law).

Should law firms cover egg freezing? I have made the argument elsewhere for coverage of reproductive technologies by insurance more generally from a moral and economic perspective. In the case of law firms, I am curious about the PR implications for the firm. Would potential female associates welcome this option knowing that they can work hard early on and still reproduce, if they so desire, later on? Or would they take this as a signal that the firm thinks that working there as an associate and pregnancy are incompatible? Would this option help remedy the deficits faced by women who want to have children on the partnership track, or would it in fact exacerbate discrimination against women who do choose to have families early on while at the firm, with the thinking being “she could have waited”? More generally, would this be a blow for or against gender equity at law firms?