I Never Promised You a Walled Garden (Re-Identification Symposium)

This post is part of Bill of Health’s symposium on the Law, Ethics, and Science of Re-Identification Demonstrations. You can call up all of the symposium contributions here. We’ll continue to post contributions into next week. —MM

By Misha Angrist

Dear Michelle:

You know I respect your work immensely: your paper on the heterogeneity problem will be required reading in my classes for a long time to come.

But as far as this forum goes, I feel like I need both to push back and seek clarity. I’m missing something.

As you know, the PGP consent form includes a litany of risks that accompany the decision to make one’s genome and medical information public with no promises of privacy and confidentiality. These risks range from the well documented (discovery of non-paternity) to the arguably more whimsical (“relatedness to criminals or other notorious figures.”), including the prospect of being cloned. You write:

Surely the fact that I acknowledge that it is possible that someone will use my DNA sequence to clone me (not currently illegal under federal law, by the way) does not mean that I have given permission to be cloned, that I have waived my right to object to being cloned, or that I should be expected to be blasé or even happy if and when I am cloned.

Of course not. No one is asking you to be silent, blasé or happy about being cloned (your clone, however, tells me she is “totally psyched”).

But I don’t think it’s unfair to ask that you not be surprised that PGP participants were re-identified, given the very raison d’être of the PGP.

I would argue that the PGP consent process is an iterative, evolving one that still manages to crush HapMap and 1000 Genomes, et al., w/r/t truth in advertising (as far as I know, no other large-scale human “subjects” research study includes an exam). That said, the PGP approach to consent is far from perfect and, given the inherent limitations of informed consent, never will be perfect.

But setting that aside, do you really feel like you’ve been sold a bill of goods? Your deep–and maybe sui generis–understanding of the history of de-identification demonstrations makes me wonder how you could have been shocked or even surprised by the findings of the Sweeney PGP paper.

And yet you were. As your friend and as a member of the PersonalGenomes.org Board of Directors, this troubles and saddens me. In the iterative and collaborative spirit that the Project tries to live by, I look forward to hearing about how the PGP might do better in the future.

In the meantime, I can’t help but wonder: Knowing what you know and having done your own personal cost-benefit analysis, why not quit the PGP? Why incur the risk?

Warm regards,


0 thoughts to “I Never Promised You a Walled Garden (Re-Identification Symposium)”

  1. Misha,

    Thanks for raising this important point.

    The datasets on the PGP website are given under Creative Commons Zero (CC0). The CC0 license waives: “rights that you have over your work, such as … your publicity or privacy rights, … and rights protecting the extraction, dissemination and reuse of data”.

    I agree with your argument that the PGP never promised any immunity against privacy breaches and in fact explicitly raised this possibility as part of the consent process.

    I would even take your argument one step further. Michelle’s was not only consent for the possibility of harmful secondary use. In fact, under CC0, Michelle ACTIVELY gave permission for any secondary use that complies with the law. End of story. The whole CC system is based on these ideas.

    BTW – George Church pushes the idea that PGP cell lines should be used for iPS research. iPS research –by any means– equals cloning.

    The PGP is usually used as an example of how ‘ethical’ human omics research should be conducted. In my view, the outrage over Latanya’s experiments is a concern. If we cannot make it work here, how can we really advocate for other open access studies?

    1. Yaniv,
      Would you care to comment now on your own work re-identifying people in the 1000 Genomes dataset? Unlike PGP, there was a clear expectation that 1000 Genomes DNA donors would likely remain confidential; the consent said re-identifying was possible but “very hard”.
      You describe yourself as a “white hat hacker”. All hackers break rules; black hat hackers do so for illicit ends and white hat hackers claim a noble purpose. You’re implicitly claiming that the ends justify your means. That might well be the case. I do agree your results are important … but I don’t think you’ve made a full account of the costs of your methods, including the fact that attaching names to anonymous people is a form of indirect collection of PII by a third party with no relationship at all to the primary purpose for collection. In the UK (where the 1000 Genomes donations took place) this indirect collection without consent would likely be unlawful.
      I don’t think the debate in this symposium has yet got to the useful stage of weighing pros and cons to come up with an ethics calculation.

      1. Steve,
        “All hackers break rules” – I’m not sure what the basis is for this strong statement. I don’t agree with it. I worked as a white hat hacker for several years in a computer security company. Our customers were quite happy with our service!

        The CEU data of the 1000 Genomes Project was collected from US people on US territory, and our identification experiments were done on US soil. EU rules do not apply in Massachusetts, since the revolution.

        Finally, the lawyers and bioethicists of the University of Utah, NHGRI, the 1000 Genomes Project, the anonymous Science reviewers, the MIT IRB, and Amy McGuire, who was a co-author, did not raise any legal or ethical flags about our experiments. Are you sure your interpretation is correct?

        1. Apologies Yaniv, I thought the 1000 Genomes donors were in Britain. So okay, you are only subject to US laws.

          I still think you have an ethical challenge, if widespread privacy principles in other parts of the world find re-identification problematic. And they do: see how European regulators treated Facebook’s use of facial recognition to generate tag suggestions without consent. I think that process closely parallels “DNA hacking”: it takes anonymous data (photos or genomes), matches it against a reference (named biometric template or named gene sequences) and creates a statistical identification of the data. I am not a lawyer but my extensive privacy work alongside lawyers gives me grounds to suggest that DNA hacking by your methods in Europe would be unlawful. Are there lawyers watching this symposium who can offer an opinion?

          With respect to your ethics committees and reviewers, I find repeatedly that the implications of OECD collection and use limitation principles are not obvious to informaticians. I am not surprised that people haven’t twigged to the problem. Even Facebook, with all their resources and smarts, were oblivious to the OECD laws that caused their face recognition activities to be shut down.

          Finally on white hat hacking, I am sure your previous customers were happy. But I’ve managed these activities myself and I know the sponsors and contractors usually execute special agreements. In Australia, absent a special agreement, hacking into a computer system can constitute a cyber crime. My point is I believe that most white hat hackers know they’re pushing envelopes. As such, I would have thought you would have some sense that your hacking is not 100% innocent.

          But I am sorry if I’m wrong about your sense of the word “hacking”.

  2. I’ll reply to Misha when I get the chance, but I wanted to reply quickly to Yaniv. As I understand it, CC0 is an agreement between you, as a user of the PGP database, and the PGP as the owner of the same. It does not affect the rights of others, in particular:

    “Creative Commons licenses do not waive or otherwise affect rights of privacy or publicity to the extent they apply. If you have created a work or wish to use a work that might in some way implicate these rights, you may need to obtain permission from the individuals whose rights may be affected.”


    Cyberlaw is not my area of expertise, by any stretch, so I hope others will correct me if I’m missing something here.

    On the matter of cloning, most people distinguish between therapeutic and reproductive cloning. As I recall, I gave explicit consent to have cell lines created from my samples. Fine by me. I most certainly did not give explicit (or “active,” as you say) consent to the use of SCNT to create a delayed identical twin of myself (i.e., reproductive cloning) or to being re-identified. Whether the use of my biospecimen and/or genomic information to clone me (not just create a cell line) would violate any of my legal rights is an interesting question, perhaps for another day. Same with whether re-identifying me violates any legal rights. But if they do not (which may well be the case), it is certainly not because I explicitly consented either to being cloned or to being re-identified. I was explicitly *warned* about the *possibility* that these things are “possible,” but, as I argued at length, assumption of the risk simply does not constitute a grant of permission.

    I do wholeheartedly agree with both of you that the PGP did not promise that I would not be re-identified; quite to the contrary, it warned me of this risk, as well it should have. But I never argued that the PGP did make such a promise. I simply argued that the fact that the PGP warned participants that it could not protect against third-party re-identification did not, by itself, give any third-party the right vis-a-vis participants to re-identify them. I’ll grant you that under the CC0 you have a right vis-a-vis the PGP to do pretty much whatever you want with the data. But that’s not what I was talking about.

    Finally, I don’t think it’s a fair characterization of my post to imply that it expresses “outrage.” Outrage is not an argument, and I try very hard to trade in arguments, hopefully constructive ones.

    1. Here’s a better link on the point about CC0 and third-party privacy, publicity (and other) rights, which provides, in relevant part (emphasis in original):

      Does CC0 really eliminate all copyright and related rights, everywhere?

      Please don’t take the 0 (zero) in the name “CC0” literally – no legal instrument can ever eliminate all copyright interests in a work in every jurisdiction.

      CC0 doesn’t affect two very important categories of copyright and related rights. First, just like our licenses, CC0 does not affect other persons’ rights in the work or in how it is used, such as publicity or privacy rights.

  3. Posted on behalf of Steve Wilson:

    Misha, do you believe that setting out the risks of third parties identifying participants gives those third parties the automatic right to identify participants? Without any further consent, or without limitation?

    I believe third party researchers have some duty of care to the participants. To measure that duty of care, in order that we can frame and properly debate the cost-benefit of research, I look to long standing OECD data protection principles. These set out obligations on collectors of PII regardless of where that PII came from.

  4. Michelle,

    I see your point. There are two issues:

    1. As a third-party user of the PGP website, I find the website confusing as to whether the CC0 was granted by the PGP participants, who in that case ACTIVELY waive all their rights in the data, or by the PGP, in which case participants still have some privacy rights. How do I even know that was a third party??
    Maybe Misha or John Wilbanks can answer the question.

    2. You say that you did not explicitly give permission for cloning or reID. Fair enough. But you also did not give permission for a wide range of “regular” scientific activities, say, studies of human evolution. Now, according to this argument, a PGP member might come along tomorrow and complain that human evolution studies are unethical according to his religious views. Is he right because he never gave any explicit permission?

  5. Hi Yaniv,

    (1) The consent document doesn’t mention CC0 (and I don’t think that participants can be deemed to have offered a CC0 license via a website they have nothing to do with). The most relevant portion of the consent document is § 9.1:

    9.1 Promoting Open Access. One of the primary goals of the PGP is to develop a public dataset of information from willing participants to aid in the development of analytical tools and interfaces for scientists, doctors, and individuals around the world. In order to accomplish this, the PGP will endeavor to develop data structures and other tools, including legal agreements, to maximize the ability of the study to share its data in the broadest possible fashion.

    (2) This leads into your second, very good point, about human evolution (or other controversial studies) that could be done on PGP data under participants’ blanket consent. I entirely agree that if you wanted to use the PGP data to study, say, sex or race and IQ, then even though PGP participants didn’t explicitly consent to that particular study, they gave blanket consent to the open use of their data.

    My sense, though — and it’s really only that at this point — is that there’s a difference between using the data I give you, even in an offensive way, and re-identifying my data as mine. (Among other things, the former is a matter of my complicity with objectionable research but raises no privacy issues at all, while the latter eliminates all privacy in my research data vis-a-vis the re-ID researcher.) In acknowledging the possibility that I might be re-identified, I contemplated an inadvertent leak by the PGP (e.g., laptop getting stolen) and hacking by a third-party bad actor (e.g., insurer, employer, bored teenager). It honestly never occurred to me that my blanket consent to have my data used for research would include re-identification research because, frankly, when I enrolled in the PGP a few years ago, I don’t think I knew such a thing as re-identification research existed (no offense!).

    A similar challenge to my intuition that re-ID is different is that lots of research involves making predictions about non-anonymous subjects, and some of these predictions may be sensitive. This was the point of my Facebook example in my post (e.g., predicting from a subject’s FB likes that she is gay), and Madeleine raised similar questions in the comments to my post (e.g., predicting a subject’s history of STD infection from her microbiome). I struggled a lot to respond to this objection in my post. One possible consideration is whether the researcher knows or has reason to know that the subject would not want that particular interrogation of her data done. Madeleine raises a great example:

    In 2009 researchers published that Watson’s ApoE haplotype could be predicted from surrounding DNA, and they explicitly do not describe performing the analysis on his data out of respect for his privacy. It’s unclear to me whether they did or did not perform that analysis privately (they neither admit nor deny).

    Here, we know Watson chose to publicly reveal his entire genome except his ApoE status. A clever — one might say lawyerly — researcher might say, Look, Jim, you made your entire genome sequence available to us to analyze at will, except for ApoE. And hey, we didn’t touch your ApoE. We looked at the stuff near it, stuff you willingly made public. But whether or not this would violate any sort of consent or license, it would strike me as wrong. It’s harder for re-ID researchers to know how PGP participants feel about being associated with their identity since there’s currently no Name field. But we can surmise with a decent level of confidence, I think, that someone who declined to provide their zip code and birthdate (like me) probably doesn’t want to be identified by name. So to the extent that a re-ID researcher targets those folks I think it’s fair to ask about the ethics of a study that seems very likely to be contrary to participants’ preferences, even if it technically can be viewed as falling within the blanket consent they gave.
