This post is part of Bill of Health’s symposium on the Law, Ethics, and Science of Re-Identification Demonstrations. You can call up all of the symposium contributions here. We’ll continue to post contributions throughout the week. —MM
Dear Misha:
In your open letter to me, you write:
No one is asking you to be silent, blasé or happy about being cloned (your clone, however, tells me she is “totally psyched”).
First things first: I have an ever-growing list of things I wish I had done differently in life, so let me know when my clone has learned how to read, and I’ll send it on over; perhaps her path in life will be sufficiently similar to mine that she’ll benefit from at least a few items on the list.
Moving on to substance, here’s the thing: some people did say that PGP participants have no right to complain about being re-identified (and, by logical extension, about any of the other risks we assumed, including the risk of being cloned). It was my intention, in that post, to articulate and respond to three arguments that I’ve encountered, each of which suggests that re-identification demonstrations raise few or no ethical issues, at least in certain cases. To review, those arguments are:
- Participants who are warned by data holders of the risk of re-identification thereby consent to be re-identified by third parties.
- Participants who agree to provide data in an open access format for anyone to do with it whatever they like thereby give blanket consent that necessarily includes consent to using their data (combined with other data) to re-identify them.
- Re-identification is benign in the hands of scholars, as opposed to commercial or criminal actors.
I feel confident in rejecting the first and third arguments. (As you’ll see from the comments I left on your post, however, I struggled, and continue to struggle, with how to respond to the second argument; Madeleine also has some great thoughts.) Note, however, two things. First, none of my responses to these arguments was meant to suggest that I or anyone else had been “sold a bill of goods” by the PGP. I’m sorry that I must have written my post in such a way that it lent itself to that interpretation. All I intended to say was that, in acknowledging the PGP’s warning that re-identification by third parties is possible, participants did not give third parties permission to re-identify them. I was addressing the relationship between re-identification researchers and data providers more than that between data providers and data holders.
Second, even as to re-identification researchers, it doesn’t follow from my rejection of these three arguments that re-identification demonstrations are necessarily unethical, even when conducted without participant consent. Exploring that question is the aim, in part, of my next post. What I tried to do in the first post was clear some brush and push back against the idea that under the PGP model — a model that I think we both would like to see expand — participants have given permission to be re-identified, “end of [ethical] story.”
Risk vs. Certainty
You write:
Your deep–and maybe sui generis–understanding of the history of de-identification demonstrations makes me wonder how you could have been shocked or even surprised by the findings of the Sweeney PGP paper. And yet you were.
As I wrote in my post, I wasn’t surprised that participants could be re-identified — especially from zip codes, sex, and date of birth. Indeed, I didn’t include my own zip code or birth date precisely for that reason. My surprise, instead, focused not on the fact that PGP participants could be re-identified, but (1) that they were, and (2) that some responded by insisting, in one way or another, that we were asking for it. I’ve already addressed the latter surprise, so now let me say something about the former.
Although it is certain that re-identification is possible, whether participants will be re-identified is a matter of probability. Estimating the probability of re-identification, it seems to me, is quite complex, and will usually vary depending on both the algorithm and the individual profile. I expect that Dan Barth-Jones will discuss the probability of re-identification from a statistician’s point of view in his symposium contribution. Here (as best as I can recall from when I enrolled in the PGP) is how my own, much less scientific, thinking went: I specifically contemplated both an inadvertent leak by the PGP (e.g., a laptop with names of participants getting stolen) and hacking by a third-party bad actor (e.g., insurer, employer, bored teenager). The likelihood of either event, I thought then and continue to think now, is relatively (but not trivially) low, requiring either an act of carelessness on the part of the data holder or incentives and resources on the part of a data intruder.
What honestly never occurred to me was the apparently much greater likelihood of contrived re-identification by academic researchers. And this raises a sort of paradox about the risk of re-identification and re-identification demonstrations: if we count attacks by academic demonstrators alongside those of real-world data intruders in estimating the probability of re-identification, then demonstrations artificially inflate that estimate beyond the risk actually posed by true intruders. Under those circumstances, I worry about the effect on the public’s already-low willingness to participate in research. On the other hand, if we don’t count demonstrations, then participants like me will apparently underestimate the probability of being re-identified.
In the wake of the PGP re-identification demonstrations, the PGP has made a variety of suggestions along the lines that participants should be willing to be identified or quit the Project. Just as my reaction to the PGP re-identification demonstrations has made you sad, the idea that I don’t truly understand the PGP ethos, and that I, and other participants who have reacted similarly to the re-identification demonstrations, should quit, has made me sad. More importantly, I worry that some of these suggestions would undermine some of what makes the PGP so special.
One approach is to encourage participants “to treat your PGP profile as if your name were already on it.” I worry that this is unrealistic as a way of debiasing participants who may be irrationally optimistic about their odds of being re-identified (if, indeed, such debiasing is its purpose). The fact is that there is not a 100% probability that a particular participant will be re-identified. You can ask participants to pretend that the probability is 100%, but I think it’s hard for human beings to convince themselves of things they know to be untrue.
Consider how IRBs think about the benefits that are listed in the informed consent form. In order to help prospective participants avoid the therapeutic misconception, in which they believe, at some level, that participating in a research study designed to produce generalizable knowledge will yield direct medical benefits for them, IRBs often understandably insist that the informed consent form state that participants are not expected to benefit at all. (The PGP consent document itself makes such a statement.) But imagine that the study in question is backed by Merck to the tune of millions of dollars. Most prospective participants will believe — correctly — that there is a non-zero chance that they will medically benefit, even in an early phase trial. And yet they are extremely unlikely to be able to reasonably predict what that non-zero probability of benefit is. By refusing to provide that information to prospective participants and pretending that the probability is zero, IRBs invite prospective participants who are wholly unequipped to do so to invent their own odds. (NB: someone else originally made this argument and even, I believe, produced some data supporting it. I’d love to give credit where it’s due but am having a senior moment and cannot recall the author(s). If this rings a bell for anyone, let me know and I’ll update the post.)
I fear that a similar effect would occur here. Perhaps the rest of this symposium will convince me otherwise, but as of now, I’m skeptical that a warning in a consent document would convince me that the chances of my being re-identified, even in a genetics study, are 100% — and certainly not in the near future. So far, as I understand it, re-identification of genetic information requires a reference dataset. We’re a ways away from the genome being per se identifiable without such references. So I’ll discount your inflated 100% figure. And the danger is that I’ll do so too much, relative to the actual risk, whatever that is.
Another alternative that’s been floated, requiring participants to actually self-identify in order to participate in the PGP, implicitly recognizes the difficulty of asking participants to pretend as if they were already identified. However, as I expect the Project realized when it decided not to require self-identification after the PGP-10, with this requirement in place, even if the PGP could meet its target of 100,000 participants (a key threshold for meaningful genetics research on complex traits with small effect sizes), its data would almost certainly be (even more) biased toward people who are wealthy, of retirement age, and/or who have boring genomes and traits. The suggestion that participants remove sensitive traits from their profiles if they don’t want to be re-identified would similarly undermine the scientific potential of the PGP. Some of the most sensitive traits are the ones in most need of the attention of researchers.
Genomic Extroverts and Genomic Altruists in the Garden
As you know, I’m all about crafting rules that accommodate heterogeneous preferences and circumstances, wherever possible. (And thanks, again, for the shout-out to my forthcoming law review article; I feel like I should be donating 10% of my SSRN downloads to you.) Although members of the PGP community tend to share an early-adopter mentality, a love of science, and other traits, there is heterogeneity in preferences even within our ranks. Some PGPers seem to be genomic extroverts. For a variety of reasons, they either want to be identified with their data, or truly could not care less if they are. For them, there is no downside to being re-identified (and there may be an upside). Others are genomic altruists. They aren’t genomic introverts (if they were, one hopes that, having passed the PGP’s unique exam, they wouldn’t be in the Project), but they don’t especially want to be identified with their data. Instead, they’re willing to assume the risk of re-identification (and other risks) in pursuit of the Project’s goals. As you know, I aspire to become a genomic extrovert like you, but for now, I’m afraid I’m just a genomic altruist.
You asked: “Knowing what you know and having done your own personal cost-benefit analysis, why not quit the PGP? Why incur the risk?” Because, as I mentioned above, I didn’t deem the probability of re-identification to be high (especially since I haven’t included my zip code or birth date in my profile), and because, in any event, I don’t view the magnitude of the harm, should it occur, as large. It can be perfectly rational not only for genomic extroverts but also for genomic altruists to assume the risk of re-identification and enroll in a project like the PGP. If I were a genomic introvert, for whom re-identification would be catastrophic, I wouldn’t have enrolled in the Project. But I doubt that it would be catastrophic for me. When, for a few hours, I wasn’t sure whether I had been re-identified, I wasn’t beset by panic. If it happens, it happens. In fact, I told the journalist (and Harvard fellow) who wrote the Forbes.com story (and who is writing a book on the business of personal data) that he was welcome to try to re-identify me by whatever means he wished. And had Latanya’s team (or Yaniv’s team, had I had a Y chromosome) asked if they could try to re-identify me, I’d have said yes.
So if the prospect of being re-identified isn’t terrifying to me, why am I voicing any qualms about re-identification demonstrations at all? Because choosing to share personal information when asked is different than having that information taken from you without your permission or even knowledge. And because thinking through best practices for conducting these demonstrations is about much more than my individual preferences about being re-identified, and even than the PGP.
I understand the PGP as having a big enough tent to include extroverts like you and altruists like me. I hope that I’m right about that, and that as the PGP continues to evolve, it remains inclusive. If it is to remain welcoming of altruists, the extroverts who pioneered this groundbreaking project may have to try to see things from the perspective of the altruists when they respond to things like re-identification demonstrations. It’s one thing to not promise the altruists a walled garden impervious to outsiders; it’s another to welcome the outsiders over to stomp on the tulips.
In my next post, I’ll turn to some preliminary ideas about best practices for both data holders and re-identification demonstrators.
Thank you, Misha, as always, for being a constructive interlocutor, for fostering the Project’s collaborative spirit, and for being willing to hear my thoughts in that light.
With affection,
Michelle