Re-Identification Is Not the Problem. The Delusion of De-Identification Is. (Re-Identification Symposium)

By Michelle Meyer

This is the second post in Bill of Health’s symposium on the Law, Ethics, and Science of Re-Identification Demonstrations. We’ll have more contributions throughout the week, and extending at least into early next week. Background on the symposium is here. You can call up all of the symposium contributions by clicking here (or by clicking on the “Re-Identification Symposium” category link at the bottom of any symposium post).

Please note that Bill of Health continues to have problems receiving some comments. If you post a comment to any symposium piece and do not see it within half an hour or so, please email your comment to me at mmeyer @ law.harvard.edu and I will post it. —MM

By Jen Wagner, J.D., Ph.D.

Before I actually discuss my thoughts on the re-identification demonstrations, I think it would be useful to provide a brief background on my perspective.

Identification≠identity

My genome is an identifier. It can be used in lieu of my name, my visible appearance, or my fingerprints to describe me sufficiently for legal purposes (e.g. a “Jane Doe” search or arrest warrant specifying my genomic sequence). Nevertheless, my genome is not me. It is not the gist of who I am: past, present, or future. In other words, I do not believe in genetic essentialism.

My genome is not my identity, though it contributes to my identity in varying ways (directly and indirectly; consciously and subconsciously; discretely and continuously). Not every individual defines himself or herself the way I do. There are genomophobes who may shape their identity in the absence of their genomic information and even in denial of and/or contradiction to their genomic information. Likewise, there are genomophiles who may shape their identity with considerable emphasis on their genomic information, in the absence of non-genetic information and even in denial of and/or contradiction to their non-genetic information (such as genealogies and origin beliefs).

My genome can tell you probabilistic information about me, such as my superficial appearance, health conditions, and ancestry. But it won’t tell you how my phenotypes have developed over my lifetime or how they may have been altered (e.g. the health benefits I noticed when I became vegetarian, the scar I earned when I was a kid, or the dyes used to hide the grey hairs that seem proportional to time spent on the academic job market). I do not believe in genetic determinism. My genomic data is of little research value without me (i.e. a willing, able, and honest participant), my phenotypic information (e.g. anthropometric data and health status), and my environmental information (e.g. data about my residence, community, life exposures, etc.). Quite simply, I make my genomic data valuable.

As a PGP participant, I did not detach my name from the genetic data I uploaded into my profile. In many ways, I feel that the value of my data is maximized and the integrity of my data is better ensured when my data is humanized.

Delusions of De-identification

Linguistically the prefix “re” indicates an action happening again or with backward motion (e.g. repetition, return, restoration, regenerate, reproduce, retrace, redefine). To discuss the potential re-identification of data (genomic or otherwise) and the corresponding risks, we must first think critically about the possibility of de-identification.

The Privacy Rule of the Health Insurance Portability and Accountability Act (HIPAA) lists 18 identifiers that can be stripped to “de-identify” the data. Genomes cannot be de-identified and are themselves identifiers. The sequences are unique to the individuals from whom they were derived. Genomic data can, however, be unlinked (that is, detached from other identifiers, such as an individual’s name), which may make the individual more comfortable participating in the PGP and other research endeavors. Yet, by and large, de-identification is a delusion.

I attended the GET conference this year and visited the Data Privacy Lab’s re-identification table. I was not surprised at all to learn that simply providing my sex, birthdate, and zip code would allow anyone to identify me easily. (As previous blog posts have explained, no PGP participant should be surprised by this.) Again, I have made no attempts to detach identifiers from my PGP profile, so the risk of “re-identification” is a non sequitur. Moreover, the identifying feature of genomes is not inherently negative: identification applications have a number of benefits that are not yet being realized to their fullest potential (e.g. missing persons, human trafficking, family reunifications).
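To make concrete why sex, birthdate, and zip code together work as an identifier, here is a minimal Python sketch. It is not the Data Privacy Lab’s method, which I am not describing here. The records are invented for illustration (they are not PGP data), the helper name `unique_fraction` is hypothetical, and the coarsening rule (birth year plus the first three zip digits) is simply one assumed way of providing less specific information.

```python
from collections import Counter

# Invented toy records, NOT real PGP data: (sex, birthdate, zip code)
records = [
    ("F", "1980-03-14", "16802"),
    ("F", "1980-03-14", "16802"),
    ("M", "1975-11-02", "02138"),
    ("F", "1980-07-30", "16801"),
    ("M", "1975-01-19", "02139"),
]


def unique_fraction(rows, key):
    """Return the share of rows whose key(row) value appears exactly once."""
    counts = Counter(key(r) for r in rows)
    return sum(1 for r in rows if counts[key(r)] == 1) / len(rows)


# Full combination: sex + exact birthdate + five-digit zip code
full = unique_fraction(records, key=lambda r: r)

# Coarsened combination (an assumption for illustration):
# sex + birth year + first three zip digits
coarse = unique_fraction(records, key=lambda r: (r[0], r[1][:4], r[2][:3]))

print(full, coarse)
```

In this toy roster, three of the five records are singled out by the full combination, while none is unique once the birthdate and zip code are coarsened. Real populations behave differently in degree, but the mechanism is the same: the more specific the combination, the fewer people share it.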

On the Re-identification Demonstrations

If the purpose of the re-identification demonstrations was to highlight the ease of identifying members of the PGP, that purpose could have been fulfilled without specifically attempting to identify members. The PGP consent process, which is thorough and documented, addresses confidentiality and anonymity (or lack thereof) in Article IX. The relevant part to consider for these re-identification demonstrations is Item 9.2 (emphasis added):

9.2 Association of Your Name With Your Data. The PGP will not intentionally associate your name with your genomic or trait data or other information that is published to the PGP’s public website and database or otherwise intentionally identify you as a participant in the PGP without your prior consent. However, as described above, because of the identifiable nature of the information you provide to the PGP, as well as the nature of the data and analyses generated by the PGP, it is possible that one or more third parties may identify you as a participant in the study. This may result in the association of your published data and other information with your name or other information that you have not provided to the PGP and may not have wished to be publicly disclosed.

The demonstrations are unmistakably intentional attempts to associate participants with their data; however, the re-identification demonstration itself did not show names or PGP participant IDs in its results. Moreover, implicit consent was given for the attempt to identify when each visitor to the demonstration table voluntarily entered his or her sex, birthdate, and zip code to conduct the demonstration. Technically, in my opinion, the re-identification demonstrations have not gone beyond the informed consent given by PGP participants and the subsequent implicit consent given by those who participated at the demonstration table.

Rather than focusing on whether it was controversial or unethical for the demonstrations to occur, I would be more interested in exploring the necessity and efficacy of such demonstrations in conveying just how delusional the concept of “de-identification” is in a research context using any type of data. Could broad recognition of the delusion of de-identification facilitate meaningful reform in human research protection policies?

2 thoughts to “Re-Identification Is Not the Problem. The Delusion of De-Identification Is. (Re-Identification Symposium)”

  1. Posted on behalf of Steve Wilson:

    Jen makes a very good point that the prospects of remaining “de-identified” are very low, and we all agree that the demonstrations serve a useful purpose reinforcing just how low. Perhaps the 1000 Genomes project should now review their statement that re-identification is “very hard”.

    However I wouldn’t badge the misunderstanding of de-identification as ‘delusional’. It’s a very technical and non-obvious point that sex, DOB and zip code suffice to identify so many people. There is nothing in item 9.2 as quoted that involves PGP participants giving “implied consent” for re-identification. The text is really a disclaimer, designed to minimise the risk of the project sponsors. It’s akin to the disclaimer a lock manufacturer might make in relation to their product: ‘we cannot guarantee that our lock is 100% secure under all circumstances’. Fair enough, that would be a true statement, but a criminal that does pick the lock is no less culpable for the disclaimer being made.

    I’d like to repeat my core point: attaching a name to an otherwise anonymous record is an act of indirect collection of PII. As such, it may be a breach of relevant privacy principles. This is the ethical crux of the matter: if the re-identification is wrongful in some jurisdictions, how should the researchers doing such demonstrations weigh the good that comes from them?

    1. Posted on behalf of Jen Wagner:

      Perhaps my post was not clear. For that I apologize. I wasn’t trying to suggest that Item 9.2 of the PGP Consent document discusses implied consent. Similarly, I wasn’t trying to suggest that all PGP participants agreed to be participants in re-identification demonstrations. I was trying to point out that those who participated at the re-ID demo table at the GET conference and voluntarily entered their information into the touchpad after hearing about the demo gave their implied consent to the demo.

      I would question what is to be gained by saying that re-id demos (such as the table at the GET conference) should be required to gain IRB approval. While I have never served as a member of an IRB, I would expect re-id demos to be exempt from IRB review under categories 2 and 4 and potentially not be research at all (I don’t think the demos themselves were designed to contribute generalizable knowledge, though perhaps the actions that led up to the demos were). The public nature of PGP data means that category 4 is broader than in many other research contexts. Should we reconsider what research can be done with public data/public records? I would argue no, though perhaps you are on the other side of that policy discussion.

      Article IX of the consent document is not an empty “disclaimer” in my perspective. Like all agreements, it sets forth expectations for the respective parties from the outset. Sure, you can view it as limiting the sponsor’s liability risks, but you can also view it as simultaneously defining/establishing the rights of the participants. I know some Twitter discussions were focused on this point shortly after the GET conference, and it is an important point to make: assuming the risk that profiles might be re-identified is not the same as giving permission (i.e. providing consent) for someone to actively attempt to re-identify profiles. While in theory I agree with that sentiment, I don’t believe genomes are ever de-identified, which leads me to find the conversation about “re-identification” to be out of focus. This is particularly the case when we are specifically discussing the re-id demo that occurred at the GET conference, which again did not associate or publish names with profiles or genomes. It merely gave you the result as to whether you were likely located and how that changes based on the specificity of the info provided in the profile (e.g. date of birth or just year of birth, zip code or just state of residence, etc.).

      I recognize that not everyone will see de-identification as being delusional. I intentionally chose strong language for my post. You said your core point is that “attaching a name to an otherwise anonymous record is an act of indirect collection of PII.” I don’t necessarily disagree with you on that as stated (though the re-ID demo at the GET conference did not do that).

      Conceptually, are genomes ever anonymous or are they simply unidentified? I would argue the latter. I think we can get to the same policy points you are trying to make, but I believe it’s an important fundamental distinction. By acknowledging genomes are simply unidentified, we acknowledge the genome belongs to someone (i.e. there is a potential victim). If we believe that a genome sequence (or more accurately just a CODIS profile) is sufficient to identify a criminal suspect, why should we not hold similarly that it is enough to identify a potential victim of civil or criminal wrongdoing? Doing so would allow us to enumerate meaningful policies regarding how people engage/interact with unidentified genomes (regardless of the format in which the genome is encountered – in biospecimen or as data). From a policy standpoint, I would prefer we develop policies centered on the person and relying on notions of contextual integrity rather than policies centered on the data. I haven’t been convinced that a blanket, universal approach to privacy is preferable to a contextualized approach.

      Finally, while the right of publicity has been neglected in policy discussions, I personally think it is as apt as rights of privacy when we’re considering unidentified genomes (or de-identified or anonymous genomes if you prefer).
