This is the first post in Bill of Health‘s symposium on the Law, Ethics, and Science of Re-Identification Demonstrations. We’ll have more contributions throughout the week. Background on the symposium is here. You can call up all of the symposium contributions by clicking here (or by clicking on the “Re-Identification Symposium” category link at the bottom of any symposium post). —MM
I’m fascinated by the methodological intersections of technology and privacy – or rather the lack of intersection, for it appears that a great deal of technology development occurs in blissful ignorance of information privacy norms. By “norms” in the main I mean the widely legislated OECD Data Protection Principles (see Graham Greenleaf, Global data privacy laws: 89 countries, and accelerating, Privacy Laws & Business International Report, Issue 115, Special Supplement, February 2012).
Standard data protection and information privacy regulations world-wide are grounded by a reasonably common set of principles; these include, amongst other things, that personal information should not be collected if it is not needed for a core business function, and that personal information collected for one purpose should not be re-used for unrelated purposes without consent. These sorts of privacy formulations tend to be technology neutral; they don’t much care about the methods of collection but focus instead on the obligations of data custodians regardless of how personal information has come to be in their systems. That is, it does not matter if you collect personal information from the public domain, or from a third party, or if you synthesise it from other data sources, you are generally accountable under the Collection Limitation and Use Limitation principles in the same way as if you collect that personal information directly from the individuals concerned.
I am aware of two distinct re-identification demonstrations that have raised awareness of the issues recently. In the first, Yaniv Erlich used what I understand are new statistical techniques to re-identify a number of subjects that had donated genetic material anonymously to the 1000 Genomes project. He did this by correlating genes in the published anonymous samples with genes in named samples available from genealogical databases. The 1000 Genomes consent form reassured participants that re-identification would be “very hard”. In the second notable demo, Latanya Sweeney re-identified volunteers in the Personal Genome Project using her previously published method of using a few demographic values (such as date or birth, sex and postal code) extracted from the otherwise anonymous records.
A great deal of the debate around these cases has focused on the consent forms and the research subjects’ expectations of anonymity. These are important matters for sure, yet for me the ethical issue in re-anonymisation demonstrations is more about the obligations of third parties doing the identification who had nothing to do with the original informed consent arrangements. The act of recording a person’s name against erstwhile anonymous data represents a collection of personal information. The implications for genomic data re-identification are clear.