This post is part of Bill of Health's symposium on the Law, Ethics, and Science of Re-Identification Demonstrations. We'll have more contributions throughout the week, and extending at least into early next week. Background on the symposium is here. You can call up all of the symposium contributions here (or by clicking on the "Re-Identification Symposium" category link at the bottom of any symposium post).
Please note that Bill of Health continues to have problems receiving some comments. If you post a comment to any symposium piece and do not see it within half an hour or so, please email your comment to me at mmeyer @ law.harvard.edu and I will post it. —MM
By Yaniv Erlich
1. Increase the general knowledge – Like any other scientific discipline, privacy research strives to increase our knowledge about the world. You are breaking bad if your actions aim to reveal intimate details of people, or worse, to exploit these details for your own benefit. This is not science. This is just ugly behavior. Ethical privacy research aims to deduce technical commonalities about vulnerabilities in systems, not about the individuals in those systems. This should be your internal compass.
This rule immediately implies that your published findings should communicate only the information needed to deduce general rules. Any shocking/juicy/intimate detail revealed during your study is not relevant and should not be included in your publication.
Some people might gently (or aggressively) suggest that you should not publish your findings at all. Do not let that rattle you. Simply remind them that the ethical ground of your actions is increasing the general knowledge. Therefore, communicating your algorithms, hacks, and recipes is an ethical obligation, and without it your actions cannot truly be regarded as research. "There is no ignorabimus … whatever in natural science. We must know — we will know!", the great mathematician David Hilbert once said. His statement also applies to privacy research.
2. Do no harm – Do no harm to the individuals in your study. If you can prove your point by a simulation on artificial data, do it. If you need to show a case on a real system, try to build a model in your own lab. If you need to re-identify real people, construct model cases around people who voluntarily revealed their identities prior to the study.
The trickiest cases are those involving the identification of anonymous individuals. These cases are equivalent to a blind experiment in a 'traditional' scientific study and usually provide the ultimate proof of your algorithm (see here for the importance of blind experiments). The best-case scenario is to find individuals who opt in to privacy studies. If such cases are impossible to find, my perspective is that the minimal inclusion criteria are: (a) the dataset is publicly available; (b) the individuals consented to the possibility that privacy breaches might occur; (c) they gave permission for unrestricted secondary use of their data; and (d) the risk of potential harm is minimized by selecting cases whose re-identification is not likely to have any consequence. For instance, datasets derived from deceased subjects are one of the best options.
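These four criteria form an all-or-nothing checklist: a candidate case is eligible only if every condition holds. As a toy illustration (all names and structure here are hypothetical, not from the post), the screening logic could be sketched as:

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """A candidate record for a re-identification demonstration.

    Field names are illustrative stand-ins for criteria (a)-(d) above.
    """
    publicly_available: bool          # (a) dataset is publicly available
    consented_to_breach_risk: bool    # (b) consented that breaches might occur
    unrestricted_secondary_use: bool  # (c) permitted unrestricted secondary use
    harm_minimized: bool              # (d) e.g., deceased subject, no likely consequence


def meets_minimal_criteria(c: Candidate) -> bool:
    """All four criteria must hold; failing any one excludes the case."""
    return (c.publicly_available
            and c.consented_to_breach_risk
            and c.unrestricted_secondary_use
            and c.harm_minimized)


# A public dataset from a deceased, consented participant passes.
eligible = meets_minimal_criteria(Candidate(True, True, True, True))
# A case lacking consent to the breach risk is excluded.
rejected = meets_minimal_criteria(Candidate(True, False, True, True))
```

The point of the conjunction is that the criteria are not trade-offs: a highly public dataset does not compensate for missing consent, and minimized harm does not compensate for restricted secondary use.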
3. If it is broken, try to fix it – Congratulations! Your re-identification experiments worked! Now what? First, document everything – every step you took during your analysis. Internet websites and resources come and go, so make sure to print the relevant information and store it in a secure place.
Try to think about a mitigation scheme and consult with other researchers you can trust (see the next rule – "Don't be over-confident"). Once you have reached a conclusion (which may be that there is no good fix), it is a good time to submit the paper for review.
You must also inform the owner of the resource about the glitch. I recommend having this conversation after the paper has been accepted but before publication, for several reasons: (a) at this point, your work has matured through peer review and is less likely to contain a major flaw; (b) the owner might be more convinced by your findings knowing that they have been reviewed by other researchers; (c) contacting the owner while the paper is still under review might change the setting of the resource, rendering your experiments obsolete, which can confuse your reviewers; and (d) with a publication deadline in hand, the owner is more likely to act swiftly in mitigating the risk.
If possible, prepare a demo for the resource owner, as this can be the most effective way to communicate your findings and elucidate the risk. Inform the resource owner about your publication plan. Make sure that the owner has sufficient time to respond to the threat, contact people who might be affected, and simply digest your findings.
4. Don't be over-confident – Privacy research combines technical, social, and ethical aspects. You cannot be an expert in all of these domains. We all have blind spots, even in our respective areas of expertise. Consult with as many people as possible about the different aspects of your research. They will illuminate flaws in your analysis. Some re-identification experiments do not need IRB approval since they are (ironically) considered exempt human subjects research due to the use of public resources. However, I suggest you consult with your IRB anyway. It is always good to get feedback on and review of your work. Don't be over-confident. Get help from other people.
5. Don't be afraid – White-hat hacking has been instrumental in the development of security and cryptographic systems. Banks and monetary services pay companies good money to constantly perform penetration tests of their systems. Don't be afraid. Go strong.
Thanks to Dina Zielinski and Melissa Gymrek for useful comments.