By Mason Marks
In this brief essay, I describe a new type of medical information that is not protected by existing privacy laws. I call it Emergent Medical Data (EMD) because at first glance, it has no relationship to your health. Companies can derive EMD from your seemingly benign Facebook posts, a list of videos you watched on YouTube, a credit card purchase, or the contents of your e-mail. A person reading the raw data would be unaware that it conveys any health information. Machine learning algorithms must first massage the data before its health-related properties emerge.
Unlike medical information obtained by healthcare providers, which is protected by the Health Insurance Portability and Accountability Act (HIPAA), EMD receives little to no legal protection. A common rationale for maintaining health data privacy is that it promotes full transparency between patients and physicians. HIPAA assures patients that the sensitive conversations they have with their doctors will remain confidential. The penalties for breaching confidentiality can be steep. In 2016, the Department of Health and Human Services recorded over $20 million in fines resulting from HIPAA violations. When companies mine for EMD, they are not bound by HIPAA or subject to these penalties.
A well-known example of corporate mining of EMD involves the Target Corporation. In 2012, the New York Times reported that Target hired statisticians to find patterns in its customers’ purchasing habits. They discovered that pregnant female shoppers stocked up on unscented body lotion around the start of their second trimester. After making this connection, Target could reach out to expectant mothers with coupons and advertisements before its competitors learned the customers were pregnant. Before Target analyzed the data, buying unscented lotion had no perceived connection to a customer’s health.
Corporate analysis of consumer behavior is not new. However, the recent explosion of information technology has made mining for EMD much easier. Thanks to big data and machine learning algorithms, online platforms like Facebook and Google are adept at analyzing user behavior. The results can be unnerving. At one time or another, most people have experienced the uncanny feeling that online platforms are reading their minds. Have you ever made a purchase and quickly found ads for related products populating your Facebook or Twitter feed? Machine learning can draw inferences about users’ tastes that are of great value to advertisers. It allows platforms to engage in targeted advertising on a massive scale; they can sell ads that are more likely to reach an interested audience.
Many people don’t see the harm in this practice. After all, why wouldn’t you want to see ads that interest you? Targeted ads can inform users of products and services that are tailored to their values and preferences. But something more pernicious is happening. Using the same technology that brings you personalized advertising, platforms can piece together disparate scraps of data, which would not ordinarily be considered health information, to create a detailed picture of your physical and mental health. Data gleaned from tweets, Facebook “likes,” Target and Amazon.com purchases, Instagram posts, and Uber rides can be fitted together like the pieces of an elaborate jigsaw puzzle. As new pieces are added, the picture comes into focus. However, unlike a jigsaw puzzle of the Eiffel Tower, the Mona Lisa, or a basket of puppies, the picture that emerges from this puzzle is a page from your medical record.
According to Michigan State University psychiatrist Scott Monteith and his collaborator Tasha Glenn, companies can combine pieces of seemingly innocuous user data to create sensitive personal health information that is not protected by HIPAA. By leveraging big data and machine learning, platforms can circumvent privacy laws and obtain personal medical information that users would willingly disclose only within the confines of a doctor’s office. A recent study involving Facebook users demonstrates this power. The authors used non-medical online behavior to identify individuals with substance use disorders. They discovered that swear words, sexual words, and words related to biological processes were positively associated with drug, alcohol, and tobacco use. Other types of words helped the authors distinguish between different forms of substance use. For example, “space reference words such as ‘up’ and ‘down’ are positively correlated with alcohol use, while words related to anger such as ‘hate’ and ‘kill’. . . are positively correlated with drug use.”
Facebook users writing these words would have no idea that they convey meaningful information about their substance use histories, and human observers reading them would not know they carry information about the poster’s health status. Even doctors and behavioral scientists would be unaware of their connections to health. But machine learning algorithms can transform seemingly benign data from online browsing habits, social media posts, and e-mails into sensitive details about users’ health. This phenomenon needs a name, and I propose that we call it mining for Emergent Medical Data (EMD).
Mining for EMD is different from gathering medical information that users willingly disclose. Imagine a man named Tom who visits an online chat room for people with post-traumatic stress disorder (PTSD). Tom enjoys interacting with people who have problems similar to his own. He finds the chat room a comforting addition to his PTSD treatment strategy. He frequently posts about his symptoms and how they affect his personal and professional life. Tom knows that the chat room is open to the public and that anyone with the time and interest can log on and view his posts. He is also aware that his posts, and even his mere presence in the chat room, could lead others to believe that he suffers from PTSD.
What Tom may not know is that a pharmaceutical company, which owns and operates the chat room, could gather data from its visitors for use in marketing research. The company may even sell the data to third-party data brokers. Though this practice could upset Tom if he knew about it, his presence in the chat room and the contents of his posts are not EMD because their health significance would be readily apparent to human observers. When the chat room provider collects, analyzes, and sells Tom’s information, it is not mining for EMD. Mining for EMD involves collecting data with no apparent connection to health, analyzing or transforming the data using sophisticated tools such as machine learning, and harvesting the resulting nuggets of personal information that emerge.
Users of health-related websites and mobile health apps often volunteer health-related data. Smartphones and fitness trackers run apps that record a user’s heart rate, diet, sleep patterns, and activity levels (such as daily steps taken). Like Tom’s presence in the chat room, these data are not EMD because they bear some discernible relationship to health status, even though the connection may be indirect. Decreased sleep or activity levels could reflect a change in one’s health, but they are not direct evidence of illness or disease. These data are therefore different from traditional health information: they are less specific. At the same time, they are not EMD because they maintain some connection to health that is readily apparent without requiring sophisticated data analysis. Some scholars have categorized this type of information as quasi-health data. Though the use of quasi-health data may be worthy of greater scrutiny and protection, the use of EMD is more troubling because it can more easily occur without the knowledge of consumers.
Why should we care about the use of EMD? First, platforms can exploit loopholes in privacy laws, and consumers’ lack of knowledge regarding EMD, to gain access to medical information that would typically be off-limits. People probably would not willingly disclose the health information conveyed by EMD unless they were assured it would remain confidential. Second, online platforms can sell EMD to third parties who might use it as the basis for discrimination in decisions related to employment, lending, higher education, and insurance. Third, if people learn about the collection and use of EMD, they may modify their online behavior, which could undermine the free exchange of ideas that undergirds our democratic culture. Finally, by using EMD to lump users into categories based on health status, platforms are acting as medical diagnosticians. It is unclear whether they should have this ability, which we typically reserve for licensed healthcare providers.
To be sure, there are potential benefits to using EMD. Platforms could use it to track the spread of infectious disease, monitor drug abuse, and identify people at risk of committing suicide or homicide. These applications could improve public health. Yet even potentially beneficial uses for EMD come with caveats. For example, what happens when platforms mistakenly label users as suicidal or homicidal? These issues necessitate a public discussion of the boundaries for responsible use of EMD. The first step is to make people aware that EMD can be collected and used in harmful ways. With this knowledge, society can choose to continue permitting online platforms to use this type of information. Alternatively, it can do something to curtail it.
Lawmakers in the European Union are taking great strides to protect individual privacy. The General Data Protection Regulation (GDPR) will increase the rights of EU consumers to control how their data is used and impose hefty fines on businesses that fail to comply. Specifically, Article 22 of the GDPR gives people “the right to not be subject to a decision based solely on automated processing, including profiling . . . .” Article 22 could help shield consumers from the effects of profiling based on EMD. Critics argue that the language of Article 22 is vague and offers inadequate protection. However, because the GDPR does not go into effect until 2018, it may be too early to tell how it will affect the use of EMD. Regardless, there are no comparable laws in the United States.
This September, Democratic Senator Edward Markey introduced the Data Broker Accountability and Transparency Act (DBAT Act). If passed, it would allow consumers to access and correct the personal information that data brokers store and use. Section 4(e) would allow people to opt out of having their personal information used for marketing purposes. However, the DBAT Act lacks protection against other uses of personal data, such as reliance on EMD in hiring, lending, and insurance decisions. Though Article 22 of the GDPR may provide imperfect protection against the use of EMD, it is a step in the right direction. Elements of Article 22 could be incorporated into the DBAT Act to bolster its protection of US consumers against EMD-based profiling.