By Sara Gerke and Chloe Reichel
Recently, Google announced a new direct-to-consumer (DTC) health app powered by artificial intelligence (AI) to diagnose skin conditions.
The company drew criticism for the app because the AI was trained primarily on images from people with darker white skin, light brown skin, and fair skin. This means the app may end up over- or under-diagnosing conditions for people with darker skin tones.
This prompts the questions: How can we mitigate biases in AI-based health care? And how can we ensure that AI improves health care, rather than augmenting existing health disparities?
Bias is certainly one of the major issues raised by AI in health care. Stakeholders, especially AI developers, need to mitigate biases as far as possible. Obermeyer et al. have recently published an algorithmic bias playbook that can serve as a helpful guide for stakeholders on how to define, assess, and mitigate bias. I also think that we need to admit that likely no dataset is free of bias. For example, even if we gathered all breast cancer-relevant data nationwide, the resulting dataset would not necessarily be perfect: it would still lack data from those who do not seek medical care (e.g., for economic reasons). It will be important that stakeholders understand these issues so that AI's potential is harnessed, rather than existing health disparities augmented.
First, we must ensure that the data used to create our algorithms reflect the diversity of the populations that will be using or benefiting from the algorithm. Second, we need to create data science and engineering teams that are diverse in age, ethnicity, race, and gender. Diverse data science teams will be more cognizant of the bias in their algorithms and can actively work to mitigate it. Third, we need to be transparent about what data are used, who trained the models, and who determined ground truth. If we follow just these three steps, we can go a long way toward mitigating bias.
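As a rough illustration of the first step, the sketch below compares how often each group appears in a training set with that group's share of the intended user population. It is a minimal sketch under stated assumptions, not a validated audit method: the function name `representation_gaps`, the group labels, and all of the numbers are hypothetical and chosen only for illustration.

```python
from collections import Counter


def representation_gaps(train_labels, population_shares):
    """Return (training share - population share) for each group.

    A negative value means the group is under-represented in the
    training data relative to the population the tool is meant to serve.
    """
    if not train_labels:
        raise ValueError("train_labels must not be empty")
    counts = Counter(train_labels)
    total = len(train_labels)
    return {
        group: counts.get(group, 0) / total - share
        for group, share in population_shares.items()
    }


# Hypothetical data: one group label per training image.
train_labels = ["lighter"] * 60 + ["medium"] * 35 + ["darker"] * 5

# Hypothetical shares of each group in the intended user population.
population_shares = {"lighter": 0.40, "medium": 0.40, "darker": 0.20}

gaps = representation_gaps(train_labels, population_shares)
for group, gap in gaps.items():
    print(f"{group}: {gap:+.2f}")
```

A check like this only flags gaps in representation. It says nothing about whether the model performs equally well for each group, which would need to be measured separately.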
Bias exists even within AI. One needs to understand the application for which AI will be used and develop an appropriate training set to build the algorithm. There are multiple examples of AI tools that end up biased, or that fail in the real world, because their training sets are based on data measured in lab settings with specific individuals. When applied to the general population, these algorithms will naturally fail.