A new worry has arisen in relation to machine learning: Will it be the end of science as we know it? The quick answer is, no, it will not. And here is why.
Let’s start by recapping what the problem seems to be. Using machine learning, we are increasingly able to make better predictions than we can with the tools of the traditional scientific method, so to speak. However, these predictions do not come with causal explanations. In fact, the more complex the algorithms become, as we move deeper into deep neural networks, the better the predictions and the worse the explicability. And thus “if prediction is […] the primary goal of science,” as some argue, then the pillar of the scientific method, the understanding of phenomena, becomes superfluous, and machine learning seems to be a better tool for science than the scientific method.
But is this really the case? This argument makes two assumptions: (1) the primary goal of science is prediction, and once a system is able to make accurate predictions, the goal of science is achieved; and (2) machine learning conflicts with and replaces the scientific method. I argue that neither of these assumptions holds. The primary goal of science is more than just prediction: it certainly includes explanation of how things work. Moreover, machine learning in a way makes use of and complements the scientific method rather than conflicting with it.
Here is an example to explain what I mean. Prediction through machine learning is used extensively in healthcare. Algorithms are developed to predict hospital readmissions at the time of discharge or to predict when a patient’s condition will take a turn for the worse. This is fantastic because these are certainly valuable pieces of information and it has been immensely difficult to make accurate predictions in these areas. In that sense, machine learning methodology indeed surpasses the traditional scientific method in predicting these outcomes. However, this is neither the whole story nor the end of the story. In fact, this is when “traditional science” starts. By itself, information regarding readmission rates or patient deterioration is not very useful for healthcare personnel. A physician “armed” (!) with this knowledge can do nothing apart from acting like an oracle. What healthcare personnel need to know are the plausible causes of such results. Why would the patient soon be re-admitted? Why would the patient’s condition worsen? They need to know why so that they can take action to break this causal link. And that means what they need to know is exactly this: causation, or in other words, understanding of a phenomenon. It will be the job of the health sciences to figure out this missing causal link. That is because the goal of science is not limited to prediction. It certainly includes explanation.
In a nutshell, the scientific method works in the following way: A scientist observes a phenomenon. She formulates a hypothesis that might explain this phenomenon. She then develops testable predictions and uses relevant data to test her hypothesis. If the predictions do not hold, then the hypothesis is falsified and must be revised or discarded. This continues until the predictions hold and more general theories can be derived from the confirmed hypothesis.
Notice how this cycle of the scientific method starts. It starts with an observation. While machine learning also takes its starting point from an observation, its path then diverges from the traditional scientific method. A data scientist makes an observation of a phenomenon where more accurate prediction is needed. However, instead of formulating a traditional hypothesis with a causal inference, she creates a model to jump straight to the prediction phase, testing and modifying her model until it predicts with a high level of accuracy. When the model works, machine learning provides scientists with a great range of new “observations.” Through machine learning, scientists can become aware of phenomena that were previously invisible to them. Going back to our earlier example, an algorithm that accurately predicts which patients will soon be re-admitted in fact lays out a new observation to be explained. Some factor or factors about these patients place them in the category of those who are highly likely to be re-admitted soon. And once this observation is made, the traditional scientific method starts. Now it is time for health scientists to formulate their hypotheses and look for the best explanation. Only by moving from correlation to causation and from prediction to explanation is it possible to have a good range of actionable information.
I would even go further and argue that machine learning could help make the traditional scientific method more ethical. It is reasonable to assume that observations are only as objective as the scientists who make them. A scientist who views women as “whiny” might not pick up on signs of the wrong dosage of medication in women. Through machine learning, however, an “observation” regarding dosage, health outcomes, and gender might make it harder to miss the role of gender and open up more doors for further investigation, helping scientists find the causes of poor treatment results in women. This is in no way arguing that algorithms are unbiased or that machine learning is the solution to discrimination in science. That is already old news: our biases creep into our algorithms. But although unchecked algorithms might end up amplifying our biases, it is also understood that algorithms are oftentimes less biased than their human counterparts and can help humans overcome some of their biases. In this case, while care of course has to be taken to ensure that prejudice is not reinforced, it must also be noted that data scientists open up a whole world of observations for other scientists to work on using the methods and knowledge of their scientific fields.*
* Thanks to Fuat Beser, Jeshua Bratman, Ayyuce Kizrak, and Laura Haaber Ihle for their feedback.