By Alessandro Blasimme and Effy Vayena
Imagine a clinical research protocol to test the efficacy of a nutritional regime on the aging trajectory of the participants. Such a study would need to be highly powered and include thousands of people in order to observe a credible effect size. Participants would remain enrolled in the study for many years, maybe decades. Endpoints would include novel measures of healthy aging such as functioning (the capacity to perform certain activities) and the quality of social life. Participants would thus be asked to provide enormous amounts of personal data covering at the same time their health state, their habits and their social activities – most likely with the help of smart appliances, sensor-equipped wearables, mobile phones and electronic records.
In a different scenario, a research team aims to develop clinical protocols for treating cancer according to the unique genomic signature of each patient's tumor. The team will need patients willing to undergo whole-genome germline and tumor sequencing right at the moment of diagnosis and to be included in a basket trial. Therapy would then be targeted to the specific genetic alterations of each individual in the hope that a combination of targeted drugs would generate better medical outcomes than the current standard of care.
These two scenarios correspond to the prototypical form of, respectively, precision medicine and precision oncology studies. The first is likely to require large (very large) longitudinal cohorts of extensively characterized individuals – like the All of Us Research Program. The second will require sustained sharing of genomic data, information on patients’ clinical history and response to treatment, and possibly a single repository into which such information would flow – something akin to the NCI’s Genomic Data Commons.
This kind of data-intensive research, in particular, introduces game-changing features: increased uncertainty about foreseeable data uses, an expanded temporal span of research activities due to virtually unlimited data lifecycles, and, finally, the relational nature of data. This last feature refers both to the fact that some data carry other types of sensitive information (zip codes, for instance, can reveal ethnic background, a phenomenon known as redundant encoding) and to the fact that data about one person contain information about others – as is the case, for instance, with genetic data among family members.