By Jorge L. Contreras
Since genealogy websites first went online, researchers have been using the data that they contain in large-scale epidemiological and population health studies. In many cases, data is collected using automated tools and analyzed using sophisticated algorithms.
These techniques have supported a growing number of discoveries and scientific papers. For example, researchers have used this data to identify genetic markers for Alzheimer’s Disease, to trace an inherited cancer syndrome back to a single German couple born in the 1700s, and to gain a better understanding of longevity and family dispersion. In the last of these studies, researchers analyzed family trees from 86 million individual genealogy website profiles.
Despite the scientific value of publicly available genealogy website information, and its free accessibility via the Internet, this data cannot always be used for research without the permission of the site operator or the individual data subjects.
Though online terms of use (TOUs) can seem to be nothing more than routine annoyances, more than a few courts in the U.S. and elsewhere have found them to be binding contracts. Violating such contractual terms could give rise to various legal remedies, including monetary damages and orders to cease using data obtained without permission.
In order to understand the degree to which website TOUs limit research use of public genealogy data, a group of collaborators and I analyzed the TOUs of seventeen leading genealogy websites.
Thirteen of the seventeen sites had TOUs containing restrictions that effectively prohibited the use of data for scientific research purposes, whether by limiting use to genealogical purposes only, prohibiting “commercial” uses, or prohibiting technically necessary steps such as downloading or automatically scraping, crawling or harvesting data.
Genealogy, the study of our ancestry and family histories, has become one of America’s favorite pastimes. There are now thousands of websites that provide tips and information to amateur genealogists, link to public records, offer forums for conversation, and allow users to upload photos, family trees, and other information (many of these are cataloged at Cyndi’s List).
In recent years, sites like GEDmatch, AncestryDNA and FamilyTreeDNA have begun to allow the uploading and sharing of genetic information (just data, no actual biospecimens). This information allows users to match DNA data to locate and verify long-lost relatives and, in some cases, siblings, parents and offspring. When someone who said she lived in Germany contacted me claiming that she was my father’s hitherto unknown half-sister, the first thing I did was suggest that she submit a DNA sample to Ancestry so that we could see whether we were DNA matches (we were!).
But online genealogy sites are not just about family reunions and finding out whether your ancestors really came over on the Mayflower.
In 2018, investigators revealed that they had identified and arrested the infamous “Golden State Killer” by comparing crime scene DNA to the data contained in the public GEDmatch database. Numerous other “cold cases” have been solved in a similar manner. While these arrests have generally been applauded, they also raise questions of individual privacy, particularly with respect to data accessible from public genealogy websites.
While we are not aware of any lawsuits that have been brought against scientific researchers for using public genealogy data without permission, researchers would be well advised to proceed with caution.
Based on our findings, we recommend that genealogy site operators consider granting researchers permission to use publicly available data for legitimate scientific research purposes, even if they wish to prohibit more controversial data uses such as law enforcement, surveillance, racial profiling, insurance underwriting, and direct marketing.