Is sharing always caring? On open genomic data sharing and why people do it.

Special thanks to Tobias Haeusermann (postdoctoral researcher at the University of Zürich) and Bastian Greshake (co-founder of openSNP) for collaborating to write this guest blog post.

In times of political turmoil, we tend to see discussions about the responsibility of science in academic circles. But unfortunately, something is rotten in the state of academia too. While the academic pursuit should, first and foremost, entail the cultural accumulation of knowledge and its transmission across generations and borders, the structures and strictures of science often tend to hinder rather than foster the sharing of knowledge. In their recent book “A Passion for Society: How We Think about Human Suffering”, sociologist Iain Wilkinson and medical anthropologist Arthur Kleinman openly address academia’s centuries-old dirty little secret: the barriers that ‘ideal’ dispassionate researchers erect around themselves are frequently selfish and self-serving. Oftentimes, Wilkinson and Kleinman write, “what now passes as social science is in thrall to technocratic procedures and structures of career that leave it critically sterile, cynical and devoid of passion” (p. xi). They conclude that now might be the time to renegotiate the terms once again. As medical researcher John Tregoning lamented in his Nature column not long ago, “researchers reap more rewards for publishing flashy papers than for doing solid work, and the two do not always align. Everyone ends up chasing trends and asking the same questions. Broad, multidisciplinary research might achieve more in terms of advancing science, but it is harder to publish and finance. We end up sticking to the narrow path towards prestigious papers and big grants at the expense of worthier endeavours”. Neil Hall, from the Centre for Genomic Research at the University of Liverpool, has even proposed a ‘Kardashian index’, a measure of discrepant social media profile for scientists, or in other words, a way to expose academics who are famous just for being famous.

Data is becoming the world’s most valuable resource…

Along with the corporatization of higher education, data is also rapidly becoming the world’s most valuable economic resource. Over the past decade, IT companies have become the custodians of crucial technologies. They increasingly operate as gatekeepers and, as a result, dominate critical data junctions. This even led the reliably business-friendly ‘Economist’ to write in one of its recent issues that “if governments don’t want a data economy dominated by a few giants, they will need to act soon”. All in all, it seems that the reluctance of academic institutions, scientists, and companies to share their knowledge openly and for free is not helping to stop the trend toward knowledge inequality.

In genomics, one example stems from data that is emerging from direct-to-consumer genetic testing (DTC-GT). Companies such as 23andMe and FamilyTreeDNA now provide customers access to their genetic data for a comparatively affordable fee. And the industry is flourishing. Yet genetic tests yield crucial data, and with that data come new forms of power. Indeed, the proliferation of DTC testing raises pressing questions about whether commercial firms are gaining access to health data without the necessary accountability.

An open data sharing approach to genomics

Against this background, an open data sharing approach to genomics feels like a breath of fresh air. Enter openSNP. Rather than allowing companies to hoard the spoils of DTC-GT, some scientists and consumers have taken matters into their own hands. Initiated in 2011, the platform openSNP allows individuals to contribute diverse sets of DTC-GT results, along with phenotypic annotations about themselves. Specifically, users can share their results from microarray-based genotyping, which makes up the vast majority of all data sets. In addition, users can upload VCF (variant call format) files, which may include exomes and full genomes. Genomic and phenotypic data are subsequently openly available to anyone, without any limits or restrictions on the use of the data.

In spite of the platform’s radically open nature, the project has attracted more than 5,000 registered users to date, relying purely on social media and word of mouth for recruiting. Since September 2011, users have uploaded over 3,000 data sets, and interest in using them for scientific studies and commentary continues to grow.

Why people share

In our study, published this week in PLOS ONE, we set out to discover more about the motivations of users who had decided to share their genomic data on openSNP. It is the first attempt to describe open genomic data sharing activities that take place without institutional oversight. Unsurprisingly, the geographical distribution of the respondents was dominated by the USA. The age distribution was broad and there was no marked gender divide. Educational background was varied, with the median skewing towards a slightly more highly educated population. These characteristics differ from other research, which suggests that, as a rule, individuals purchasing DTC genetic and genomic tests are highly educated, middle-aged users.

Above all, however, we found that health, even though prominent, was not the users’ primary or only motivation to be tested. Rather, it was ancestry that was most commonly mentioned. Regarding their motivations to openly share their data on openSNP, 86.05% rated wanting to learn about themselves as relevant, followed by contributing to the advancement of medical research (80.30%), improving the predictability of genetic testing (76.02%) and considering it fun to explore genotype and phenotype data (75.51%). Instead of focusing exclusively on health-related aspects of genetic testing and data sharing, we therefore emphasize that it is important to consider all the benefits and risks that stretch beyond the health spectrum.

In his aforementioned Nature column, John Tregoning gives the following advice: “Don't wait on your senior colleagues, and definitely don't wait until you become one. Build a network of like-minded people. Identify something that doesn't work and fix it. It can be as small as a leaky tap or as big as peer review. Idealism can be catching”. openSNP is going in this direction and perhaps offers new paths to the cultural accumulation of knowledge and its transmission across generations and borders. Now please share this news, so that at least our Kardashian index goes up.

Tobias Haeusermann is a postdoctoral researcher at the Health Ethics and Policy Lab of the University of Zürich and an affiliated researcher at the Cambridge Department of Sociology.

Bastian Greshake is one of the co-founders of openSNP and is currently pursuing his PhD in applied bioinformatics at the Goethe University, Frankfurt am Main.

