Overcoming Data Analysis and Sharing Challenges to Facilitate the Advancement of Science

Brigitte Ganter interviews our CEO Fiona Nielsen after meeting at the BlackBox Connect Program in San Francisco in May 2016.

This interview was originally published on the EnlightenBio blog on June 27, 2016.


Repositive Wants to Overcome Data Analysis and Sharing Challenges to Facilitate the Advancement of Science



Last month I had a chance to meet Fiona Nielsen, CEO of Repositive, when she was visiting San Francisco for the BlackBox Connect Program. I took this opportunity to learn more about Repositive, the platform the company has built, its intended application, and why data sharing is so important. This blog addresses genomic data questions related to data sharing, challenges encountered with analysis and sharing platforms, and what Repositive is focusing on to mitigate these issues.

The following summarizes questions and answers from my dialogue with Fiona Nielsen.



EB: What are some of the biggest challenges when it comes to working with genomics data both in the research and the clinical setting, and what are suggested solutions to address these challenges? What are the promises and challenges of sharing clinical and research genomics data?

FN: All research in a data-intensive science is challenging and hard work. There are high demands put on research data management and efficient analysis tools. In addition, within genomics linked to studying human genetic diseases, you have to deal with the added complexity of PII (Personally Identifiable Information). Genomics data may contain unique identifiers for an individual (including the genome sequence itself) and may be very rich in sensitive data such as clinical information and electronic health records. These challenges need to be addressed up-front by all research institutions when planning and budgeting for data management and information governance.



EB: Why are large-scale genomics sharing and collaboration efforts needed? How are you and Repositive involved in this effort?

FN: The complexity of the human genome implies that any research question will need a lot of data evidence to rule out bias and random correlations. To make sense of any research project and to address specific hypotheses, there is a high demand for specific reference data that needs to be accessed. For instance, when researching a specific disease, one will want to validate and compare any results with “other data” of the same disease, data of healthy individuals, data of related diseases, data of the same population or different populations, and more. In 2002, one could get a Nature paper by publishing the findings from one genome. Today we understand that to make sense of data, one needs to cross-compare and validate findings against the existing reference data now available in the community. Unfortunately, much of this data is not publicly visible or available, which means that it is necessary to collaborate with other research groups to make the most of the data that has been siloed in different locations. Ideally, researchers need to adopt a change of mindset and publish data more often than they publish papers, so the data will be available to individual researchers and the community for their analyses without delay.

I run both the charity DNAdigest and the mission-driven enterprise Repositive. At DNAdigest we promote best practices for genomic research data sharing through public events and workshops, and we give visibility to data sharing projects from across the globe on our blog. At Repositive we help researchers to find and access human genomic data via our free online platform for searching data sources from around the world.



EB: What intercontinental challenges are there and how can they be addressed – Repositive is a UK company, yet a large amount of data is produced in the US?

FN: The research community is international and all useful tools in this space are available online and internationally from the outset. The challenges for data access and data reuse result – among other things – from regulations for data privacy which differ from country to country. An odd implication of this is that many countries do not allow export of genomics data, which can be a serious hurdle for international collaborations. Repositive is indexing data sources from all over the world, to simplify access regardless of the location of the data source. Via our free online platform, individual researchers can then identify the data source they want to include in their analyses. Specifically, they receive information on where the data is located and how one can get access to it.



EB: What is the ideal data storage, analysis, and sharing platform for genomics data?

FN: There is no single right answer to this question. There are many providers of data management tools for genomics data, and even more providers of data analysis platforms. All of them incorporate data sharing in some form or other, at least within the platforms themselves. The important take-home message is that “one must use a well-designed data management tool, regardless of what type of data analysis is performed” so that no data gets lost, data consent is tracked and managed, and all data can be used and reused for maximum benefit for the patients who allow their data to be used in the first place.



EB: Who exactly is the audience of Repositive?

FN: The Repositive platform is useful for all researchers who use human genomic data in their work. The goal is to help anyone who is looking for a specific type of human genomics research data to locate it, and to connect them to the right source.



EB: What key message do you want to share with your community?

FN: I have three key messages that I give to all genetics researchers I come across:

  1. Get credit for your data – publish your data independently of your research papers to make it available for the scientific community and get cited when other researchers use it.
  2. Give credit – cite the data sources you use. Giving credit to the hard work of your fellow researchers is a main driver for more researchers taking the time and effort to make their data available.
  3. Understand consent – you have no excuse not to understand what consent was given for the data that you are working with. If you do not understand the consent for your data, you do not know if you are breaching the consent, and you do not know if or how you are allowed to share the data with fellow researchers.



Many thanks Fiona for your insightful answers and comments, and best of luck to you and Repositive.
