Using public sources of genomic data to answer essential research questions

On the 6th of June the third Quantitative Genomics student conference will be held at UCL. This one-day event is organised by students and is designed to give early-career researchers in mathematical and statistical genomics the opportunity to network and present their research.

The programme consists of six sessions of lightning talks covering topics such as Complex Phenotype Genetics, Epigenetics, and Methods, and three keynote speeches by Sarah Teichmann, Richard Durbin and our very own Fiona Nielsen.

Sarah Teichmann is a leading light in the field of single-cell genomics and works on an area close to my heart - T cell biology. I missed the talk she gave at my institute during my PhD, but I discussed her sequencing work on Th2 cell populations many times in my PhD thesis and I'm hugely excited to hear her speak.

I saw Richard Durbin talk at the Festival of Genomics earlier this year on building and using new genome reference structures. The work he is doing will support population and medical genetics by enabling researchers to account for natural genetic variation.

Fiona Nielsen will be talking about how Repositive is addressing the most pressing problem for public genomic data: that of data discoverability. She will present case studies of how data visibility and accessibility improve research outcomes for both the data provider and the data consumer.

After reading the abstracts for the lightning talks I was hugely excited by the topics that will be covered, and also by the number of projects that have used external, public sources of genomic data to power their research. Therefore, to get the ball rolling on #QuantGen16, I thought I would interview one of the PhD students presenting an abstract at the conference about their experiences with using external data for their research.

Using UK Biobank genotype data to examine genetic factors contributing to the relationship between depression and high BMI

An interview with Jonathan Coleman, studying for a PhD at King's College London

What scientific question are you addressing in your PhD?

We are interested broadly in what the genome can tell us about the interactions and relationships that underlie psychiatric traits. In my particular work, I have examined the contribution of genetic factors to the relationship between depression and high BMI.

Why do you use external data for your research?

External data provides a replicable, powerful source of information for answering questions. Especially when exploring a general research question, internal data is often limited in size, scope and suitability in a way that external data is not.

Why did you choose to use the data hosted by the UK Biobank?

We realised a general question about the utility of genomic data in unpicking known relationships could be answered with this data. We were primarily attracted by its size, as well as the fact that, as a nationally representative cohort, it provides considerably greater power for assessing our question than most, if not all, alternatives. In addition, the differences that arise when combining separate datasets (to reach this size of data) are smaller in the UK Biobank.

Can you talk a bit about the process of accessing the UK Biobank data?

The access procedure took a few months. The dataset is available to all bona fide researchers interested in health-related research whose research is in the public interest. To fulfil this requirement, we had to provide a justified plan for our intended analyses that demonstrated a commitment to this stated purpose. There was also a cost for application (as part of a cost-recovery process).

The application process is relatively straightforward, much like writing a proposal to perform any analysis. It was also reasonably quick to complete, even given some complexities in our application procedure. There has been an attempt by the Biobank to avoid duplication of effort through encouraging collaboration, which I think is a great idea.

Were there any points at which you struggled or got frustrated?

No, not really! There were a few delays due to administrative issues, but they were generally sorted quickly.

What are your thoughts about the UK Biobank initiative?

As a project that is primarily concerned with creating the data resource, the Biobank has an active interest in external groups performing analyses. Our experiences with their staff have also been very positive.

The Biobank is a big opportunity to perform interesting analyses on the scale that genomics requires. It hasn't been perfected yet, but it feels like there is a real drive to simplify data access, whilst protecting the participants as far as possible. I think they're generally doing a great job in making the data available for analysis. With several other population-level datasets already in existence, and others emerging, hopefully this style of data can be an important driver for analyses in the near future.

