Biobanks as genomic data sources


At Repositive, we are always interested in new sources of human genomic data. The idea of exploring biobanks as potential sources came up in July 2015 at the HandsOn Biobanks conference, where many biobanks mentioned that they had sequencing data or planned to have it in the future.

In March 2016, we interviewed several biobanks in Europe and one in Japan to find out their plans for genomic data.

Interviewed biobanks

We approached the biobanks listed below with several questions about their current collections, their plans to have sequencing data in the future, and their access policies.

  1. Diabetes biobank Brussels (Belgium)
  2. National Institute for Health and Welfare (Finland)
  3. Auria biobank (Finland)
  4. Abcodia biobank (UK)
  5. Genome Denmark
  6. UK Biobank
  7. ToMMo biobank (Japan)

1. Diabetes biobank Brussels, Belgium

is a non-profit initiative of the Belgian Diabetes Registry (BDR), the Beta Cell Therapy consortium (BCT), the Flemish Center for Medical Innovation (CMI) and the Brussels Institute for Research and Innovation (Innoviris). The biobank contains clinical samples and data (including some genomic data) on more than 100,000 diabetic patients and their immediate relatives. The biobank gives equal access to all academic users, usually in the context of a joint research project, i.e. researchers are expected to form a collaboration with the biobank in order to access its data.

In order to get information about available data, researchers are advised to contact the biobank directly.

2. National Institute for Health and Welfare, Helsinki, Finland

is a national public health institute; its biobank started about three years ago and is one of several biobanks in Finland. The biobank has a growing amount of whole-exome sequencing (WES) and whole-genome sequencing (WGS) data.

The biobank is legally required to make data available to bona fide researchers for meaningful research projects (i.e. applications are reviewed). The plan is to set up a website where external researchers can find metadata (such as the type and number of DNA samples) and a tool for searching the anonymised data, but neither has been implemented yet.

For all enquiries, researchers are advised to e-mail the biobank.

3. Auria Biobank, Turku, Finland

is Finland’s first biobank, established in 2012. Roughly one million human biological samples are stored in Auria Biobank, a considerable proportion of which are cancer samples.

A researcher interested in samples from this biobank should submit an application, which is reviewed by the board. If it is approved, the researcher receives the samples and is expected to submit to the biobank either the raw data collected during the research or the sample-specific analysis results (e.g. sequencing data). The biobank enters the data in its sample and data registers, from where it can be released for other research purposes in future. In effect, Auria is a bank of both samples and data. At the moment, there is only a catalogue of samples and no catalogue of data.

To learn what samples are available, researchers need to e-mail the biobank.

4. Abcodia biobank, UK/US

The UKCTOCS biobank was developed during the UK Collaborative Trial of Ovarian Cancer Screening (UKCTOCS) by the trial's funders: MRC, NIHR, the Eve Appeal and CRUK. Since 2011, maintenance of the biobank has been funded by Abcodia (i.e. this is a commercial biobank).

The biobank's website gives a detailed description of the data it holds and what it can offer. There are no immediate plans to sequence the samples and gather genomic data, largely because of the expense of doing so.

5. Genome Denmark

is the reference genome project of the Danish population. The Genome Denmark project is based on the Copenhagen Family Biobank, which is more than 40 years old and has samples from families spanning two to three generations. The reference genome project has obtained new informed consent and taken new blood samples to construct the Danish reference genome under the highest ethical standards.
The samples have been sequenced and assembled using an approach that surpasses all other available genomic references of this sample size (75X coverage, different library sizes including large libraries, de novo assembly, novel software tools to secure the quality of genotype frequencies, etc.).

One of the project's deliverables is to make the overall reference genome available to the wider research community. General conclusions from the data analysis will be published in scientific journals. The overall reference genome, variants, etc. will be made available through both central international data repositories and a Danish database. Individual-level genomic information can only be accessed through separate data access approvals, and individual genomes cannot be published. However, a new data sharing format has been developed that allows the public to browse the data, including tracing phased genotypes down to pools of five individuals (to prevent any attempt to reverse-engineer single genomes).
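The idea behind pooling can be illustrated with a small sketch. This is not Genome Denmark's actual sharing format, which is not described here; the function name, the example genotypes and the choice to drop an incomplete pool are all assumptions made for illustration, and the sketch uses simple unphased allele counts rather than phased genotypes.

```python
from typing import List, Sequence

POOL_SIZE = 5  # pool size mentioned in the post; everything else is illustrative


def pooled_allele_counts(genotypes: Sequence[int],
                         pool_size: int = POOL_SIZE) -> List[int]:
    """Sum alternate-allele counts (0, 1 or 2 per person) over consecutive pools.

    Hypothetical helper: individuals who do not fill a complete pool are
    dropped, so no reported figure describes fewer than `pool_size` people.
    """
    if pool_size < 1:
        raise ValueError("pool_size must be at least 1")
    # Number of individuals that fit into complete pools.
    full = len(genotypes) - len(genotypes) % pool_size
    return [sum(genotypes[i:i + pool_size]) for i in range(0, full, pool_size)]


# Made-up genotypes for 12 people at one variant site.
example = [0, 1, 2, 0, 0, 1, 1, 0, 2, 2, 1, 0]
print(pooled_allele_counts(example))  # [3, 6]
```

Only the per-pool totals are reported, so a reader sees how common an allele is in each group of five without seeing any single person's genotype.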

For all enquiries, researchers are advised to contact the biobank directly.

6. UK Biobank

is a major national and international health resource (registered as a charity), established to improve the prevention, diagnosis and treatment of a wide range of serious illnesses. Between 2006 and 2010, UK Biobank recruited 500,000 people aged 40 to 69 from across the country to take part in the project. They have provided blood, urine and saliva samples for future analysis, given detailed information about themselves and agreed to have their health followed.

All information about available data is on their website, in particular in the data showcase section. This contains documentation on how the data was collected, the linked health data, how to access and return data, etc. All the variables in UK Biobank can be seen through the ‘browse’ and ‘search’ functions. A user guide explains how to use the data showcase.

Genetic data is currently available for 150,000 participants and will be made available for the full cohort (n = 500,000) in the near future. Details of the available genetic data are documented on the website. There is also a search engine that allows researchers to see whether specific SNPs have been detected.

UK Biobank is an open-access resource. Any bona fide scientist (i.e. from academic, commercial, charity and government organisations) can submit an application to undertake health research that is in the public good. A brief overview of funding and data access is available on the website, and more in-depth information on applying can be found in the UKB Access Procedures document.

7. ToMMo Biobank, Japan

The ToMMo biobank and cohort studies started as a national project of the Japanese government, funded from the budget for reconstruction after the 2011 earthquake and tsunami.

The biobank is still collecting cohort specimens and data, so for now researchers can access only restricted data. In the near future, the data from this public biobank will be opened to researchers as far as possible, subject to approval by the access committee.

The plan is to sequence several thousand genomes or more and to genotype the DNA of the remaining participants (of 150,000 in total) on a custom array developed from the whole-genome sequencing results.

The biobank has already published data from 1,070 whole-genome sequences and made the variant frequency data available online in English. The Japanese Multi Omics Reference Panel, a database of metabolome and proteome data in plasma obtained from about 500 healthy volunteers, is also available in English.


Most biobanks are publicly funded and are required to make data available to bona fide researchers. Publications serve as the evidence that data has been made available.

Several biobanks do sequencing and genotyping, but for many it is not a priority.

Most biobanks have online catalogues of samples. Those that have (genomic) data usually do not have data catalogues (UK Biobank is the exception) and recommend that researchers contact them directly with enquiries.

In this post, I covered only those biobanks that responded to our questions about the availability of, and plans for, genomic data. I also did not cover American biobanks, where there seems to be a lot of development. I plan to review them in a later post.

Any questions or suggestions? E-mail!

Read more posts by Nadia Kovalevskaya