Moving to human microbiome data

A metagenome is the collective genetic material of a microbiome, the community of microorganisms found in a particular ecosystem. For example, the human gut microbiota is the collection of all the 'good and bad' bacteria that reside in your gut, and its metagenome is the set of genomic sequences of those bacteria. It is thought that each human hosts 10-100 trillion microorganisms in their personal microbiota 1. Although this field is still in its infancy, more and more scientists are turning to the metagenome of our microbiota to understand how these microbes affect our health and wellbeing.

Image from Learn.Genetics

Why do we care?

Bas E. Dutilh, Assistant Professor in Bioinformatics at Utrecht University:

“Microbes and viruses contain a vast undiscovered biodiversity that has co-evolved for billions of years. While macrobial biodiversity is in decline, microbes hold cures for diseases and clues about the origin of complex life. Only now are we beginning to unravel these microbial secrets by metagenomics, a unique approach that samples entire microbial ecosystems in a single experiment." 

There are numerous studies highlighting the important role the human microbiota plays in pathological conditions, and it is now accepted that microbes can have a powerful influence over human gene expression 2. One major example is the role of the gut microbiota in inflammatory bowel disease (IBD). Alongside genetic and immunological factors, an abnormal gut microbiota has been strongly linked to hugely debilitating chronic inflammatory conditions, including Crohn’s disease and ulcerative colitis 3. The human microbiota has also been linked to rheumatic autoimmune diseases, allergy and cancer.

But what is an ‘abnormal’ microbiota? The microbiome of healthy humans varies hugely between individuals 4, so large amounts of data from both healthy and sick people are required to understand how the microbiome affects us, and what makes a microbiota abnormal or pathogenic.

Schematic cross-section of a gut. The lumen of the gut is populated by trillions of microorganisms.

User stories & Challenges

Bas E. Dutilh, Assistant Professor in Utrecht:

Bas is developing new ways of analysing metagenomes to answer questions about the identity and functioning of microbes, their interactions and their co-evolution. He also infers the roles microbes play in processes ranging from human colorectal cancer to global nutrient cycles.

“Not all data is equally valuable. It has been quite challenging, for example, to identify all human gut metagenomes ever sequenced, in order to analyse them for a particular sequence of interest. This can be due to anything from bad metadata annotations provided by the people who generated the data, to corrupt files in the public data repository. Before recycling data that has been generated by others, we always run a couple of sanity checks.”

Liam, PhD Student in the UK:

Liam is looking for datasets that contain both the genome data and the metagenome data of the same individuals, to validate his findings.

“Aside from data that has been shared in collaborations, I really struggle to find this kind of data online because the genome and metagenome data is often uploaded in different repositories alongside different publications, so linking the two is very hard.”

Mariana, PhD Student in Sweden:

Mariana is creating a catalogue of mobile elements in bacteria to understand the effects of these elements in nature. To test her models and build this catalogue, she needs a lot of data.

“De novo assembly is hard and time-consuming, so ideally I need assembled data, which is not always easy to find.”

Human Microbiome data

As mentioned above, the study of the human microbiome is still a young field. Very few repositories are dedicated to storing metagenomic data, and as a result the data is fragmented across many different repositories.

The Repositive platform now contains over 21,000 datasets from human metagenomic studies, all in one place. The largest share of these comes from studies that have deposited their data in SRA, and a major study among the SRA datasets is the Human Microbiome Project 4. We also have data from six other sources, including dbGaP and GigaDB.


Interestingly, 74 samples come from individuals who have chosen to make their personal microbiome data available online. Of these, 72 were submitted as part of the Personal Genome Project; the other two come from Steven Keating and The Corpasome.

Overview of bioinformatic methods for functional metagenomics. Modified from Morgan X. C. et al. 5

Steven Keating's microbiome

Steven Keating is a PhD student at MIT who was diagnosed with a brain tumour in 2014. After the tumour was removed, he became interested in open-sourcing his clinical data to drive learning through sharing. For more information, you can read the DNAdigest interview with Steven from earlier this year. Alongside Steven’s 23andMe and whole genome sequencing data, Repositive has also indexed his gut microbiome data. His microbiome sequencing data comes from uBiome and is aimed at understanding the effects of chemotherapy on the gut microbiome.

The Corpasome

Our very own Scientific Lead, Manuel Corpas, spent six years trying to understand his personal genome and the genomes of his direct family members 6. More detail about Manuel’s journey can be found in his guest post on DNAdigest. Alongside the 23andMe and whole exome sequencing data from The Corpasome, Repositive has also indexed the gut microbiome data of his son.

"In 2009, the Corpas family decided to take the unprecedented move of publishing their exome and microbiome data and analyses on the Internet under a CC0 license waiver, the least restrictive type of license."


As part of this collection, we also currently have 82 datasets from the EBI Metagenomics repository, one of the very few existing repositories dedicated specifically to metagenomic data (if you know of others, please tell us and we will index them!). EBI Metagenomics is still in its infancy and therefore does not yet contain much human data; however, fifteen new human datasets have been submitted since August 2016, which suggests it is growing fast.


The future of metagenomics is bright, and the use of the microbiome in diagnostics, treatment and personalised medicine will undoubtedly become standard clinical practice.

“Non-invasive sampling methods [of the microbiome] and decreasing profiling costs make it a feasible avenue for early diagnosis and patient stratification” 7.

Analysis of the microbiome may help with the stratification of patients, but we are still a long way from using the profile of an individual's microbiome to predict susceptibility to disease. However, because the microbiome is constantly shifting, in future it could be used not only for diagnosis, stratification and risk assessment, but also for follow-up and re-evaluation of patients. There is also great potential for modifying the microbiome as a form of treatment: faecal transplants to treat C. difficile infection have demonstrated the potential of modifying the microbiome to treat disease 8. Furthermore, we are just starting to understand the role of the bacteria we host in drug metabolism 7.

Microbiome and Precision Medicine. Modified from Zmora N. et al. 7


  1. Revised Estimates for the Number of Human and Bacteria Cells in the Body. Ron Sender et al., PLOS Biology, 19 August 2016 | doi:10.1371/journal.pbio.1002533

  2. Role of gut microbiota in the control of energy and carbohydrate metabolism. K. Venema, Curr. Opin. Clin. Nutr. Metab. Care 13, 432–438 (2010) | doi:10.1097/MCO.0b013e32833a8b60

  3. Immunopathogenesis of IBD: current state of the art. Heitor S. P. de Souza & Claudio Fiocchi. Nature Reviews Gastroenterology & Hepatology. 13, 13–27 (2016) | doi:10.1038/nrgastro.2015.186

  4. Dynamics and associations of microbial community types across the human body. Tao Ding & Patrick D. Schloss. Nature 509, 357–360 (15 May 2014) | doi:10.1038/nature13178

  5. Biodiversity and Functional Genomics in the Human Microbiome. Xochitl C. Morgan et al., Trends Genet. 29(1), 51–58 (2013) | doi:10.1016/j.tig.2012.09.005

  6. Crowdsourced direct-to-consumer genomic analysis of a family quartet. Manuel Corpas et al., BMC Genomics 16, 910 (2015) | doi:10.1186/s12864-015-1973-7

  7. Taking it Personally: Personalized Utilization of the Human Microbiome in Health and Disease. Niv Zmora et al., Cell Host & Microbe 19, January 13, 2016 | doi:10.1016/j.chom.2015.12.016

  8. A Promising Pill, Not So Hard to Swallow

Read more posts by Charlotte Whicher