EBI Metagenomics – an open resource for the analysis, archiving and exploration of metagenomic datasets

Special thanks to Alex Mitchell, the content and curation coordinator for the InterPro and EBI Metagenomics databases, for writing this guest blog post.

The term ‘metagenomics’ describes the simultaneous analysis of the collective genomes of the microbes present in a sample from a given environment, such as rainforest soil, seawater or a human body site. This comprehensive approach can provide powerful insights into the composition and function of microbial communities. Underpinned by dramatically falling DNA sequencing costs, metagenomic analyses have become increasingly mainstream in recent years and have been applied in a wide variety of fields, including marine ecology, agriculture, food manufacture, bioenergy production and human health. Human health is an area of particularly keen interest, since the human microbiome appears to play important roles in both health and disease. For example, dysbiosis (an imbalance in microbial community composition) of the gut microbiome has been linked to a wide range of disorders, including obesity [1], type 2 diabetes [2, 3], cancer [4], inflammatory bowel disease [5, 6] and rheumatoid arthritis [7]. Intriguingly, links between the microbiome and neuropsychiatric and neurodegenerative disorders, such as anxiety [8], depression [9] and even Parkinson’s disease [10], have also begun to emerge, potentially mediated by a microbiome–gut–brain axis [11].

The challenges of metagenomic data analysis

Despite the burgeoning interest in metagenomics, researchers often find themselves stymied, both by the sheer volume of sequencing data and by the diversity of tools available for analysing it. For example, a single whole-genome shotgun sequencing run can yield more than 250 million sequences, representing over 100 Gb of uncompressed data. With many metagenomic experiments involving tens, or even hundreds, of such runs, data volumes can quickly overwhelm the storage capacity and analysis capabilities of individual researchers. At the same time, a quick survey of the scientific literature reveals a bewildering array of software for metagenomic data analysis: over one hundred publicly available tools, but no commonly recognised standard workflows to guide researchers in choosing between them.

Aims and scope of EBI Metagenomics

EBI Metagenomics [12] helps to address these issues by providing a freely available hub for the analysis and exploration of metagenomic datasets. It supports functional and taxonomic analyses of user-submitted sequences, as well as of publicly available metagenomic datasets held in the European Nucleotide Archive. First established in 2011, and supported by EMBL, BBSRC, ELIXIR-EXCELERATE and InnovateUK funding, EBI Metagenomics has grown to become one of the world’s largest metagenomic data repositories, with over 75,000 publicly available datasets. All of these have been analysed using a standardised pipeline, which helps make results comparable across datasets.

The resource contains data sampled from a wide range of environments (termed ‘biomes’), from insect digestive tracts to hydrothermal vents. A large proportion of the data (over 36,000 datasets) comes from human body sites, and this number is expected to grow significantly over the coming years. EBI Metagenomics already houses data from the American Gut project, an extensive citizen science endeavour that aims to analyse the microbiomes of thousands of individuals to shed light on the connections between microbiota and health. The analysis results for over 8,000 sequencing runs from this project can be visualised and/or downloaded from the EBI Metagenomics website, either run by run or as results matrix files summarising the whole project.

Image from Spencer Phillips at EMBL-EBI

Analysis updates

The EBI Metagenomics team continually surveys new tools and resources that can improve or complement existing analyses. As a result, the analysis pipeline is updated at approximately six-month intervals, with the pipeline version indicated on the website. Datasets analysed using older versions of the pipeline can be reanalysed with the latest version on request. The team is also keeping a watching brief to ensure that studies using emerging sequencing technologies, such as those from Oxford Nanopore Technologies, can be analysed appropriately.

Supporting data discovery

As the number of datasets continues to grow, one aim of EBI Metagenomics is to improve support for data exploration and discovery. To this end, an API is under development to provide programmatic access to analysis results and contextual metadata. The team is currently seeking feature requests from the user community to ensure that the API best supports their needs. Another exciting development is a formal collaboration between EBI Metagenomics and the US metagenomics portal MG-RAST [13], which will help users identify and compare the analysis results for equivalent datasets in both resources. The ultimate aim is that a dataset submitted to either portal will be analysed by both, combining the strengths of the two analysis pipelines and websites. This approach will provide complementary insights and visualisations, as well as a standard baseline for all metagenomic data analyses.

References

  1. A core gut microbiome in obese and lean twins. Turnbaugh PJ et al., Nature. 2009 Jan 22;457:480–484 | doi: 10.1038/nature07540

  2. Insights Into the Role of the Microbiome in Obesity and Type 2 Diabetes. Hartstra AV et al., Diabetes Care. 2015 Jan | doi: 10.2337/dc14-0769

  3. A metagenome-wide association study of gut microbiota in type 2 diabetes. Qin J et al., Nature. 2012 Oct 4;490:55–60 | doi: 10.1038/nature11450

  4. The Human Microbiome and Cancer. Rajagopala SV et al., Cancer Prev Res (Phila). 2017 Jan 17 | doi: 10.1158/1940-6207

  5. The Microbiome in Inflammatory Bowel Diseases: Current Status and the Future Ahead. Kostic AD et al., Gastroenterology. 2014 May;146(6):1489-99 | doi: 10.1053/j.gastro.2014.02.009

  6. Functional impacts of the intestinal microbiome in the pathogenesis of inflammatory bowel disease. Li J et al., Inflamm Bowel Dis. 2015 Jan;21(1):139-53 | doi: 10.1097/MIB.0000000000000215

  7. An expansion of rare lineage intestinal microbes characterizes rheumatoid arthritis. Chen J et al., Genome Med. 2016 Apr 21;8(1):43 | doi: 10.1186/s13073-016-0299-7

  8. Integrative Therapies in Anxiety Treatment with Special Emphasis on the Gut Microbiome. Schnorr SL and Bachner HA. Yale J Biol Med. 2016 Sep 30;89(3):397-422

  9. The Gut-Brain Axis: The Missing Link in Depression. Evrensel A and Ceylan ME. Clin Psychopharmacol Neurosci. 2015 Dec | doi: 10.9758/cpn.2015.13.3.239

  10. Gut Microbiota Regulate Motor Deficits and Neuroinflammation in a Model of Parkinson's Disease. Sampson TR et al., Cell. 2016 Dec | doi: 10.1016/j.cell.2016.11.018

  11. Microbes and mental health: A review. Rieder R et al., Brain Behav Immun. 2017 Jan 25 | doi: 10.1016/j.bbi.2017.01.016

  12. EBI metagenomics in 2016--an expanding and evolving resource for the analysis and archiving of metagenomic data. Mitchell A et al., Nucleic Acids Res. 2016 Jan | doi: 10.1093/nar/gkv1195

  13. The metagenomics RAST server - a public resource for the automatic phylogenetic and functional analysis of metagenomes. Meyer F et al., BMC Bioinformatics. 2008 Sep | doi: 10.1186/1471-2105-9-386

Read more posts by Charlotte Whicher