Express your research with expression data

Gene expression is the process by which information from a gene is converted into a functional gene product. The functional products of coding genes are proteins. The products of non-coding genes are functional RNAs.

Image credit to Madeleine Price Ball. DNA --> RNA --> protein, illustrating the genetic code of the first few amino acids of the alpha subunit of haemoglobin.

Nearly every cell within an organism carries the same set of genes, but very few of these genes are used at any given time. Different cells use different genes at different times; this is what makes cells different from each other and what allows just one cell to differentiate into a whole complex organism. Studying gene expression can tell researchers which genes are 'turned on' at a certain time, within a certain cell type or tissue. Analysing gene expression is important because it allows researchers to uncover new information and to find treatments by altering cellular pathways. You can read more about how gene expression is analysed in this Nature Education article.

Huge amounts of expression data are now being created and published every month. This data is hugely valuable, so Repositive has brought many of its sources together to enable researchers to find the data they need via one easy-to-use portal. Browse our expression data collection here.

Public Repositories


ArrayExpress (Browse) is a public access repository hosted by the EBI that stores microarray data, expression data and many other types of functional genomics data, including methylation arrays, SNP arrays, RNA-seq, ChIP-seq, HiC-seq, etc. As the name suggests, however, over 80% of all human datasets on ArrayExpress are 'RNA assays'.


GEO (Browse) is a public access repository for expression data, including RNA NGS and array-based data, hosted by the NCBI. Almost all human datasets on GEO are 'expression profiling by array'.

Allen Institute

The Allen Brain Atlas (Browse) comes out of the Allen Institute for Brain Science. The Allen Institute aims to answer some of the biggest questions in neuroscience and accelerate research worldwide through public releases of new data, knowledge and tools. The Allen Human Brain Atlas maps gene expression across the human brain by integrating anatomic and genomic information. Available data modalities include magnetic resonance imaging (MRI), diffusion tensor imaging (DTI), histology, and gene expression data derived from both microarray and in situ hybridization (ISH) approaches.

A coaster I have in my flat!

Added-value data sources

'Added-value databases' is a term used by Johan Rung and Alvis Brazma in their 2012 Nature Reviews Genetics publication 1.

"Added-value databases extract information from primary data to answer questions and make the answers available through user interfaces that are tailor-made for genes, diseases or other direct biological or biomedical questions" 1.

Indrek Vainu, Co-founder of Xpressomics:

"The problem today is that it is difficult to find relevant information about genes from already conducted experiments. PubMed or Google searches often reveal only a handful of publications where the genes of interest have been mentioned. However, there are vast amounts of information hidden in files from large-scale gene expression profiling studies that sit unanalysed in public repositories."

At Xpressomics (Browse) they are solving this problem. They unlock this hidden information through careful manual annotation of experiments and detailed differential expression analysis to reveal relevant information from mountains of publicly available experimental data.

"Our interest really is to enable scientists to reinterpret their results in the light of all other experiments ever made. By searching through previously conducted gene expression experiments users can discover information about specific drugs, conditions or triggers, which induce or repress genes of interest in a statistically significant manner. This helps researchers to hypothesise about the possible regulatory mechanisms and the functional significance of the genes they are studying." Indrek Vainu
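To give a flavour of what "induce or repress genes in a statistically significant manner" can mean in practice, here is a minimal Python sketch of one common way to compare a gene's expression between two conditions: a log2 fold change together with Welch's t statistic. This is only an illustration and is not Xpressomics' actual method. The function names and the expression values are made up for the example.

```python
import math
from statistics import mean, variance


def log2_fold_change(treated, control):
    """log2 of the ratio of mean expression (treated over control)."""
    return math.log2(mean(treated) / mean(control))


def welch_t(treated, control):
    """Welch's t statistic for two samples with possibly unequal variances."""
    standard_error = math.sqrt(
        variance(treated) / len(treated) + variance(control) / len(control)
    )
    return (mean(treated) - mean(control)) / standard_error


# Hypothetical expression values for a single gene (arbitrary units).
control = [10.0, 12.0, 11.0, 9.0]
treated = [20.0, 22.0, 19.0, 23.0]

lfc = log2_fold_change(treated, control)
t = welch_t(treated, control)
print(f"log2 fold change: {lfc:.2f}, Welch t: {t:.2f}")
```

A positive log2 fold change suggests the gene is induced by the treatment and a negative one suggests it is repressed. A real analysis would also convert the t statistic into a p-value and correct for testing many genes at once.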

Researchers from the University of Tartu, Institute of Biomedicine and Translational Medicine, used Xpressomics to identify drugs that activate the hypothermia-responsive gene Cirbp. This is important as therapeutically induced hypothermia can be an effective treatment for various hypoxic and ischemic conditions 2.

InSilicoDB (Browse) aggregates human, rat and mouse microarray and RNA-Seq data from large public repositories, like GEO and ArrayExpress. This data is all uniformly processed and formatted, enabling streamlined analysis and comparison using the multiple analysis tools that are integrated into their platform.

Furthermore, InSilicoDB acts as a foundation for community-compiled datasets, which have been used in high impact publications 3.

"We thank Alain Coletta and Virginie de Schaetzen from InsilicoDB for assembling a melanoma microarray data compendium, which was used as a starting point in this study."

View the Expression Data collection

There’s more!

These are only a few of the multiple resources now becoming available for researchers to gain access to expression data.

Stay tuned for more blog posts on what resources are out there for other genomic assay types, technologies, rare diseases and common diseases!

For more details about the resources discussed above and how to access their data, sign up to Repositive.


  1. Reuse of public genome-wide gene expression data. Johan Rung & Alvis Brazma. Nature Reviews Genetics 14, 89-99 (February 2013) | doi:10.1038/nrg3394

  2. Estimating differential expression from multiple indicators. Ilmjärv, S., et al. Nucleic Acids Res. 2014 Apr; 42(8): e72 | doi:10.1093/nar/gku158

  3. Decoding the regulatory landscape of melanoma reveals TEADS as regulators of the invasive cell state. Verfaillie, A., et al. Nature Communications 6, Article number: 6683 (2015) | doi:10.1038/ncomms7683

Read more posts by Charlotte Whicher