Submitting genomic data to repositories: ArrayExpress

ArrayExpress Submissions Overview

ArrayExpress is a public-access repository hosted by the EBI that accepts all kinds of functional genomics data, including microarrays, expression data, methylation arrays, SNP arrays, RNA-seq, ChIP-seq, HiC-seq, etc. It also brokers sequencing (NGS) data to ENA-SRA on behalf of submitters.

The main route for direct submissions to ArrayExpress is the submission tool Annotare, which was released in 2014 and is optimised for microarray and HTS-based data submissions. Annotare uploads the data files from the submitter's directory and captures experimental metadata through a series of spreadsheet-based web forms, guiding the submitter step by step through constructing a submission. A validation step then aims to catch errors such as missing data files for an assay or absent sample attributes, at which point the submitter can make amendments. After validation, Annotare generates MAGE-TAB files, which contain the experiment's metadata, and submits these together with the data files to ArrayExpress, which then provides the submitter with an accession number. [1]
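To make the validation step more concrete, here is a minimal sketch in Python of the kind of check described above: it flags samples with a missing attribute or a missing data file. This is not Annotare's actual validator. The column names follow MAGE-TAB SDRF conventions, but the set of required attributes, the function name validate_sdrf and the example table are all assumptions made for illustration.

```python
import csv
import io

# Assumption: a tiny, hypothetical set of required columns.
# Annotare's real validation rules are richer than this.
REQUIRED_ATTRIBUTES = ["Characteristics[organism]"]
DATA_FILE_COLUMN = "Array Data File"


def validate_sdrf(sdrf_text, uploaded_files):
    """Return a list of human-readable problems found in an SDRF-like table."""
    problems = []
    reader = csv.DictReader(io.StringIO(sdrf_text), delimiter="\t")
    for row_number, row in enumerate(reader, start=2):  # row 1 is the header
        sample = row.get("Source Name") or f"row {row_number}"
        # Check that every required sample attribute has a value.
        for column in REQUIRED_ATTRIBUTES:
            if not (row.get(column) or "").strip():
                problems.append(f"{sample}: missing attribute {column}")
        # Check that a data file is listed and was actually uploaded.
        data_file = (row.get(DATA_FILE_COLUMN) or "").strip()
        if not data_file:
            problems.append(f"{sample}: no data file listed")
        elif data_file not in uploaded_files:
            problems.append(f"{sample}: data file {data_file} was not uploaded")
    return problems


# Made-up example table: three samples, two of them incomplete.
EXAMPLE_SDRF = "\n".join([
    "Source Name\tCharacteristics[organism]\tArray Data File",
    "sample_1\tHomo sapiens\tsample_1.CEL",
    "sample_2\t\tsample_2.CEL",
    "sample_3\tHomo sapiens\t",
])

if __name__ == "__main__":
    for problem in validate_sdrf(EXAMPLE_SDRF, {"sample_1.CEL"}):
        print(problem)
```

Running a check like this on your own metadata table before starting a submission can surface gaps early, though the validation built into Annotare remains the one that counts.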

Thoughts on the ArrayExpress submission process

Tell me about your experiences with submitting data to ArrayExpress: comment below and get the discussion going!

"Pretty straightforward"

People generally felt that it was straightforward to create templates in ArrayExpress. However, we heard several times that there was no option, or at least no obvious one, to add metadata beyond what the forms offered. This meant that extra analyses or gene lists could not be appended, and that later findings or additional papers using the data could not be added retrospectively after submission.

Time scale: One user said the whole submission process, including formatting and upload, took him about 2 hours.

GEO vs. ArrayExpress

"I prefer GEO, since it is quite simple and depending on your microarray platform they provide corresponding archive templates and examples."

"ArrayExpress is more smooth, they require less detail than GEO. Within ArrayExpress, there are more boxes to fill rather than forms, which makes their requirements more clear and leaves the user less confused."

"ArrayExpress has a really good curation system - GEO has nothing like that. But ArrayExpress is asking for different data in a different way, so maybe it's easier for them" (at ArrayExpress compared to GEO).

ArrayExpress cares about feedback

Amy Tang, Curation Project Leader at ArrayExpress:

"We can't agree more with you that 'the best way to make the data submission process less of a nightmare is to be vocal', so we've been actively collecting free-text feedback from submitters for every single submission, and they have been very generous in sharing their comments, especially frustrations and pain points encountered during submission, which we then address in our development cycles."

To see the 1200 Annotare submission satisfaction scores on an interactive world map, click here. The map was created by Robert Petryszak, the Gene Expression Team leader at the EBI.

The ArrayExpress, ENA & EGA 'Vision of an ideal submission process':


Related Blog Posts

Submitting genomic data to repositories: a necessary nightmare?!

Submitting genomic data to repositories: Gene Expression Omnibus - GEO

Submitting genomic data to repositories: European Genome-Phenome Archive - EGA

Submitting genomic data to repositories: Sequence Read Archive - SRA


References

Read more posts by Charlotte Whicher