Submitting genomic data to repositories: Sequence Read Archive - SRA

SRA Submissions Overview

The SRA is a public access repository for raw sequencing and alignment data from NGS methods, hosted by the NCBI. The EBI equivalent of the SRA is the ENA. The SRA is the NCBI's main repository for high-throughput sequencing data and is part of the international partnership of archives (INSDC) between the NCBI, the EBI and the DDBJ. Data submitted to any of the three organisations is shared between them.

SRA provides an online submission portal for submissions of 500 samples or fewer. For studies with more than 500 samples, you must make multiple submissions under the same BioProject reference.
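If your study is over the 500-sample limit, it can help to work out the split before you open the portal. The snippet below is only a rough sketch of that bookkeeping: the function name `split_into_submissions` and the sample IDs are made up for illustration, and it does not talk to the SRA in any way.

```python
from typing import List, Sequence

# Portal limit quoted in the text above; adjust if the SRA changes it.
MAX_SAMPLES_PER_SUBMISSION = 500


def split_into_submissions(
    sample_ids: Sequence[str],
    limit: int = MAX_SAMPLES_PER_SUBMISSION,
) -> List[List[str]]:
    """Split sample IDs into consecutive batches of at most `limit` samples.

    Each batch would then be one portal submission, all registered under
    the same BioProject reference.
    """
    if limit < 1:
        raise ValueError("limit must be at least 1")
    return [
        list(sample_ids[start:start + limit])
        for start in range(0, len(sample_ids), limit)
    ]


# Hypothetical study with 1200 samples (IDs invented for this example).
samples = [f"sample_{n:04d}" for n in range(1, 1201)]
batches = split_into_submissions(samples)
print([len(batch) for batch in batches])  # [500, 500, 200]
```

In this example a 1200-sample study would need three separate submissions, each pointing at the same BioProject.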

The SRA submission process involves three main steps, which are detailed in the following diagram:

Before you start, you must set up an SRA account.

  • The first step of the submission process involves registering the BioProject and BioSample information.
  • The second step involves importing BioProject and BioSample accession numbers into SRA and providing technical information about the methods used to sequence the samples.
  • The third step is to submit the data files, which involves designating an SRA Run for the files and then sending them to an FTP server. However, users with files larger than 1 GB, or who are located outside the USA, are asked to install Aspera Connect and then contact the SRA directly for instructions on how to send their files.
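The FTP versus Aspera rule in the third step is easy to get wrong when you have lots of files, so here is a small sketch of it as a check you could run over your file sizes. It is an illustration only: the function name `choose_transfer_route` is invented, and treating 1 GB as 1,000,000,000 bytes is my assumption, since the guidance does not say whether it means decimal or binary gigabytes.

```python
from typing import Iterable

# Assumption: 1 GB is taken as 10**9 bytes; the guidance above does not specify.
ONE_GB_IN_BYTES = 1_000_000_000


def choose_transfer_route(file_sizes_bytes: Iterable[int], in_usa: bool) -> str:
    """Return "aspera" or "ftp" following the rule described in step three.

    "aspera" means: install Aspera Connect and contact the SRA for
    instructions. "ftp" means: send the files to the FTP server.
    """
    has_large_file = any(size > ONE_GB_IN_BYTES for size in file_sizes_bytes)
    if has_large_file or not in_usa:
        return "aspera"
    return "ftp"


# Hypothetical file sizes in bytes.
print(choose_transfer_route([400_000_000, 750_000_000], in_usa=True))   # ftp
print(choose_transfer_route([400_000_000, 2_500_000_000], in_usa=True)) # aspera
print(choose_transfer_route([400_000_000], in_usa=False))               # aspera
```

Note that a single file over the threshold, or a location outside the USA, is enough to send you down the Aspera route.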

Thoughts on the SRA submission process

Tell me about your experiences of submitting data to SRA. Comment below and get the discussion going!

"It's a painful process, and the online tutorials aren't very useful."

However, most people agreed that the SRA staff were very helpful and patient when dealing with help emails. One user shared this thought: "they must get so many emails asking the same basic things!"

"Although SRA provides an online submission portal, it is often more convenient to submit short sequence data (especially for large studies) using GEO."

“It’s a nightmare as you have two things. You have the BioProject frontend (which is the metadata) and the SRA sections (which is the raw data), so you have to weave this information together. No one knows what’s going on.”

That said, the SRA team seem very willing to make changes to fit users' data submission needs. One user I interviewed found that he couldn't upload his data because the SRA's internal data structure did not support submitting only metagenomic assemblies. However, within two weeks the SRA team had changed their structure to support his needs.

Time scale: users said that, on average, each submission to SRA took them about 1-3 days. However, many said that it can take as much as a week just to read all the instructions and be clear about what you have to do. Furthermore, there is often a need to email the SRA, which leads to emails going back and forth and more time lost.

“I found it hard because the input fields in the forms are confusing, some are not required and some duplicate information – this results in the data being inconsistent in the database.”

“If you can’t programme it must be awful. I have no idea how you would do it!”

If you are thinking of submitting data to SRA, here is an online tutorial and a blog post to help guide you through the submission process. There are also two YouTube videos: one detailing the terminology used by the SRA and one walking you through sequence submission to the SRA.

Related Blog Posts

Submitting genomic data to repositories: a necessary nightmare?!

Submitting genomic data to repositories: Gene Expression Omnibus - GEO

Submitting genomic data to repositories: ArrayExpress

Submitting genomic data to repositories: European Genome-Phenome Archive - EGA

Read more posts by Charlotte Whicher