Submitting genomic data to repositories: Gene Expression Omnibus - GEO

GEO Submissions Overview

GEO is a public-access repository for expression data, including RNA NGS and array-based data, hosted by the NCBI. Initiating a GEO submission typically requires compiling all raw data, processed data and a metadata spreadsheet into a single folder, which is then transferred to GEO by FTP. In the metadata spreadsheet, submitters must describe their study; list all samples with their properties and associated raw and processed data files; describe the experimental protocols; and detail the data processing steps, the programs used to analyse the data and their parameter settings.1
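Before starting the transfer, it can help to confirm that every file named in the metadata is actually in the folder. The following is a minimal sketch in Python, not part of GEO's own tooling. It assumes the metadata sheet has been exported to CSV and that the file names sit in two columns called raw_file and processed_file; both column names are hypothetical and should be changed to match your own sheet.

```python
import csv
from pathlib import Path


def missing_files(folder, metadata_csv, file_columns=("raw_file", "processed_file")):
    """Return file names listed in the metadata table but absent from the folder.

    The column names in `file_columns` are assumptions made for this sketch;
    they are not taken from the GEO template.
    """
    folder = Path(folder)
    # Names of the regular files currently present in the submission folder.
    present = {p.name for p in folder.iterdir() if p.is_file()}

    missing = []
    with open(metadata_csv, newline="") as handle:
        for row in csv.DictReader(handle):
            for column in file_columns:
                name = (row.get(column) or "").strip()
                # Report each missing file once, in the order first encountered.
                if name and name not in present and name not in missing:
                    missing.append(name)
    return missing
```

Calling missing_files("geo_submission", "geo_submission/metadata.csv") (both paths are placeholders) returns a list of file names to chase up; an empty list means every listed file was found.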

After the data have been transferred successfully, submitters email GEO to initiate a review of the submission. A GEO curator examines the uploaded files and provides feedback, and a GEO accession number is created. At this point the submission remains private, though a reviewer link can be created to share it with journal reviewers. Once the associated manuscript has been accepted for publication, submitters can return to their login page and make the data public.1

Thoughts on the GEO submission process

As GEO is one of the most popular repositories for public-access data, I have spoken with multiple individuals about their experiences of submitting data to it. One of the major positives was that people found the GEO data submission team very helpful and responsive to emails (replying within 2-3 days at most). GEO usually assigns a specific individual to each submission, which submitters found particularly useful because it streamlined conversations.

Time scale: One user found that formatting the data and filling in all the online forms took about 2 days, although the descriptions for the forms had already been written. Had she started from scratch, she estimated it would have taken around a week.

"They were flexible with issues which we faced as a result of our own problems."

GEO has archive templates and examples corresponding to multiple microarray platforms. However, users often found that the proprietary software they used did not export the data in a GEO-compatible format, so the data had to be reformatted. To help those submitting array data to GEO, Alexandre Kuhn has written a Bioconductor package called GEOsubmission, which prepares microarray data and the associated sample information as a single file for upload. Furthermore, if you use InSilico DB for data management, they will submit your data on your behalf.

Multiple individuals found that their genomics core facility supplied only background-subtracted data rather than the raw data required for the GEO submission. Furthermore, many sections of the GEO submission form asked for information that only the core sequencing facility had (e.g. the base calling method). Completing the form therefore required much toing and froing between the sequencing facility and the data submitter, which inevitably caused a lot of frustration.

"Ideally the submission would be split into three segments that different individuals could complete. A segment for:

  • The researcher who created the samples.
  • The researcher who sequenced the samples.
  • The researcher who analysed the data."
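The three-way split suggested in this quote can be sketched as a small data structure. This is purely illustrative and does not reflect how GEO structures its forms: the segment names and the assignment of fields to each segment are assumptions, built from the items mentioned earlier in this post (sample properties, protocols, base calling method, processing steps and so on).

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    """One part of the submission, completed by a single contributor."""
    owner: str
    fields: dict = field(default_factory=dict)

    def unanswered(self):
        # A field counts as unanswered while its value is empty.
        return [name for name, value in self.fields.items() if not value]


def new_submission():
    """Build three empty segments; the field assignment is hypothetical."""
    return [
        Segment("sample preparation",
                {"sample properties": "", "experimental protocols": ""}),
        Segment("sequencing",
                {"base calling method": "", "raw data files": ""}),
        Segment("analysis",
                {"data processing steps": "",
                 "programs and parameter settings": "",
                 "processed data files": ""}),
    ]


def outstanding(segments):
    """Map each segment owner to the fields they still need to fill in."""
    return {s.owner: s.unanswered() for s in segments if s.unanswered()}
```

With a layout like this, each contributor fills in only their own segment, and outstanding(new_submission()) shows at a glance who still owes which details.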

Finally, it is interesting to note that each GEO user can have only one affiliation, so if a user changes their affiliation, the change is applied to all of their historical submissions. To keep the original affiliation on earlier submissions, one therefore has to create a new user profile each time one moves institutions.

"It would be better for submissions to be done under the Principal Investigator's name, as they are the most secure individuals associated with the data."

