How to successfully apply for access to dbGaP

I have previously written at length about my frustrations with the dbGaP application process, highlighting some of its many difficulties and pitfalls. However, my previous rants probably don't provide the most useful reference for successfully negotiating the dbGaP application.

So, in an effort to contribute constructively, here is my attempt at providing some practical guidance in applying to dbGaP and accessing the data once your application is accepted. This will only help you avoid some of the bureaucratic and technical problems. You'll still need to write a research proposal that can convince the data access committee to give you access. You're on your own with that.

I'll also tell you a bit about what you need to do once your application has been approved to actually get at the data in the dbGaP system. Because that's anything but straightforward as well.

How long will it take?

That depends on how many extra things you need to collect before making your application. The chart below gives a (very) rough estimate of the times involved. Refer to the next section if you don't understand what the different parts of the process are.

So it'll take a while...

What you'll need

Before you start, you need to work out what credentials you need and which ones you already have. If you read through the list and have everything, you can skip straight to making your application.

  • You need a personal (i.e., issued to you, not your organisation/company) NIH eRA Commons login.
  • If you don't have one, you need your organisation to be registered with the eRA.
  • If your organisation isn't registered with the eRA, your organisation will need a DUNS number.
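The dependency chain in that list can be written down as a small function. This is only a sketch of the ordering described above: the function name and its three boolean parameters are my own invention for illustration, not anything provided by NIH or dbGaP.

```python
def remaining_steps(has_personal_login, org_registered, org_has_duns):
    """Return the outstanding prerequisite steps, in the order to do them.

    The three flags are hypothetical inputs describing your own situation.
    """
    steps = []
    if not has_personal_login:
        if not org_registered:
            # A DUNS number is only needed in order to register the organisation.
            if not org_has_duns:
                steps.append("Get a DUNS number for your organisation")
            steps.append("Register your organisation with the eRA")
        steps.append("Get a personal eRA Commons login")
    # Everyone ends up here eventually.
    steps.append("Make your dbGaP application")
    return steps


if __name__ == "__main__":
    # Worst case: starting with nothing at all.
    for number, step in enumerate(remaining_steps(False, False, False), start=1):
        print(f"{number}. {step}")
```

Each step maps onto one of the sections below, so you can jump to the first one that applies to you.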

My organisation doesn't have a DUNS number and now I'm sad

It's appropriate that you're sad, as DUNS is derived from the Gaelic word for deep unending sadness [1]. At any rate, if you don't have a DUNS number you probably don't know what it is, so let me enlighten you. A DUNS number is a: "Dun & Bradstreet number, a unique nine digit identification number for each physical location of your business, required to register with the US federal government for contracts or grants".

To apply for one go here and have someone with enough authority in your organisation fill in the web form. Then wait for someone to email the DUNS number back (in my case, this took about a week).

My organisation isn't registered with the eRA

Your organisation has a DUNS number though, right? If not, go back a step and get one first.

Remember that what you're trying to do here is register your organisation with the eRA, not yourself. Because of this, you're going to need someone at the top of your organisation (like the CEO) to sign things for you. With that in mind:

  1. Go here and fill in the online form (or get someone to fill it in for you).
  2. Print out the filled-in form when prompted to do so.
  3. Fax the form to the number on the form (yes really, you have to fax it).
  4. Wait for someone to send the eRA login for your company (it took two days for us). Again, as this is the organisation that is registering, the email will probably go to your CEO (or equivalent).

I don't have a personal eRA Commons login

Make sure your organisation is registered with the eRA. If it's not, go back to the previous section.

You'll need to ask whoever has access to the organisational eRA login to register you as a user and give you a password. You might want to warn them that the eRA control panel has a less than intuitive interface [2]. I think you have to be marked as a "Principal Investigator" by whoever is doing this, although I'm not 100% sure about that.

Making your application

You should now be ready to actually log in to dbGaP and submit an application. The first step is to go here and click the "log in" link.

NOTE: If your login doesn't work and you've only just been registered as an eRA user, just wait a day or two. Although dbGaP uses the eRA credentials to authenticate users, there is a lag in copying across credentials. I'd guess this would also apply if you had done something like change your password recently. dbGaP has a whole section documenting login issues.

Once you're logged in, click on "Create New Research Project". Now you have to fill in the actual application form. This part was relatively straightforward for me, but I was writing a pretty "standard" application. I'm sure there are many pitfalls and things to watch out for in filling in the application form, but as I didn't encounter them myself I can't offer specific advice here.

The one piece of trivia I can impart is that they don't want the "nominated IT director" to be the same person as the "Principal Investigator". It's worth avoiding these little things the first time round, because if they're unhappy about something, you have to edit and resubmit your application. Each time you do this, someone senior in your organisation needs to re-approve your request, which can potentially take a while.

Anyway, once you have submitted your application, you need to wait for the review committee to meet and decide if they're going to approve your application or not. For me this took about 2 weeks, but I have no idea if this is typical.

Accessing the data

Congratulations! Your application has been approved! At least I hope it has. You can now go back here, log in and see your approved project!

Now you're probably excited to download your data and start doing science. However, even though your application has been approved, you still have a way to go before you actually have the data in a usable form. The first thing you need to do is to create a "data request".

Creating a data request

Click on the "My Requests" tab. You should see your (now approved) project listed and there should be a "Request Files" button on the right in the "Actions" column. Click that button, select the files you have been approved for and want to download and hit "Create download request".

Now wait some more. You didn't think that the waiting was over did you?

After some period of time (I think it took about half an hour for me), your "download request" will be approved and will appear under the "Downloads" tab. Click on the download link and...

Be greeted not with a link to a file, but more instructions. dbGaP uses a specialised piece of software to download the data, so you can't just click a link and have it download.

Downloading the data

Although dbGaP gives you instructions on using the AsperaConnect software to download the data, I found them pretty unhelpful.

The manual ascp command it gives was broken in some way. I know because I managed to fix it by changing some things, but I can't remember exactly what I did. Sorry I can't be more helpful here, but I don't have a record of what I changed.

The easy [3] way to download the data is to install the AsperaConnect browser plugin. The dbGaP page redirects you to this page which has a bewildering array of options to choose from, most of which require a password to access, which of course you don't have.

The one I used was just called "aspera connect" and I got it from this page. It's not a normal browser plugin, in that I had to download a binary file and install it like a program. It forced me to close all my browsers in the process, so be aware of that.

Even after doing all this, it never worked for me using Google Chrome. It did work in Firefox though. By worked I mean that after I installed it, when I clicked on the download link in dbGaP, as well as a bunch of instructions I got a link that, when clicked, opened the aspera connect software and started the download. Hooray!

Decrypting the data

So I bet you think you're done now. Well guess what! The data you've downloaded is currently useless. Because it's encrypted. To actually make use of it you have to decrypt it. To do this, you need to:

  1. Download and install the SRA toolkit. There are instructions for how to do so here.
  2. Configure the SRA toolkit. Once again, refer to the instructions. I found these instructions difficult to follow. So here are the key steps I think you need to take:
    1. Download your dbGaP repository key from the "My Projects" tab by clicking "get dbGaP repository key" on the right. Save it somewhere sensible.
    2. Start the SRA toolkit config (by running /path/to/vdb/binaries/vdb-config -i on Linux/Unix/OS X).
    3. Import the dbGaP repository key (the thing you saved in step 1).
    4. When you import the key, it will ask if you want to "change the location". The software makes a special "project" directory (by default something like ~/ncbi/dbGaP-3452 or similar). It will only be possible to decrypt files when running commands from this directory. So take note of it.
  3. Open a terminal and navigate to the "project directory" you set (or had set for you) when you configured SRA toolkit. If you don't do this, then decryption will not work.
  4. Run /path/to/vdb/binaries/vdb-decrypt /path/to/the/downloaded/file/to/decrypt/target_file.ncbi_enc from this directory.
  5. If you've followed all the magical incantations needed to appease the SRA toolkit software properly, target_file.ncbi_enc should now be replaced by just target_file and should be decrypted and ready to use.

Repeat this process for all other files you've downloaded and need to decrypt. Now you should be ready to use the data to do something useful. Good luck!
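If you have a lot of files, repeating that by hand gets old quickly. Here is a minimal Python sketch of the loop, assuming the behaviour described above: vdb-decrypt takes the encrypted file as its only argument, has to be run from the project directory, and replaces target_file.ncbi_enc with target_file. The function names, the `run` parameter and the directory layout are my own inventions for illustration and are not part of the SRA toolkit.

```python
import subprocess
from pathlib import Path

ENCRYPTED_SUFFIX = ".ncbi_enc"


def decrypted_name(encrypted_file):
    """Return the path we expect to exist after decryption (suffix stripped)."""
    path = Path(encrypted_file)
    if path.suffix != ENCRYPTED_SUFFIX:
        raise ValueError(f"{path} does not end in {ENCRYPTED_SUFFIX}")
    return path.with_suffix("")


def build_decrypt_command(vdb_decrypt, encrypted_file):
    """Build the argument list for one vdb-decrypt call (binary, then file)."""
    return [str(vdb_decrypt), str(encrypted_file)]


def decrypt_all(vdb_decrypt, project_dir, download_dir, run=subprocess.run):
    """Decrypt every *.ncbi_enc file in download_dir.

    Each command is run with project_dir as the working directory, because
    decryption only works from there. `run` is swappable so the loop can be
    dry-run without the toolkit installed.
    """
    encrypted = sorted(Path(download_dir).glob("*" + ENCRYPTED_SUFFIX))
    for path in encrypted:
        # Use an absolute path, since we are not running from download_dir.
        command = build_decrypt_command(vdb_decrypt, path.resolve())
        run(command, cwd=str(project_dir), check=True)
    return [decrypted_name(path) for path in encrypted]


# Example call (all three paths are placeholders for your own setup):
# decrypt_all("/path/to/vdb/binaries/vdb-decrypt",
#             Path.home() / "ncbi" / "dbGaP-3452",
#             "/path/to/the/downloaded/files")
```

Passing `check=True` makes the loop stop at the first file that fails to decrypt, rather than carrying on silently and leaving you to work out later which files are still encrypted.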

Footnotes

  1. It's not really.

  2. Translation from polite British English: It sucks.

  3. Not at all easy.

Read more posts by Matthew Young