Accessing dbGaP, a bureaucratic odyssey - Part 2

When I finished my last post documenting my attempts to access the dbGaP general research use data, I had my access blocked pending a small company review and was busy screaming obscenities at no one in particular.[1] As I don't think there is any way I can make you want to sit through another lengthy report on the bureaucratic dbGaP process, I promise that chapter of this tale will end imminently.

In hindsight, I probably could have wrapped things up in my previous post, but at the time I had run out of patience and wasn't sure the process would ever end. So rather than give you the excruciating details, here is a bullet point summary of the last steps of the process:

  • I waited another two weeks for the committee to meet and consider my application.

  • They approved the application (huzzah!) subject to my nominating a different IT director from my company (I had previously nominated myself).

  • I had to re-submit the application via their glorious interface after changing the IT director. It only took half an hour or so, but required my CEO to sign off on the whole thing again.

  • Another week later, I was officially approved for access. But of course, the links to the data didn't work. This turned out to be caused by the "pending small company review" status that I received at the end of the previous post.

  • A few emails back and forth with dbGaP support were enough to resolve this problem and remove the restriction.

  • Finally, I had to work out how to use their specialised downloading and decryption software. That makes it sound like there is one piece of special software for both "downloading" and "decrypting". But of course, there are two separate, non-standard and non-intuitive pieces of software to download, install and work out how to use. That doesn't seem like that big a deal, but I wasted at least half a day getting it to work.

Actually, having written that down it still seems like a lot. It certainly still chewed up a lot of time, even if a lot of it was me waiting for things to happen. So maybe I was justified in abruptly terminating my previous entry in a torrent of swear words.

All told, the whole process took somewhere in the 1 to 2 month range. Admittedly, if I had to do the whole thing again it would take a lot less time now that I am familiar with how the system "works" and have set up all the different required accounts. I made a flow diagram giving estimates of the time associated with different parts of the process, which you can see below in all of its glory.


OK, so I now have access to the dbGaP general research use data I applied for. Hooray! What was the point of all this again? For me personally, having access to the data will allow me to better test the set of tools I am building for analysing and sharing genomic data. My motivation in documenting the process (other than providing an outlet to vent my frustrations) was to illustrate the difficulties faced in trying to access data, even when the application process is "streamlined".

Which raises the larger point: research is made more difficult and less robust by the difficulty of these data access procedures. This is not a conclusion that anyone is likely to dispute. I won't speculate on how this problem can be solved (or at least how the difficulties with data access can be reduced), so you don't mistake this post for an exercise in corporate propaganda. I'll just be satisfied that I've hopefully made clear that there's still plenty of room to improve the data access procedure.


  1. At least I was until I had my profanity censored in the name of "decorum" and "professionalism". Which was probably a futile endeavour as the underlying sentiment of "%!#$ you never ending bureaucratic process" shines through the censor's asterisk.

Read more posts by Matthew Young