Accessing dbGaP, a bureaucratic odyssey

Reading about navigating bureaucratic hurdles seems like a 21st-century version of “watching paint dry”: a dull and unrewarding activity to be avoided if at all possible. So why should you keep reading? I could try to convince you that this particular bureaucratic exercise, the procedure researchers must navigate to gain access to databases of genetic variants, is interesting because of its importance. This is certainly true. We all have an interest in the advancement of medical research, and seeing it hampered by unnecessary procedural obstacles is simultaneously maddening and depressing. This is doubly sad when you consider that the people who consented to make their data available did so in the hope that it would be used for the collective good. But the real reason to persevere is that the frustrations of the dbGaP access process are deeply familiar to anyone who has had to deal with an inflexible institution (i.e., everyone). So read on, share in my pain and frustration, and perhaps we can find some kind of catharsis in collective outrage at the absurdity and difficulty of the dbGaP application process.

To give a bit of context, dbGaP, the “database of Genotypes and Phenotypes”, is a database of genetic and phenotypic information about individuals. That is, it contains people’s DNA sequences and information about what conditions they may have, their age, and so on. This information is invaluable to researchers: it can be used to discover the molecular causes of diseases, to find populations at risk of different conditions, and in many other ways that I haven’t thought of. The individuals whose data make up this database have given permission to the research community to use this information, in the hope of these benefits materialising. In particular, there is a subset of the data marked for “General Research Use”, which should be easily available to researchers. In fact, the NCBI (who maintain dbGaP) recently triumphantly declared that they had streamlined the process of applying for access to these general research use (GRU) data.

I am building tools to help share genomic data and the dbGaP data would be invaluable for testing and developing these tools, so I decided to attempt to gain access myself. The starting point for the access process is this page. The brief description found here certainly gives the impression of a streamlined process which should be over fairly quickly. At this point I was thinking, “Great! This won’t be as hard as I had feared”. And so my long conversation with the faceless bureaucracy began…

dbGaP: Click on the “log in” link.
Me: Okay, I think I can handle that, what’s next?
dbGaP: Log in with your eRA commons or NIH login.
Me: Errr, I don’t have either of those, but decades of internet use have made me an efficient account-registering machine, so let’s see if I can make an account.
dbGaP: Are you an extramural principal investigator or an intramural NIH scientist?
Me: What the hell does extramural/intramural mean? Let’s see if Google knows. Apparently extramural means “people who are not full-time members of a university or other educational establishment”. While “principal investigator” seems like an overly grandiose term for “some guy looking to play around with some data”, I guess “extramural principal investigator” is as close as I’m going to get.
dbGaP: Extramural principal investigators should click here to register a new account.

At this point I was expecting to be redirected to a standard form asking me to choose a user name, enter an email address and supply some other personal information that you get the impression they don’t really care about as long as they have your email address.

XKCD - The important field

Of course, this was not the case. The whole “sign up for an eRA account” process that I’m about to inflict on you is indicative of the entire process of applying for access to dbGaP. It looks like it should be easy, but it’s only really easy if you meet some fairly restrictive assumptions about who you are and where you’re applying from. For the eRA sign-up, it’s assumed that you’re a US citizen working for a large institution that frequently applies for grants or bids for contracts from the US government. It’s not that the process outright excludes you if you don’t meet these assumptions, but it certainly doesn’t make things easy for you.

The first major impediment I encountered is that the eRA/NIH/dbGaP system assumes that even if you personally don’t have an eRA account, surely the institution you work for does. As I work for a small company in the UK, this is not the case. Unfortunately, that meant that before I could even apply for an eRA account for myself, my organisation had to register with the eRA. So who can do this? Apparently only the CEO. Blerg!

You might quite rightly object that applying for access to dbGaP as an employee of a small company is an uncommon situation, and that it’s reasonable the dbGaP people haven’t streamlined the process for my case. This is a fair point, but for me, getting the CEO to register meant rotating my chair and asking her to do it. I can’t begin to imagine how painful a task it would be if you happened to be the first person in a larger organisation who needed access. Anyway, moving on.

eRA: To register your organisation with the eRA, please fill in this form.
Me: Finally! Generic form-filling time, now we’re in my comfort zone. Wait, what the [EXPLETIVE DELETED] is a DUNS number? We don’t have a DUNS number! Fiona [our CEO], do we have a DUNS number? No? OK then, further down the rabbit hole we go…

It turns out that a DUNS number is a Dun & Bradstreet number, a unique nine-digit identification number for each physical location of your business, which is required to register with the US federal government for contracts or grants. Fan-bloody-tastic. How do I get one of these for our company, then? Thankfully this was a fairly simple web form. Some of the questions were a bit bizarre (is your company “woman owned”?), but whatever.

About a week later, we got an email containing our DUNS number. It had now been two weeks since I started trying to apply. Granted, I had other things to do, our CEO wasn’t always immediately available to fill in forms, and so on, so in principle I could have reached this point in somewhere between one and two weeks. That’s still not great if you need access urgently, and I still hadn’t managed to log in to dbGaP and start the application process proper. But I finally felt like I was making progress. Armed with our DUNS number, I returned to the eRA.

Me: Hi eRA, I have your stupid DUNS number. Give me an account.
eRA: Fill in this form.
Me: OK, I’ve done that, let’s hit submit.
eRA: Please print out the form, have your CEO sign it and fax it to this number.
Me: Really? Fax it? Are you going to keep me updated on my application by sending messages to my pager?!

After a fun half hour of the reception staff and me trying to work out how to send a fax (surprisingly, they’d never had to send one before; it’s almost as if faxes are outdated in 2016), I had sent our eRA account application. A mere two days later we received an email with eRA login credentials for our company. Amazing. Note that what I’d achieved at this point was to register the organisation with the eRA. To apply for access to dbGaP, I needed a personal account.

My expectations were now set so low that I would describe the process of registering a personal account as “relatively straightforward”. I logged into the eRA as an organisation, went to a user/employee control panel of some sort, created a new user and designated that new user (me) as a “principal investigator”. Why was this only “relatively” straightforward? Because the eRA control panel is the most unintuitive and terribly designed interface I’ve ever had the misfortune of using (it was a pain even for a self-proclaimed form-filling ninja such as myself). My personal favourite part of the experience was when the entire style and design of the website changed between two parts of the same control panel.


Now, I’m sure there’s a reasonable explanation, such as a style update in progress or different people working on different parts of the site. But I prefer to imagine that the style of the website was a deliberate design choice, made by someone overwhelmed by a sense that the universe was a cruel and random place, indifferent to his desires and struggles, who wanted to instil the same sense of despair in the users of his website. I have to find some way to make the process interesting, right?

Progress, at last.

At long last I was ready to make my application for access to dbGaP. I returned to the page from which I began my crusade, clicked “log in” and entered my newly minted credentials… only to be rerouted back to the login page. What!?! It still doesn’t work! What do I have to do? After trying to log in using three different browsers and two different computers without any joy, I concluded the problem was not at my end. I clicked on the “contact us for help” button, ready to send a filth-laden email. However, right at the top of the help page was an entry under the heading “I am not able to login to the dbGaP system, please help!”. I guess it’s easier to document a problem than to fix it.

It seems that the internet elves need a day or so for any changes to eRA accounts to be transmitted to dbGaP. As I had only just made the account on the eRA, dbGaP hadn’t found out about it yet, so my login didn’t work. Perhaps the laws of physics are different between the servers that host dbGaP and the eRA, causing data transmission to suffer from a day-long delay. Maybe the eRA hosts its user data on the New Horizons spacecraft, currently out past Pluto, several light hours from Earth, making synchronisation unavoidably slow. Whatever the explanation, the problem had resolved itself a day later. I could now apply for access.

Seeing as you’ve been kind enough to read this far, I’ll spare you the gruesome details of the application process once you are logged into the dbGaP system. It’s a lot more involved than filling in your email address and personal details, though: it essentially boils down to writing a research proposal. That is, you have to describe how you’re going to use the data, both in technical/scientific terms and in language understandable by a general audience. So I wrote a proposal. It took me a while (about an hour), but relative to the rest of the process it didn’t take that much time. After one last proofread, I hit submit.

In theory, this was the end of the process. It had taken me three weeks of waiting and many hours of effort to get to this point. From here I would have to wait some more while a committee reviewed my application and decided whether to give me access. But there should be no more forms to fill in or weird numbers to apply for. I’m sure you’ll agree the whole process was incredibly frustrating, especially given that the purpose of dbGaP is to enable potentially life-saving research. But I have to admit that I felt a sense of accomplishment and satisfaction at having navigated all the bureaucratic obstructions, and I was looking forward to soon receiving access to the data.

Two days later I received this email…

Dear Matthew Young,

This email was generated by the National Center for Biotechnology Information Genotypes and Phenotypes Database (NCBI dbGaP) Data Access Request system at the National Institutes of Health.

Your access to data in dbGaP Authorized Access System is suspended

For: pending small company review

If you have any questions regarding Controlled Access Portal please contact NCBI dbGaP help desk at Please do not reply to this message.


Read more posts by Matthew Young