You want your hypothesis to be true, but what does the data tell you?
If you are a researcher in genetics or another biomedical science, you are probably hoping for a big breakthrough: to one day be able to say that you found the genetic cause of, or cure for, a disease.
Such as these guys.
The BBC coverage of the Neuron paper: http://www.cell.com/neuron/fulltext/S0896-6273(16)30126-X
And you want to be able to celebrate the victory when all the big news outlets are talking about your breakthrough findings.
What you would rather not see is your results being questioned by several high-standing researchers in the community.
Daniel MacArthur from the Broad Institute, Cambridge, MA started this thread on Twitter.
This breakthrough paper has a problem
So what is the big deal? What was the result? And why was it unconvincing?
The discussion section of the paper: http://www.cell.com/neuron/fulltext/S0896-6273(16)30126-X
The paper showed that 7 cases of Multiple Sclerosis (MS) in two families were all carriers of the same genetic variant, a mutation in NR1H3. The paper made the case that this variant is the causal dominant variant for the disease in these cases.
Firstly, it is not at all unusual to find that the same genetic variant appears across many related individuals in a family tree. What would be unusual is for the variant to be present only in the individuals with the disease.
However, the paper also noted that the same two families included a total of four healthy carriers. This suggests that if the variant is indeed causal, it has incomplete penetrance, which weakens the case for causality unless there is further supporting evidence.
An even harder blow to the evidence is the fact that the same genetic variant was found in no fewer than 21 healthy carriers in the ExAC database! The evidence for a dominant variant now looks very weak…
The whole case as explained by @dgmacarthur: http://www.ncbi.nlm.nih.gov/pubmed/27253448#cm27253448_16159
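To make the argument concrete, here is a rough back-of-the-envelope sketch in Python using only the carrier counts quoted above. The function name `naive_penetrance` is my own illustration, not something from the paper, and the calculation is deliberately crude: it treats all 21 ExAC carriers as healthy (as stated above) and pools family members with database individuals even though they were sampled in very different ways. Treat it as an intuition aid, not a proper penetrance estimate.

```python
def naive_penetrance(affected_carriers: int, healthy_carriers: int) -> float:
    """Fraction of observed carriers who have the disease.

    Hypothetical helper for illustration only; it ignores ascertainment
    bias, age of onset and relatedness between carriers.
    """
    total = affected_carriers + healthy_carriers
    if total == 0:
        raise ValueError("no carriers observed")
    return affected_carriers / total


# Carrier counts as quoted in this post.
AFFECTED_IN_FAMILIES = 7
HEALTHY_IN_FAMILIES = 4
HEALTHY_IN_EXAC = 21  # assumption: all ExAC carriers are unaffected

# Penetrance if we only look inside the two families.
within_families = naive_penetrance(AFFECTED_IN_FAMILIES, HEALTHY_IN_FAMILIES)

# Penetrance if the ExAC carriers are naively pooled with the families.
pooled = naive_penetrance(
    AFFECTED_IN_FAMILIES, HEALTHY_IN_FAMILIES + HEALTHY_IN_EXAC
)

print(f"Within the two families: {within_families:.0%}")
print(f"Pooled with ExAC carriers: {pooled:.0%}")
```

Within the families, 7 of 11 carriers (about 64%) have MS. Once the ExAC carriers are added, that drops to 7 of 32 (about 22%). Even this naive view shows how quickly the picture changes when reference data from healthy individuals is taken into account.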
What went wrong?
To give the authors the benefit of the doubt: I cannot imagine that they would have continued to support their hypothesis and started planning expensive follow-up experiments if they had been aware of this data when they first began to analyse their results. My guess is that by the time they thought of looking up their variant in the public ExAC database, they were already so in love with their hypothesis and potential breakthrough discovery that they simply ignored the facts that were on the table.
Unfortunately, that is not good science.
How you should plan for your next breakthrough
Here are a few small tips that I would advise you to consider as you prepare your next big breakthrough.
ALWAYS look for publicly available reference data from healthy individuals to test (and hopefully validate) your hypothesis. For human genetics, you have two great resources in the ExAC database and the Reference Variant Store. Both are open access, online and easy to use. There are no excuses.
Look for data sources and collaborators with data on the same disease as you are investigating, and cross-check that your results also replicate in their datasets. The more specific your finding, the more specific the cross-check. If you are looking for specific genetic variants linked to a disease, you can use e.g. ClinVar, Cafe Variome and SNPedia. If you are looking for more complex signals in the data, or characteristics that are not well annotated yet, such as genome rearrangements or haplotypes, go find the raw data files from various repositories and data sources by searching on Repositive.
Look at your available data with an unbiased mind. It is good practice to also ask a colleague for an unbiased review of your conclusion. What does the data tell you?
Ignoring the data is not an option.
I presented this case study at the #QuantGen2016 conference and uploaded my slides to SlideShare: Your next scientific breakthrough