Us, you & 23andMe: Personal Genomes just got 'personal'



23andMe is a hot topic in the Repositive office, not only because our Scientific Lead, Manuel Corpas, is crowdsourcing the analysis of his and his family's genomes, but also because of our mission of helping researchers to help patients, faster. Our aim is to make data more discoverable and accessible to power research, but personal genomics is a key crossover where we can interact directly with the sequenced individuals themselves. It is something we are very passionate about, so it's time to get personal on personal genomics.

Watch our vlog on the 23andMe collection




Manuel Corpas - Scientific Lead

"It is quite intriguing to be able to genetically place the genomes of my family among all those that have made their 23andMe genotype data freely available. Although no surprises were found, I still get a sense of community when we analyse the genomes of those of us who advocate for free access and sharing of personal genome data."

Read more about Manny here: Repositive hires pioneer of personal genomics Manuel Corpas

Or visit his blog: Personal Genomics Zone


Richard Shaw - Bioinformatics Scientist

"Creating a Repositive Collection corresponding to the 23andMe study set (collected 10 months previously) revealed something of the dynamics of such a dataset. One challenge encountered was finding that a significant number of users at a particular data source had new account numbers, such that mapping was required. Another issue that could not be resolved was that, on another source, three user accounts no longer existed and two others still existed but no longer had any genotype data. On a more positive note, the number of hits for '23andMe' - the original search term from which results were curated to produce the study set of 23andMe genotypes - has grown by 11% over the same time period. If the curation attrition rate stayed the same, this would result in another 250 unique build37 genotypes to add to the study."


"Matching up study set genotypes to datasets currently indexed on the Repositive platform was also informative about the multiple data items that some people share from one account. At the time that the study set was collected, we indexed all data items; subsequently we decided to index only the newest genotype. Within the study, we had already identified (by looking at principal components generated from a linkage-pruned SNP set) individuals represented by multiple genotypes in the original dataset of 2402 and filtered out duplicates to produce the final set of 2280. The matching of these unique genotypes to accounts revealed (possibly not surprisingly) that in each of 12 cases the genotypes of two different individuals were stored under the same account. Confirmation that this was not just some artefact of a difference threshold came from calling the genders of these individuals - 7 of the 12 pairs comprised a male and a female (3 of the other pairs were both male, leaving 2 both female). Reviewing the account pages from which the genotypes were linked, the links were not annotated as corresponding to different individuals so it was only through downloading and analysing the genotypes that we were able to discover this."


"Although I have focused above on the challenges and quirks of such a collection (these may be of particular relevance to anyone trying to curate similar collections), 2226 (over 97%) of the original study set of genotypes were matched on the current Repositive platform and a further 35 'replacement' genotypes (same individual - different genotype file) were included. I hope that this collection will prove useful to our users - whether as a reference dataset (taking into account the self-selection of the individuals involved) or for other purposes."

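As a quick sanity check on the figures Richard quotes, the short Python sketch below recomputes them. The variable names are our own, and one step is an assumption: we read the "another 250" projection as the 11% growth in search hits applied to the final set of 2280 unique genotypes, which the quote does not spell out.

```python
# Figures quoted in Richard's comments above.
original_genotypes = 2402   # genotypes collected before duplicate filtering
unique_genotypes = 2280     # final study set after filtering out duplicates
matched = 2226              # study-set genotypes matched on the current platform
replacements = 35           # same individual, different genotype file
hit_growth = 0.11           # growth in hits for the '23andMe' search term

# Genotypes dropped as duplicates of an individual already in the set.
duplicates_removed = original_genotypes - unique_genotypes

# Share of the final study set matched directly, and including replacements.
matched_share = matched / unique_genotypes
covered_share = (matched + replacements) / unique_genotypes

# Assumption (ours): the projection scales the final unique set by the
# growth in hits, keeping the curation attrition rate unchanged.
projected_new = unique_genotypes * hit_growth

# Sex-call breakdown of the 12 accounts holding two different individuals.
shared_account_pairs = {"male_female": 7, "male_male": 3, "female_female": 2}

print(f"Duplicates removed: {duplicates_removed}")
print(f"Matched directly: {matched_share:.1%}")
print(f"Matched or replaced: {covered_share:.1%}")
print(f"Projected new genotypes: about {projected_new:.0f}")
print(f"Shared-account pairs: {sum(shared_account_pairs.values())}")
```

Under that reading the numbers hang together: the matched share comes out just above 97%, and the projection lands close to the 250 mentioned in the quote.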


Left to right: Richard Shaw, Charlotte Whicher, Craig Smith


Charlotte Whicher - Product Manager

"I know a lot of people who have had their genomes sequenced: within our office Manny and Amanda have, my best friend's brother has, my cousin has, and even my grandmother has! And yet, when I am asked if I would get my genome sequenced, I answer no. I'm still not convinced. It's a cost-benefit thing for me, and (many people may disagree with me) I don't think we, as a society, understand all of the costs or all of the benefits yet, particularly when it comes to sharing this data. This is highlighted by the negative response to Dame Sally Davies' recent proposal to make DNA sequencing for cancer patients routine on the NHS: people are still scared and there are still many, many questions around data protection."


"Nevertheless, I concede that without people going out and getting their genomes sequenced, and sharing the data for that matter, we will never get any closer to understanding those costs and benefits. So all credit given to those individuals."


"What I've found is that the people who get their genomes sequenced generally fall into three broad categories: The Educated, The Uneducated & The Pioneers. Members of these groups have very different reasons for why they 'got sequenced', opinions on the risks & benefits and understandings of what 'sharing your data' actually means."


Quote taken from Charlotte's blog post: Personal Genome Data: to share or not to share


Craig Smith - Marketing Manager

"If we all sit and think about the people in our lives, you will begin to realise, just as I did, that many people have some kind of genetic disorder or genetic predisposition. My husband has Tourette's Syndrome. My best friend had breast cancer. My nephew has Asperger's. My brother had testicular cancer. My father has an irregular heartbeat. I have IBS. However, speaking to all of these people, it dawned on me that NONE of them really understands genetic tests and their value to research."


"The lack of information about, or perhaps more so the lack of coverage of, the value of personal genome tests is actually incredible. The real driving factor to get yourself a genetic test seems to be pure interest. Many people just want to know their ancestry, peer back into the ages and plot their extended family tree. I mean, most people I know recall when celebrity Danny Dyer was told on a TV show he was related to King Edward II. 23andMe, to me, seems to have really gained worldwide recognition as a business. Many, many people have heard of 23andMe, but none of the people I have spoken to have been sequenced or genotyped. It is expensive, and that's one of the reasons I have not ordered my own 23andMe kit."


"However, it surprises me that with all the money and energy put into studying breast cancer, Tourette's and autism, none of the people around me have been sequenced or genotyped as part of the diagnosis or treatment of their disease or disorder. These are just the 5 people I know and have asked. I wonder how many others out there could contribute to research if they 1) knew how to, 2) were encouraged to and 3) had financial support to do so."


"Seeing this collection fills me with great joy. I expect many of these people have paid for their 23andMe tests themselves and have put the results online for the research community to study. I hope this sets an example to everyone of the value of this data and encourages more data to be shared. No doubt more and more of us will pay to get genotyped or sequenced, if for no other reason than that the cost has come down to a level that matches our own personal interest in doing so."


Unlocking the potential of DTC genomic tests

Watch our second vlog on the 23andMe collection






