Diversity & comparability: 50 genomic data sources in 1 place



As of July 2017, we are delighted to announce that our platform now covers more than 50 data sources (51 to be precise!).

The rising number of data sources gives our platform a diverse and solid infrastructure that further enables researchers and scientists to search, access and collaborate around the datasets that are integral to their research. It also makes life easier for researchers: they can find data all in one place, rather than having to seek out data sources and search through each one separately. Many researchers will likely come across data they would not otherwise have found!

As our very own Data Scientist, Steve Williams, comments:

“We’ve added 10 new data sources in the last 3 months alone. The more and more data we can bring into the platform, the easier it will be to find data, and access the data, and that’s going to have an exponential effect on research. So I think it’s really exciting – obviously there’s more data being released every day, so we can continue to strive to find that data and make it easier for other people.”


Craig vlogs on reaching this epic milestone



As the world's largest human genomic data portal, we also give researchers a lens with a wider view of the broad genomic landscape. Repositive's Senior Bioinformatics Scientist, Richard Shaw, sums it up perfectly:

"Working on indexing human genomic data sources has shown me what a broad range of repositories are out there. It is good to see so many groups taking the time and effort to make data reusable. Some sites can be more challenging to index than others. It can be satisfying to reverse engineer the interface underlying a graphical display and extract the metadata we need to index the source but a well-documented API makes the process easier and I appreciate the effort that goes into that as well. Each source that we add to the Repositive platform hopefully means that our users have a higher probability of finding more of the data they need for their research."

Niche and specialised data sources



The struggle to find and access data is real for researchers: most know of only a small handful of resources, when in fact there are hundreds! We're making sure we capture everything that may be relevant to human genomics researchers in a variety of fields. Not just the big data sources, but the small ones too!

As our Data Scientist Steve Williams comments:

“Particularly for less well-studied diseases - data often gets deposited in small repositories, sometimes hosted by universities but also by small lab groups. So these are not as obvious or known repositories that researchers will use. I was working in cancer research before coming to Repositive. Cancer’s quite a big field so finding data is generally quite easy in the sense of proportion – there’s lots of data available. But prior to this I was working in a more niche and specialised area. And it was much harder to find that data. So what we’re trying to do is identify some of those sources and make those visible.”

As researchers move on to different hypotheses and need new data, they face a set of challenges such as logging, comparing and monitoring datasets, and networking. Our platform helps overcome these barriers by allowing users to favourite and follow datasets within data sources, which means they can be notified later of any changes made.

Building a FAIR community



Expanding the range of data sources on our platform provides further opportunities for collaboration within our community. With more than 1,850 users and counting, more data from more sources benefits all of our users, including those in pharma, biotech and academia, and the wider genomics R&D industry. We've also created new features that allow researchers to connect with others and collaborate around data. Users can connect with other researchers who share their interests, and ask the community for help by making data requests.

The community-driven aspect of Repositive also means that researchers are able to support the principles of data sharing, and are able to help one another in their research. As our Data Scientist, Steve Williams, points out:

"At Repositive, we support the principles of FAIR – a set of guiding principles to make data Findable, Accessible, Interoperable, and Re-usable. We want data shared, findable, accessible, and for scientists to be able to re-use it. All these things are central to what we are trying to do. But it’s quite strict in what the requirements are, and honestly sometimes we are limited by the data we have. For example, on our platform there is some data that is restricted, but we still want people to be able to find it. But FAIR is what we aspire to be, and it’s the idea behind it that’s important. We also want to encourage those sorts of principles to our users, and for researchers to do everything they can to make data FAIR. It’s helping to drive more and better research because people can re-use that data."

We're moving fast, but taking time


Another challenge of ours is making sure the metadata we have on our platform is as informative as possible for our users. As many researchers know too well, there is no single format for the organisation of genomics metadata.

So whilst our list of data sources is growing fast, we are also mindful that we have to make the most of what we already have! A strong focus in our team at the moment is making sure that the metadata already on our platform is of a quality that our users can work with.
This means doing a lot of heavy lifting and re-mapping the metadata in a way that works consistently across data sources. We hope that implementing a common standard across all of our datasets will not only benefit our users tremendously, but that in time this standard will also become more common in human genomic research more generally.
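To give a flavour of what this kind of re-mapping involves, here is a minimal Python sketch. It is purely illustrative: the source names, the field names and the three-field common schema are all made up for this example and should not be read as the schema or code behind the platform.

```python
# Hypothetical common schema: every name here is an assumption for illustration.
COMMON_FIELDS = ("title", "assay", "tissue")

# Hypothetical per-source maps: the source's own field name -> common field name.
FIELD_MAPS = {
    "source_a": {"name": "title", "library_strategy": "assay", "tissue_type": "tissue"},
    "source_b": {"dataset_title": "title", "experiment_type": "assay", "organ": "tissue"},
}


def remap(source, record):
    """Return the record keyed by the common field names.

    Fields the source does not provide are set to None, and fields with
    no entry in the source's map are dropped.
    """
    mapping = FIELD_MAPS[source]
    common = {field: None for field in COMMON_FIELDS}
    for key, value in record.items():
        target = mapping.get(key)
        if target is not None:
            common[target] = value
    return common


a = remap("source_a", {"name": "Tumour WGS", "library_strategy": "WGS", "tissue_type": "lung"})
b = remap("source_b", {"dataset_title": "Lung exomes", "experiment_type": "WXS", "extra": 1})
```

After re-mapping, records from both sources share the same keys, so they can be searched and compared side by side even though one of them is missing a tissue value.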

We are not stopping at this milestone. The list of data sources on our platform will continue to grow and to expand into new areas of research, further benefiting researchers with different areas of interest.

So here's to the next 50!

Sign up free & search our 51 data sources: https://discover.repositive.io/datasources




What does indexing 50 data sources mean to our users?

Craig explains in our first vlog





Read more posts by Daniel Jason Binks