CMBS (Computational & Molecular Biology Symposium) 2016, held at the Conway Institute (University College Dublin), packed ten short talks (mostly by students and ranging from viruses to racehorses), seven keynotes and three presentations from sponsors (Faculty of 1000, NSilico and Amazon Web Services) into two days, while still leaving time for poster sessions and networking. Although some of the talks were, strictly speaking, outside of Repositive's human genomic and microbiomic remit, our attention was naturally caught by Chloe Vigliotti's analysis of the response of lizard gut microbiomes to rapid diet change. Liz (our mascot) will be carefully considering the implications. :-)
Winding career paths and being in the right place at the right time were recurrent themes for the keynote speakers. Chris Ponting, in addition, opened with the cautionary note that Big Data by itself does not guarantee Big Knowledge. He described how his group used 'little data' (though obviously the right data for the task!) to identify distantly related orthologs with low sequence similarity but similar structure and function, and how they filtered tens of thousands of vitamin D receptor binding sites (from ChIP-exo) down to a subset then found to be enriched for trait and disease associations (notably multiple sclerosis). For the increasingly Big Data of Pfam and Rfam, Alex Bateman described the advantages and quirks of crowdsourcing annotation from Wikipedians. Martin Krzywinski demonstrated the approaches and iterations required to capture core truths from complex and noisy datasets in compelling visualisations (beware of maize-triceratops hybrids!) and Aoife McLysaght illustrated how new genes can arise. Whereas Elaine Holmes and Francesca Buffa had a primarily medical focus (discussing metabolomics and the tumour microenvironment, respectively), Mick Watson made a plea for students in the audience to consider agricultural omics - lest the human species succeed in curing all diseases only to die of starvation.
Cautionary notes were not limited to the keynote speakers. Cian Murphy showed how 'high significance' variants could arise solely from systematic differences in which sequencing technologies were used, and displayed PCA plots with striking clustering by sequencing site. Alexander Douglass demonstrated that Pichia kudriavzevii, a yeast widely used in biotech including the production of chocolate, is actually the same species as Candida krusei, a major pathogen in immunocompromised patients. More positively, Guillaume Devailly presented Heat*seq, a web application that aligns with the core Repositive value of genomic dataset reuse, and Luis Iglesias-Martinez explained the algorithm he is using to infer regulatory networks from gene expression data and its successful application to the DREAM4 Challenge.
Among the posters relevant to human genomic data, one that particularly caught my attention was 'The Irish DNA Atlas - A study of genetic diversity in Ireland'. Edmund Gilbert explained to me how the population of Ireland can be regarded (at least from the perspective of human genomic disease association studies!) as a scaled-up version of that of Iceland, with a consequent increase in power to detect such associations. To realise these benefits, however, the otherwise potentially confounding population structure first needs to be analysed - hence the atlas.
Congratulations to the student organisers for the thoughtful planning and smooth running of the entire event and thanks to all who dropped by the Repositive stall for discussions and demos or just to enter our Christmas jumper draw.