Bram De Wilde, Tom Sante, Jasper Anckaert, Jan Hellemans, Frank Speleman, Björn Menten, Jo Vandesompele
Center for Medical Genetics, Ghent University
Background – As genetic variation data are being generated at an unprecedented scale, assessing the functional consequences of the variants in a given patient or patient cohort is challenging, both computationally and from a data management perspective. In this new era of personalised genomics, a clinical sample is expected to need repeated re-annotation as new annotation information on the genome becomes available and new insights into variant interpretation accumulate. Moreover, the advent of individualised, genomics-directed therapeutic strategies will require patients' tumours to be genetically profiled at multiple levels between diagnosis and treatment, which demands fast, high-quality analysis of very large datasets. While various initiatives are emerging to collect the overwhelming number of genomic variants currently being generated, a central system to manage and store the annotation of genomic variants on a sample-by-sample basis is still missing.
Results – Here we present our efforts to create a one-stop solution for next-generation sequencing (NGS) data analysis. The ‘NeXT-generation Variant Annotation Tracker’ is a web-based front-end to a highly scalable, cloud-based analysis platform. Combining this front-end with an object-oriented, shardable database and a fully distributed analysis pipeline allows the application to scale to virtually any required data volume. A “plug-in” style organisation of the variant annotation pipelines makes it easy to update and extend the variant annotation. The cloud-based nature of the platform addresses both the scalability and the data management issues encountered when working with very large NGS datasets.
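The abstract does not describe how the plug-in interface is implemented. As an illustration only, the following minimal Python sketch assumes that each plug-in is a function that takes one variant and returns one annotation value, registered by name in a central registry. The `Variant` class, the `annotator` decorator, the `annotate` function and the two toy plug-ins are hypothetical and are not part of the tracker, the Ensembl API, PolyPhen-2 or GeneSplicer.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Variant:
    """Hypothetical minimal variant record used only for this sketch."""
    chrom: str
    pos: int
    ref: str
    alt: str
    annotations: Dict[str, object] = field(default_factory=dict)


# Hypothetical registry: plug-in name -> annotation function.
ANNOTATORS: Dict[str, Callable[[Variant], object]] = {}


def annotator(name: str):
    """Decorator that registers a function as an annotation plug-in."""
    def register(func: Callable[[Variant], object]) -> Callable[[Variant], object]:
        ANNOTATORS[name] = func
        return func
    return register


@annotator("variant_class")
def classify(v: Variant) -> str:
    # Toy classification based only on allele lengths.
    if len(v.ref) == len(v.alt):
        return "SNV" if len(v.ref) == 1 else "MNV"
    return "insertion" if len(v.alt) > len(v.ref) else "deletion"


@annotator("is_transition")
def is_transition(v: Variant) -> Optional[bool]:
    # Transitions are purine<->purine or pyrimidine<->pyrimidine changes.
    if len(v.ref) != 1 or len(v.alt) != 1:
        return None  # only defined for single-nucleotide changes
    pair = {v.ref.upper(), v.alt.upper()}
    return pair <= {"A", "G"} or pair <= {"C", "T"}


def annotate(variants: List[Variant], plugins: Optional[List[str]] = None) -> List[Variant]:
    """Run the selected plug-ins (default: all registered) on every variant.

    Passing a subset of plug-in names allows re-annotating stored samples
    with only a new or updated plug-in.
    """
    selected = plugins if plugins is not None else list(ANNOTATORS)
    for v in variants:
        for name in selected:
            v.annotations[name] = ANNOTATORS[name](v)
    return variants


if __name__ == "__main__":
    for v in annotate([Variant("1", 100, "A", "G"), Variant("2", 200, "C", "CTT")]):
        print(v.chrom, v.pos, v.annotations)
```

In this design, adding a new annotation source only requires registering one more function; existing samples can then be re-annotated by calling `annotate` with just that plug-in's name, which matches the repeated re-annotation scenario described in the Background.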
Variant annotation can be visualised both for individual samples and at the population level through a map-reduce framework, allowing genomic variation to be examined at the gene or biological pathway level.
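The abstract does not specify the map-reduce jobs used. As an illustration only, the sketch below runs the three phases of map-reduce (map, shuffle, reduce) in plain Python over hypothetical (sample, gene, consequence) records and counts, per gene and per pathway, how many samples carry at least one potentially damaging variant. The set of Sequence Ontology consequence terms treated as damaging and the gene-to-pathway mapping are assumptions chosen for the example; on a cluster, the same map and reduce functions would be distributed across nodes rather than run in a single process.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Mapping, Sequence, Tuple

# Hypothetical input record: (sample_id, gene, consequence term).
Record = Tuple[str, str, str]

# Assumption for this sketch: which consequence terms count as "damaging".
DAMAGING = frozenset({
    "missense_variant", "stop_gained", "frameshift_variant",
    "splice_donor_variant", "splice_acceptor_variant",
})


def map_phase(records: Iterable[Record]) -> Iterator[Tuple[str, str]]:
    """Map: emit a (gene, sample) pair for every damaging variant."""
    for sample, gene, consequence in records:
        if consequence in DAMAGING:
            yield gene, sample


def shuffle(pairs: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Shuffle: group emitted values by key."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, samples: List[str]) -> Tuple[str, int]:
    """Reduce: number of distinct samples affected for this key."""
    return key, len(set(samples))


def gene_burden(records: Iterable[Record]) -> Dict[str, int]:
    grouped = shuffle(map_phase(records))
    return dict(reduce_phase(g, s) for g, s in grouped.items())


def pathway_burden(records: Iterable[Record],
                   gene_to_pathways: Mapping[str, Sequence[str]]) -> Dict[str, int]:
    """Same job at pathway level: re-key each (gene, sample) pair by pathway."""
    def mapper() -> Iterator[Tuple[str, str]]:
        for gene, sample in map_phase(records):
            for pathway in gene_to_pathways.get(gene, ()):
                yield pathway, sample
    grouped = shuffle(mapper())
    return dict(reduce_phase(p, s) for p, s in grouped.items())


if __name__ == "__main__":
    # Made-up toy records, for illustration only.
    toy = [("S1", "TP53", "missense_variant"), ("S2", "TP53", "stop_gained"),
           ("S2", "MYCN", "synonymous_variant"), ("S3", "ALK", "missense_variant")]
    print(gene_burden(toy))
```

Counting distinct samples in the reduce step (rather than raw variant counts) keeps a sample with several hits in the same gene from being counted twice, which is one reasonable choice for population-level summaries.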
Currently, variant annotation and effect prediction are performed using the Ensembl API (1), the PolyPhen-2 algorithm (2) and GeneSplicer (3). All tools are fully compatible with the emerging standard formats in next-generation sequencing data analysis, including version 4.1 of the Variant Call Format (VCF) from the 1000 Genomes Project Consortium (4,5).
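As a minimal sketch of what reading VCF 4.1 involves, the parser below splits a data line into its eight fixed columns (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO) plus the optional FORMAT and sample columns, mapping the missing-value marker "." to None or an empty container. It is a simplified illustration, not the tracker's implementation and not a validating parser: it ignores the meta-information header, does not type INFO values according to their header declarations, and does not read compressed files. An established library such as pysam may be preferable in practice. The function names are hypothetical.

```python
from typing import Dict, Iterable, Iterator, List, Union

InfoValue = Union[bool, List[str]]


def parse_info(field: str) -> Dict[str, InfoValue]:
    """Parse the INFO column: key=value[,value...] entries or bare flags."""
    if field == ".":
        return {}
    info: Dict[str, InfoValue] = {}
    for entry in field.split(";"):
        if "=" in entry:
            key, value = entry.split("=", 1)
            info[key] = value.split(",")  # values kept as strings
        else:
            info[entry] = True  # flag without a value, e.g. DB
    return info


def parse_vcf_line(line: str) -> Dict[str, object]:
    """Split one tab-delimited VCF data line into named fields."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) < 8:
        raise ValueError(f"expected at least 8 columns, got {len(cols)}")
    chrom, pos, vid, ref, alt, qual, filt, info = cols[:8]
    return {
        "chrom": chrom,
        "pos": int(pos),  # 1-based position
        "id": None if vid == "." else vid.split(";"),
        "ref": ref,
        "alt": [] if alt == "." else alt.split(","),
        "qual": None if qual == "." else float(qual),
        "filter": None if filt == "." else filt.split(";"),
        "info": parse_info(info),
        "format": cols[8].split(":") if len(cols) > 8 else [],
        "samples": cols[9:],  # raw per-sample strings, not decoded
    }


def parse_vcf(lines: Iterable[str]) -> Iterator[Dict[str, object]]:
    """Yield parsed records, skipping header (#) and blank lines."""
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        yield parse_vcf_line(line)


if __name__ == "__main__":
    example = "20\t14370\trs6054257\tG\tA\t29\tPASS\tNS=3;DP=14;AF=0.5;DB;H2\tGT:GQ:DP:HQ\t0|0:48:1:51,51"
    print(parse_vcf_line(example))
```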
1. McLaren W et al. Deriving the consequences of genomic variants with the Ensembl API and SNP Effect Predictor. Bioinformatics. 2010;26(16):2069–70.
2. Adzhubei IA et al. A method and server for predicting damaging missense mutations. Nat Methods. 2010 Apr 1;7(4):248–9.
3. Pertea M et al. GeneSplicer: a new computational method for splice site prediction. Nucleic Acids Res. 2001 Mar 1;29(5):1185–90.
4. Danecek P et al. The variant call format and VCFtools. Bioinformatics. 2011 Jul 15;27(15):2156–8.
5. Durbin RM et al. A map of human genome variation from population-scale sequencing. Nature. 2010 Oct 28;467(7319):1061–73.