

A framework for variation discovery and genotyping using next-generation DNA sequencing data
Authors: DePristo Mark A, Banks Eric, Poplin Ryan, Garimella Kiran V, Maguire Jared R, Hartl Christopher, Philippakis Anthony A, del Angel Guillermo, Rivas Manuel A, Hanna Matt, McKenna Aaron, Fennell Tim J, Kernytsky Andrew M, Sivachenko Andrey Y, Cibulskis Kristian, Gabriel Stacey B, Altshuler David, Daly Mark J
Institution: Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, Massachusetts, USA. depristo@broadinstitute.org
Abstract: Recent advances in sequencing technology make it possible to comprehensively catalog genetic variation in population samples, creating a foundation for understanding human disease, ancestry and evolution. The amounts of raw data produced are prodigious, and many computational steps are required to translate this output into high-quality variant calls. We present a unified analytic framework to discover and genotype variation among multiple samples simultaneously that achieves sensitive and specific results across five sequencing technologies and three distinct, canonical experimental designs. Our process includes (i) initial read mapping; (ii) local realignment around indels; (iii) base quality score recalibration; (iv) SNP discovery and genotyping to find all potential variants; and (v) machine learning to separate true segregating variation from machine artifacts common to next-generation sequencing technologies. We here discuss the application of these tools, instantiated in the Genome Analysis Toolkit, to deep whole-genome, whole-exome capture and multi-sample low-pass (~4×) 1000 Genomes Project datasets.
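Step (iii), base quality score recalibration, can be illustrated with a minimal sketch. It is not the Genome Analysis Toolkit's implementation: it assumes a single covariate (the reported quality score), a simple add-one smoothing term, and hypothetical function names chosen for this example. The only fixed ingredient is the standard Phred relation Q = -10 log10(p) between a quality score and an error probability.

```python
import math
from collections import defaultdict


def phred_to_error_prob(q):
    """Convert a Phred-scaled quality score to an error probability."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p):
    """Convert an error probability to a Phred-scaled quality score."""
    return -10 * math.log10(p)


def empirical_qualities(observations):
    """Estimate an empirical quality for each reported quality score.

    observations: iterable of (reported_quality, is_mismatch) pairs, one per
    base observed at a site assumed (for this sketch) to be non-variant.
    Returns a dict mapping reported quality -> empirical Phred quality.
    """
    counts = defaultdict(lambda: [0, 0])  # quality -> [mismatches, total]
    for quality, is_mismatch in observations:
        counts[quality][1] += 1
        if is_mismatch:
            counts[quality][0] += 1
    # Add-one smoothing (an assumption of this sketch) avoids log10(0)
    # when a quality bin contains no mismatches.
    return {
        quality: error_prob_to_phred((mismatches + 1) / (total + 2))
        for quality, (mismatches, total) in counts.items()
    }


# Hypothetical data: 998 bases reported as Q30, of which 9 are mismatches.
observations = [(30, False)] * 989 + [(30, True)] * 9
recalibrated = empirical_qualities(observations)
```

With these made-up counts the smoothed error rate is 10/1000, so bases reported as Q30 receive an empirical quality of about Q20, meaning the reported scores were overconfident. A production recalibrator conditions on more covariates than the reported score alone.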
This article is indexed in PubMed and other databases.