The variant call format and VCFtools

Times cited: 14928
Authors
Danecek, Petr [1]
Auton, Adam [2]
Abecasis, Goncalo [3]
Albers, Cornelis A. [1]
Banks, Eric [4]
DePristo, Mark A. [4]
Handsaker, Robert E. [4]
Lunter, Gerton [2]
Marth, Gabor T. [5]
Sherry, Stephen T. [6]
McVean, Gilean [2,7]
Durbin, Richard [1]
Affiliations
[1] Wellcome Trust Sanger Inst, Cambridge CB10 1SA, England
[2] Univ Oxford, Wellcome Trust Ctr Human Genet, Oxford OX3 7BN, England
[3] Univ Michigan, Dept Biostat, Ctr Stat Genet, Ann Arbor, MI 48109 USA
[4] Broad Inst MIT & Harvard, Program Med & Populat Genet, Cambridge, MA 02141 USA
[5] Boston Coll, Dept Biol, Boston, MA 02467 USA
[6] Natl Lib Med, Natl Ctr Biotechnol Informat, NIH, Bethesda, MD 20894 USA
[7] Univ Oxford, Dept Stat, Oxford OX1 3TG, England
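Funding
Medical Research Council (UK); National Institutes of Health (USA); Wellcome Trust (UK)
DOI
10.1093/bioinformatics/btr330
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract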
The variant call format (VCF) is a generic format for storing DNA polymorphism data such as SNPs, insertions, deletions and structural variants, together with rich annotations. VCF is usually stored in a compressed manner and can be indexed for fast retrieval of variants from a range of positions on the reference genome. The format was developed for the 1000 Genomes Project and has also been adopted by other projects such as UK10K, dbSNP and the NHLBI Exome Project. VCFtools is a software suite that implements various utilities for processing VCF files, including validation, merging and comparing, and also provides a general Perl API.
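The abstract describes VCF as a text format of per-variant records with annotations that can be retrieved by position on the reference genome. As an illustration only, here is a minimal Python sketch (not VCFtools and not its Perl API) that parses VCF data lines and selects the records falling inside a position range. It assumes an uncompressed, in-memory file and handles only the eight fixed tab-delimited columns (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO), ignoring FORMAT and per-sample genotype columns. The names `VcfRecord`, `parse_info`, `parse_line`, `records_in_range` and `EXAMPLE` are hypothetical, and the records in `EXAMPLE` are toy data.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, List, Optional, Union


# Hypothetical record type covering only the eight fixed VCF columns.
@dataclass
class VcfRecord:
    chrom: str
    pos: int                 # 1-based position on the reference
    var_id: str              # ID column, "." when missing
    ref: str
    alt: List[str]           # one entry per alternate allele
    qual: Optional[float]    # None when QUAL is "."
    filters: str
    info: Dict[str, Union[str, bool]]


def parse_info(text: str) -> Dict[str, Union[str, bool]]:
    """Split the INFO column into key=value pairs; bare keys are flags (True)."""
    info: Dict[str, Union[str, bool]] = {}
    if text == ".":
        return info
    for item in text.split(";"):
        key, sep, value = item.partition("=")
        info[key] = value if sep else True
    return info


def parse_line(line: str) -> VcfRecord:
    """Parse one tab-delimited data line; FORMAT and sample columns are ignored."""
    chrom, pos, var_id, ref, alt, qual, filters, info = line.rstrip("\n").split("\t")[:8]
    return VcfRecord(
        chrom=chrom,
        pos=int(pos),
        var_id=var_id,
        ref=ref,
        alt=[] if alt == "." else alt.split(","),
        qual=None if qual == "." else float(qual),
        filters=filters,
        info=parse_info(info),
    )


def records_in_range(lines: Iterable[str], chrom: str,
                     start: int, end: int) -> Iterator[VcfRecord]:
    """Yield records on `chrom` with start <= POS <= end (linear scan, no index)."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, ## meta-information and the #CHROM header
        rec = parse_line(line)
        if rec.chrom == chrom and start <= rec.pos <= end:
            yield rec


# Toy input for illustration only.
EXAMPLE = (
    "##fileformat=VCFv4.1\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    "20\t14370\t.\tG\tA\t29\tPASS\tNS=3;DP=14;AF=0.5;DB\n"
    "20\t1110696\t.\tA\tG,T\t67\tPASS\tNS=2;DP=10;AF=0.333,0.667\n"
    "20\t1234567\tmicrosat1\tGTC\tG,GTCT\t50\tPASS\tNS=3;DP=9\n"
)

if __name__ == "__main__":
    for rec in records_in_range(EXAMPLE.splitlines(), "20", 1_000_000, 2_000_000):
        print(rec.pos, rec.ref, rec.alt, rec.info.get("DP"))
    # prints:
    # 1110696 A ['G', 'T'] 10
    # 1234567 GTC ['G', 'GTCT'] 9
```

The scan here is deliberately linear. In practice VCF files are commonly compressed with bgzip and indexed with tabix so that a region can be fetched without reading the whole file; this sketch omits that step. The third toy record also shows how indels are written: the deletion (`GTC` to `G`) and the insertion (`GTC` to `GTCT`) share leading reference bases with REF.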
Pages: 2156-2158
Number of pages: 3