Identification of copy number variants in whole-genome data using Reference Coverage Profiles

Cited by: 11
Authors
Glusman, Gustavo [1]
Severson, Alissa [1]
Dhankani, Varsha [1]
Robinson, Max [1]
Farrah, Terry [1]
Mauldin, Denise E. [1]
Stittrich, Anna B. [1]
Ament, Seth A. [1]
Roach, Jared C. [1]
Brunkow, Mary E. [1]
Bodian, Dale L. [2]
Vockley, Joseph G. [2]
Shmulevich, Ilya [1]
Niederhuber, John E. [2]
Hood, Leroy [1]
Affiliations
[1] Institute for Systems Biology, Seattle, WA 98109 USA
[2] Inova Health System, Inova Translational Medicine Institute, Falls Church, VA USA
Keywords
whole-genome sequencing; structural variation; depth of coverage; signal processing; clinical genomics; STRUCTURAL VARIANT; SEQUENCING DATA; EXACT BREAKPOINTS; READ-DEPTH; DISCOVERY; FRAMEWORK; DELETIONS; CANCER; MODELS
DOI
10.3389/fgene.2015.00045
Chinese Library Classification (CLC) number
Q3 [Genetics]
Discipline classification codes
071007; 090102
Abstract
The identification of DNA copy numbers from short-read sequencing data remains a challenge for both technical and algorithmic reasons. The raw data for these analyses are measured in tens to hundreds of gigabytes per genome; transmitting, storing, and analyzing such large files is cumbersome, particularly for methods that analyze several samples simultaneously. We developed a very efficient representation of depth of coverage (150-1000x compression) that enables such analyses. Current methods for analyzing variants in whole-genome sequencing (WGS) data frequently miss copy number variants (CNVs), particularly hemizygous deletions in the 1-100 kb range. To fill this gap, we developed a method to identify CNVs in individual genomes, based on comparison to joint profiles pre-computed from a large set of genomes. We analyzed depth of coverage in over 6000 high quality (>40x) genomes. The depth of coverage has strong sequence-specific fluctuations only partially explained by global parameters like %GC. To account for these fluctuations, we constructed multi-genome profiles representing the observed or inferred diploid depth of coverage at each position along the genome. These Reference Coverage Profiles (RCPs) take into account the diverse technologies and pipeline versions used. Normalization of the scaled coverage to the RCP followed by hidden Markov model (HMM) segmentation enables efficient detection of CNVs and large deletions in individual genomes. Use of pre-computed multi-genome coverage profiles improves our ability to analyze each individual genome. We make available RCPs and tools for performing these analyses on personal genomes. We expect the increased sensitivity and specificity for individual genome analysis to be critical for achieving clinical-grade genome interpretation.
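The normalization and segmentation steps described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation or their released tools: the bin layout, the function names (`normalize_to_profile`, `viterbi_segment`, `call_segments`), the three-state model, and the parameters `sd` and `stay` are all assumptions chosen for illustration. The sketch assumes the reference profile holds a positive expected diploid coverage value for every bin.

```python
import math
from statistics import median

# Hypothetical three-state model: expected coverage ratio for each copy-number state.
STATES = {"loss": 0.5, "diploid": 1.0, "gain": 1.5}


def normalize_to_profile(coverage, profile):
    """Divide binned coverage by the reference profile, then rescale so the
    median ratio is 1.0. Assumes every profile value is positive."""
    raw = [c / p for c, p in zip(coverage, profile)]
    scale = median(raw)
    return [r / scale for r in raw]


def viterbi_segment(ratios, sd=0.15, stay=0.99):
    """Most likely state path under Gaussian emissions with a shared sd."""
    if not ratios:
        return []
    names = list(STATES)
    switch = (1.0 - stay) / (len(names) - 1)

    def log_emit(x, mu):
        # The Gaussian normalizing constant is identical for all states, so it is dropped.
        return -0.5 * ((x - mu) / sd) ** 2

    def log_trans(a, b):
        return math.log(stay if a == b else switch)

    score = {s: log_emit(ratios[0], STATES[s]) for s in names}
    back = []
    for x in ratios[1:]:
        new_score, pointers = {}, {}
        for s in names:
            prev = max(names, key=lambda p: score[p] + log_trans(p, s))
            new_score[s] = score[prev] + log_trans(prev, s) + log_emit(x, STATES[s])
            pointers[s] = prev
        score = new_score
        back.append(pointers)

    # Trace back from the best final state.
    state = max(names, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


def call_segments(path):
    """Collapse the state path into (start_bin, end_bin_exclusive, state) runs,
    keeping only the non-diploid ones."""
    segments, start = [], 0
    for i in range(1, len(path) + 1):
        if i == len(path) or path[i] != path[start]:
            if path[start] != "diploid":
                segments.append((start, i, path[start]))
            start = i
    return segments
```

Dividing by the profile removes the position-specific fluctuations that a single global correction such as %GC would miss, so that the segmentation step sees a signal centered on 1.0 for diploid regions. A production method would also need to handle bins with no usable reference value and would estimate the emission and transition parameters from data.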
Pages: 13