Decoding of Superimposed Traces Produced by Direct Sequencing of Heterozygous Indels

Cited by: 105
Authors
Dmitriev, Dmitry A. [1]
Rakitov, Roman A. [1]
Affiliations
[1] Illinois Nat Hist Survey, Champaign, IL 61820 USA
Funding
U.S. National Science Foundation
DOI
10.1371/journal.pcbi.1000113
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Direct Sanger sequencing of a diploid template containing a heterozygous insertion or deletion results in a difficult-to-interpret mixed trace formed by two allelic traces superimposed onto each other. Existing computational methods for deconvolution of such traces require knowledge of a reference sequence or the availability of both direct and reverse mixed sequences of the same template. We describe a simple yet accurate method, which uses dynamic programming optimization to predict superimposed allelic sequences solely from a string of letters representing peaks within an individual mixed trace. We used the method to decode 104 human traces (mean length 294 bp) containing heterozygous indels of 5 to 30 bp, with a mean of 99.1% of bases per allelic sequence reconstructed correctly and unambiguously. Simulations with artificial sequences have demonstrated that the method yields accurate reconstructions when (1) the allelic sequences forming the mixed trace are sufficiently similar, (2) the analyzed fragment is significantly longer than the indel, and (3) multiple indels, if present, are well-spaced. Because these conditions occur in most encountered DNA sequences, the method is widely applicable. It is available as a free Web application, Indelligent, at http://ctap.inhs.uiuc.edu/dmitriev/indel.asp.
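The way a heterozygous indel produces a mixed trace can be illustrated with a minimal Python sketch. It is not the authors' dynamic programming method or the Indelligent code. It only models the superimposition, writing each double peak as an IUPAC ambiguity letter, and checks one property a decoder could exploit: downstream of the indel, the peak set at position i shares a base with the peak set at position i + d, where d is the indel length. The function names `mix` and `shift_support` and the example sequence are hypothetical, chosen for illustration.

```python
# IUPAC nucleotide codes: each letter stands for a set of bases.
# Only single bases and two-base codes are needed for a two-allele trace.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
}
TO_CODE = {frozenset(bases): code for code, bases in IUPAC.items()}


def mix(allele1, allele2):
    """Superimpose two allelic reads position by position (hypothetical helper)."""
    n = min(len(allele1), len(allele2))
    return "".join(
        TO_CODE[frozenset((allele1[i], allele2[i]))] for i in range(n)
    )


def shift_support(mixed, shift, start=0):
    """Fraction of positions i >= start whose peak set shares a base
    with the peak set at i + shift (hypothetical helper)."""
    pairs = [(mixed[i], mixed[i + shift])
             for i in range(start, len(mixed) - shift)]
    if not pairs:
        return 0.0
    hits = sum(1 for a, b in pairs if set(IUPAC[a]) & set(IUPAC[b]))
    return hits / len(pairs)


# Made-up example: the short allele lacks 3 bases starting at index 8.
long_allele = "ACGTTGCAAGCTTAGGCATCGATCCGTA"
indel_start, indel_len = 8, 3
short_allele = (long_allele[:indel_start]
                + long_allele[indel_start + indel_len:])

mixed = mix(long_allele, short_allele)
print(mixed)
# Downstream of the indel, the true shift is consistent at every position.
print(shift_support(mixed, indel_len, start=indel_start))  # 1.0
```

Other shifts can also score highly by chance on short or repetitive sequences, which is in line with the abstract's condition that the analyzed fragment be significantly longer than the indel.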
Pages: 10