Gap statistics for whole genome shotgun DNA sequencing projects

Cited by: 7
Authors
Wendl, MC [1]
Yang, SP [1]
Affiliations
[1] Washington Univ, Sch Med, Genome Sequencing Ctr, St Louis, MO 63108 USA
DOI
10.1093/bioinformatics/bth120
CLC number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Motivation: Investigators utilize gap estimates for DNA sequencing projects. Standard theories assume sequences are independently and identically distributed, leading to appreciable under-prediction of gaps.
Results: Using a statistical scaling factor and data from 20 representative whole genome shotgun projects, we construct regression equations that relate coverage to a normalized gap measure. Prokaryotic genomes do not correlate to sequence coverage, while eukaryotes show strong correlation if the chaff is ignored. Gaps decrease at an exponential rate of only about one-third of that predicted via theory alone. Case studies suggest that departure from theory can largely be attributed to assembly difficulties for repeat-rich genomes, but bias and coverage anomalies are also important when repeats are sparse. Such factors cannot be readily characterized a priori, suggesting upper limits on the accuracy of gap prediction. We also find that diminishing coverage probability discussed in other studies is a theoretical artifact that does not arise for the typical project.
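The contrast between theoretical and observed gap decay can be illustrated with a minimal sketch. It assumes the idealized textbook model in which reads land independently and uniformly, so that the expected number of gaps is roughly N * exp(-c) for N reads at coverage c = N * L / G; the abstract does not state which formula the authors use. The second function simply slows the exponential rate to one-third, as the abstract reports; it is not the paper's regression equation, and the function names, parameters, and example values are hypothetical.

```python
import math


def coverage(num_reads: int, read_length: float, genome_length: float) -> float:
    """Sequence coverage c = N * L / G (average number of reads over a base)."""
    return num_reads * read_length / genome_length


def gaps_idealized(num_reads: int, read_length: float, genome_length: float) -> float:
    """Expected gap count under the assumed i.i.d. model: N * exp(-c)."""
    c = coverage(num_reads, read_length, genome_length)
    return num_reads * math.exp(-c)


def gaps_slowed(num_reads: int, read_length: float, genome_length: float,
                rate_fraction: float = 1.0 / 3.0) -> float:
    """Illustrative only: same form, but with the exponential decay rate
    scaled by rate_fraction (about one-third, per the abstract)."""
    c = coverage(num_reads, read_length, genome_length)
    return num_reads * math.exp(-rate_fraction * c)


if __name__ == "__main__":
    # Hypothetical project: 100,000 reads of 500 bases on a 5 Mb genome.
    n, length, genome = 100_000, 500.0, 5_000_000.0
    print("coverage:", coverage(n, length, genome))
    print("idealized gaps:", gaps_idealized(n, length, genome))
    print("slowed-rate gaps:", gaps_slowed(n, length, genome))
```

At any positive coverage the slowed-rate curve predicts more gaps than the idealized one, which is the direction of the under-prediction described in the abstract.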
Pages: 1527-1534
Number of pages: 8