Whole-Genome Sequencing and Assembly with High-Throughput, Short-Read Technologies

Cited by: 79
Authors
Sundquist, Andreas [1]
Ronaghi, Mostafa [2]
Tang, Haixu [3]
Pevzner, Pavel [4]
Batzoglou, Serafim [1]
Affiliations
[1] Stanford Univ, Dept Comp Sci, Stanford, CA 94305 USA
[2] Stanford Genome Technol Ctr, Stanford, CA USA
[3] Indiana Univ, Sch Informat, Bloomington, IN USA
[4] Univ Calif San Diego, Dept Comp Sci & Engn, La Jolla, CA 92093 USA
Source
PLOS ONE | 2007 / Vol. 2 / Issue 5
DOI
10.1371/journal.pone.0000484
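Chinese Library Classification (CLC)
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences];
Subject classification codes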
07 ; 0710 ; 09 ;
Abstract
While recently developed short-read sequencing technologies may dramatically reduce the sequencing cost and eventually achieve the $1000 goal for re-sequencing, their limitations prevent the de novo sequencing of eukaryotic genomes with the standard shotgun sequencing protocol. We present SHRAP (SHort Read Assembly Protocol), a sequencing protocol and assembly methodology that utilizes high-throughput short-read technologies. We describe a variation on hierarchical sequencing with two crucial differences: (1) we select a clone library from the genome randomly rather than as a tiling path and (2) we sample clones from the genome at high coverage and reads from the clones at low coverage. We assume that 200 bp read lengths with a 1% error rate and inexpensive random fragment cloning on whole mammalian genomes are feasible. Our assembly methodology is based on first ordering the clones and subsequently performing read assembly in three stages: (1) local assemblies of regions significantly smaller than a clone size, (2) clone-sized assemblies of the results of stage 1, and (3) chromosome-sized assemblies. By aggressively localizing the assembly problem during the first stage, our method succeeds in assembling short, unpaired reads sampled from repetitive genomes. We tested our assembler using simulated reads from D. melanogaster and human chromosomes 1, 11, and 21, and produced assemblies with large sets of contiguous sequence and a misassembly rate comparable to other draft assemblies. Tested on D. melanogaster and the entire human genome, our clone-ordering method produces accurate maps, thereby localizing fragment assembly and enabling the parallelization of the subsequent steps of our pipeline. Thus, we have demonstrated that truly inexpensive de novo sequencing of mammalian genomes will soon be possible with high-throughput, short-read technologies using our methodology.
Pages: 14