Whole-genome shotgun assembly and comparison of human genome assemblies

Cited by: 138
Authors
Istrail, S
Sutton, GG
Florea, L
Halpern, AL
Mobarry, CM
Lippert, R
Walenz, B
Shatkay, H
Dew, I
Miller, JR
Flanigan, MJ
Edwards, NJ
Bolanos, R
Fasulo, D
Halldorsson, BV
Hannenhalli, S
Turner, R
Yooseph, S
Lu, F
Nusskern, DR
Shue, BC
Zheng, XQH
Zhong, F
Delcher, AL
Huson, DH
Kravitz, SA
Mouchard, L
Reinert, K
Remington, KA
Clark, AG
Waterman, MS
Eichler, EE
Adams, MD
Hunkapiller, MW
Myers, EW
Venter, JC
Affiliations
[1] Ctr Advancement Genom, Rockville, MD 20850 USA
[2] Appl Biosyst Inc, Rockville, MD 20850 USA
[3] Celera Genom, Rockville, MD 20850 USA
[4] Inst Genom Res, Rockville, MD 20850 USA
[5] Cornell Univ, Dept Genet & Mol Biol, Ithaca, NY 14853 USA
[6] Univ So Calif, Dept Math, Los Angeles, CA 90033 USA
[7] Case Western Reserve Univ, Dept Genet, Cleveland, OH 44106 USA
[8] Appl Biosyst Inc, Foster City, CA 94404 USA
[9] Univ Calif Berkeley, Div Comp Sci, Berkeley, CA 94720 USA
DOI
10.1073/pnas.0307971100
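Chinese Library Classification (CLC) number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];
Discipline classification code
07 [Science]; 0710 [Biology]; 09 [Agronomy];
Abstract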
We report a whole-genome shotgun assembly (called WGSA) of the human genome generated at Celera in 2001. The Celera-generated shotgun data set consisted of 27 million sequencing reads organized in pairs by virtue of end-sequencing 2-kbp, 10-kbp, and 50-kbp inserts from shotgun clone libraries. The quality-trimmed reads covered the genome 5.3 times, and the inserts from which pairs of reads were obtained covered the genome 39 times. With the nearly complete human DNA sequence [National Center for Biotechnology Information (NCBI) Build 34] now available, it is possible to directly assess the quality, accuracy, and completeness of WGSA and of the first reconstructions of the human genome reported in two landmark papers in February 2001 [Venter, J. C., Adams, M. D., Myers, E. W., Li, P. W., Mural, R. J., Sutton, G. G., Smith, H. O., Yandell, M., Evans, C. A., Holt, R. A., et al. (2001) Science 291, 1304-1351; International Human Genome Sequencing Consortium (2001) Nature 409, 860-921]. The analysis of WGSA shows 97% order and orientation agreement with NCBI Build 34, where most of the 3% of sequence out of order is due to scaffold placement problems as opposed to assembly errors within the scaffolds themselves. In addition, WGSA fills some of the remaining gaps in NCBI Build 34. The early genome sequences all covered about the same amount of the genome, but they did so in different ways. The Celera results provide more order and orientation, and the consortium sequence provides better coverage of exact and nearly exact repeats.
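The coverage figures in the abstract follow the usual fold-coverage relation, coverage = (number of reads × mean read length) / genome size. The sketch below is illustrative only and is not taken from the paper: the genome size of 2.9 Gbp is an assumption (the abstract does not state one), and the function and constant names are hypothetical. It back-calculates the mean trimmed read length that the quoted 27 million reads and 5.3-fold coverage would imply under that assumption.

```python
def fold_coverage(n_reads: int, mean_read_len_bp: float, genome_size_bp: float) -> float:
    """Fold coverage: total sequenced bases divided by genome size."""
    return n_reads * mean_read_len_bp / genome_size_bp


def implied_mean_read_length(n_reads: int, coverage: float, genome_size_bp: float) -> float:
    """Invert the coverage relation to get the mean read length in bp."""
    return coverage * genome_size_bp / n_reads


# Figures quoted in the abstract.
N_READS = 27_000_000
READ_COVERAGE = 5.3

# ASSUMPTION: genome size is not given in the abstract; 2.9 Gbp is used
# here purely for illustration.
ASSUMED_GENOME_SIZE_BP = 2.9e9

mean_len = implied_mean_read_length(N_READS, READ_COVERAGE, ASSUMED_GENOME_SIZE_BP)
print(f"Implied mean trimmed read length: {mean_len:.0f} bp")
```

The result depends directly on the assumed genome size, so it should be read as an order-of-magnitude check on the abstract's numbers, not as a value reported by the authors.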
Pages: 1916-1921
Page count: 6