A comparative analysis of HGSC and Celera human genome assemblies and gene sets

Cited by: 6
Authors
Li, SY
Cutler, G
Liu, JJJ
Hoey, T
Chen, LB
Schultz, PG
Liao, JY
Ling, XFB
Affiliations
[1] Tularik Inc, San Francisco, CA 94080 USA
[2] Chinese Acad Sci, Inst Genet & Dev Biol, Beijing 100101, Peoples R China
[3] Novartis Res Fdn, Genom Inst, San Diego, CA 92121 USA
DOI
10.1093/bioinformatics/btg219
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Motivation: Since the simultaneous publication of the human genome assembly by the International Human Genome Sequencing Consortium (HGSC) and Celera Genomics, several comparisons have been made of various aspects of these two assemblies. In this work, we set out to provide a more comprehensive comparative analysis of the two assemblies and their associated gene sets.
Results: The local sequence content of both draft genome assemblies has been similar since the early releases; however, it took a year for the quality of the Celera assembly to approach that of HGSC, suggesting an advantage of HGSC's hierarchical shotgun (HS) sequencing strategy over Celera's whole-genome shotgun (WGS) approach. While similar numbers of ab initio predicted genes can be derived from both assemblies, Celera's Otto approach consistently generated larger, more varied gene sets than the Ensembl gene build system. The presence of a non-overlapping gene set has persisted with successive data releases from both groups. Since most of the unique genes from either genome assembly could be mapped back to the other assembly, we conclude that the gene set discrepancies do not reflect differences in local sequence content but rather differences in the assemblies and especially in the gene-prediction methodologies.
Pages: 1597-1605
Page count: 9
Related papers
20 in total
[1]   Computational comparison of two draft sequences of the human genome [J].
Aach, J ;
Bulyk, ML ;
Church, GM ;
Comander, J ;
Derti, A ;
Shendure, J .
NATURE, 2001, 409 (6822) :856-859
[2]   The independence of our genome assemblies [J].
Adams, MD ;
Sutton, GG ;
Smith, HO ;
Myers, EW ;
Venter, JC .
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 2003, 100 (06) :3025-3026
[3]   Gapped BLAST and PSI-BLAST: a new generation of protein database search programs [J].
Altschul, SF ;
Madden, TL ;
Schaffer, AA ;
Zhang, JH ;
Zhang, Z ;
Miller, W ;
Lipman, DJ .
NUCLEIC ACIDS RESEARCH, 1997, 25 (17) :3389-3402
[4]  
Bateman A, 2004, NUCLEIC ACIDS RES, V32, pD138, DOI 10.1093/nar/gkh121
[5]   Prediction of complete gene structures in human genomic DNA [J].
Burge, C ;
Karlin, S .
JOURNAL OF MOLECULAR BIOLOGY, 1997, 268 (01) :78-94
[6]   Profile hidden Markov models [J].
Eddy, SR .
BIOINFORMATICS, 1998, 14 (09) :755-763
[7]   A comparison of the Celera and Ensembl predicted gene sets reveals little overlap in novel genes [J].
Hogenesch, JB ;
Ching, KA ;
Batalov, S ;
Su, AI ;
Walker, JR ;
Zhou, YY ;
Kay, SA ;
Schultz, PG ;
Cooke, MP .
CELL, 2001, 106 (04) :413-415
[8]   The Ensembl genome database project [J].
Hubbard, T ;
Barker, D ;
Birney, E ;
Cameron, G ;
Chen, Y ;
Clark, L ;
Cox, T ;
Cuff, J ;
Curwen, V ;
Down, T ;
Durbin, R ;
Eyras, E ;
Gilbert, J ;
Hammond, M ;
Huminiecki, L ;
Kasprzyk, A ;
Lehvaslaiho, H ;
Lijnzaad, P ;
Melsopp, C ;
Mongin, E ;
Pettett, R ;
Pocock, M ;
Potter, S ;
Rust, A ;
Schmidt, E ;
Searle, S ;
Slater, G ;
Smith, J ;
Spooner, W ;
Stabenau, A ;
Stalker, J ;
Stupka, E ;
Ureta-Vidal, A ;
Vastrik, I ;
Clamp, M .
NUCLEIC ACIDS RESEARCH, 2002, 30 (01) :38-41
[9]  
Huson D H, 2001, Bioinformatics, V17 Suppl 1, pS132
[10]  
Kent WJ, 2002, GENOME RES, V12, P656, DOI 10.1101/gr.229202