Numerous novel annotations of the human genome sequence supported by a 5′-end-enriched cDNA collection

Times cited: 14
Authors
Porcel, BM [1]
Delfour, O
Castelli, V
De Berardinis, V
Friedlander, L
Cruaud, C
Ureta-Vidal, A
Scarpelli, C
Wincker, P
Schächter, V
Saurin, W
Gyapay, G
Salanoubat, M
Weissenbach, J
Affiliations
[1] Genoscope, Ctr Natl Sequencage, F-91000 Evry, France
[2] CNRS, UMR8030, F-91000 Evry, France
DOI
10.1101/gr.1481104
CLC classification numbers
Q5 [Biochemistry]; Q7 [Molecular Biology]
Subject classification codes
071010; 081704
Abstract
A collection of 90,000 human cDNA clones, generated to increase the fraction of "full-length" cDNAs available, was analyzed by sequence alignment on the human genome assembly. Using this collection, 552 gene models not found in LocusLink, each with a coding region of at least 300 bp, were defined. The exon composition proposed for the novel genes showed an average of 4.7 exons per gene. In 20% of the cases, at least half of the exons predicted for the new genes coincided with evolutionarily conserved regions defined by sequence comparisons with the pufferfish Tetraodon nigroviridis. Within this subset, CpG islands were observed at the 5′ end of 75% of the genes. In-frame stop codons upstream of the initiator ATG were present in 49% of the new genes, and 16% contained a coding region comprising at least 50% of the cDNA sequence. This cDNA resource also provided candidate small protein-coding genes, which are usually not included in genome annotations. In addition, analysis of a sample from this cDNA collection indicates that approximately 380 gene models described in LocusLink could be extended at their 5′ end by at least one new exon. Finally, this cDNA resource provided experimental support for annotations based exclusively on predictions, and thus represents a resource that substantially improves the human genome annotation.
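The abstract characterizes novel gene models by a few sequence features: a coding region of at least 300 bp, an in-frame stop codon upstream of the initiator ATG (commonly read as a sign that the 5′ end of the coding region is complete), and the fraction of the cDNA occupied by the coding region. The sketch below shows one way such features could be computed for a single cDNA; it is not the authors' pipeline. It assumes the coding region is the longest ATG-initiated open reading frame on the forward strand that ends in a stop codon, that the stop codon counts toward the 300 bp threshold, and that the standard genetic code applies. The function names are hypothetical.

```python
from typing import Dict, Optional, Tuple

# Stop codons of the standard genetic code (assumption: nuclear genes).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(seq: str) -> Optional[Tuple[int, int]]:
    """Return (start, end) of the longest ATG-initiated ORF on the forward strand.

    `end` is exclusive and includes the stop codon. ORFs that run off the
    3' end without a stop codon are ignored in this sketch.
    """
    seq = seq.upper()
    best: Optional[Tuple[int, int]] = None
    for frame in range(3):
        start: Optional[int] = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first ATG after the previous in-frame stop
            elif start is not None and codon in STOP_CODONS:
                end = i + 3
                if best is None or end - start > best[1] - best[0]:
                    best = (start, end)
                start = None  # look for the next ORF in this frame
    return best


def has_upstream_inframe_stop(seq: str, orf_start: int) -> bool:
    """True if a stop codon occurs in frame anywhere 5' of the initiator ATG."""
    seq = seq.upper()
    for i in range(orf_start - 3, -1, -3):
        if seq[i:i + 3] in STOP_CODONS:
            return True
    return False


def summarize_cdna(seq: str, min_cds: int = 300) -> Optional[Dict[str, object]]:
    """Compute the ORF features mentioned in the abstract for one cDNA (hypothetical helper)."""
    orf = longest_orf(seq)
    if orf is None:
        return None
    start, end = orf
    cds_len = end - start  # assumption: the stop codon counts toward the length
    return {
        "orf": orf,
        "cds_length": cds_len,
        "meets_min_length": cds_len >= min_cds,
        "upstream_inframe_stop": has_upstream_inframe_stop(seq, start),
        "coding_fraction": cds_len / len(seq),
    }


# Toy 20-nt "cDNA": in-frame TAA, then ATG ... TGA.
toy = "TAACCCATGAAACCCTGAGG"
print(summarize_cdna(toy))
# expected: {'orf': (6, 18), 'cds_length': 12, 'meets_min_length': False,
#            'upstream_inframe_stop': True, 'coding_fraction': 0.6}
```

Searching only the forward strand is a simplification that fits oriented, 5′-end-enriched clones; unoriented sequences would also need the reverse complement scanned.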
Pages: 463–471
Number of pages: 9