Open-reading-frame sequence tags (OSTs) support the existence of at least 17,300 genes in C. elegans

Cited by: 129
Authors
Reboul, J
Vaglio, P
Tzellas, N
Thierry-Mieg, N
Moore, T
Jackson, C
Shin-i, T
Kohara, Y
Thierry-Mieg, D
Thierry-Mieg, J
Lee, H
Hitti, J
Doucette-Stamm, L
Hartley, JL
Temple, GF
Brasch, MA
Vandenhaute, J
Lamesch, PE
Hill, DE
Vidal, M
Affiliations
[1] Harvard Univ, Sch Med, Dana Farber Canc Inst, Boston, MA 02115 USA
[2] Harvard Univ, Sch Med, Dept Genet, Boston, MA 02115 USA
[3] IMAG, Lab LSR, St Martin Dheres, France
[4] Res Genet Huntsville, Huntsville, AL USA
[5] Natl Inst Genet, Genome Biol Lab, Mishima, Shizuoka 411, Japan
[6] NIH, Natl Lib Med, Natl Ctr Biotechnol Informat, Bethesda, MD 20894 USA
[7] Genome Therapeut Corp, Waltham, MA USA
[8] Life Technol Inc, Rockville, MD USA
[9] Fac Univ Notre Dame Paix, Dept Biol, B-5000 Namur, Belgium
DOI
10.1038/85913
Chinese Library Classification number
Q3 [Genetics]
Discipline classification code
071007; 090102
Abstract
The genome sequences of Caenorhabditis elegans, Drosophila melanogaster and Arabidopsis thaliana have been predicted to contain 19,000, 13,600 and 25,500 genes, respectively (1-3). Before this information can be fully used for evolutionary and functional studies, several issues need to be addressed. First, the gene number estimates obtained in silico and not yet supported by any experimental data need to be verified. For example, it seems biologically paradoxical that C. elegans would have 50% more genes than Drosophila. Second, intron/exon predictions need to be tested experimentally. Third, complete sets of open reading frames (ORFs), or "ORFeomes" (4), need to be cloned into various expression vectors. To address these issues simultaneously, we have designed and applied to C. elegans the following strategy. Predicted ORFs are amplified by PCR from a highly representative cDNA library (4) using ORF-specific primers, cloned by Gateway recombination cloning (4-6) and then sequenced to generate ORF sequence tags (OSTs) as a way to verify identity and splicing. In a sample (n=1,222) of the nearly 10,000 genes predicted ab initio (that is, for which no expressed sequence tag (EST) is available so far), at least 70% were verified by OSTs. We also observed that 27% of these experimentally confirmed genes have a structure different from that predicted by GeneFinder. We now have experimental evidence that supports the existence of at least 17,300 genes in C. elegans. Hence we suggest that gene counts based primarily on ESTs may underestimate the number of genes in human and in other organisms.
Pages: 332-336
Number of pages: 5
Related papers
13 items in total
[1]   The genome sequence of Drosophila melanogaster [J].
Adams, MD ;
Celniker, SE ;
Holt, RA ;
Evans, CA ;
Gocayne, JD ;
Amanatides, PG ;
Scherer, SE ;
Li, PW ;
Hoskins, RA ;
Galle, RF ;
George, RA ;
Lewis, SE ;
Richards, S ;
Ashburner, M ;
Henderson, SN ;
Sutton, GG ;
Wortman, JR ;
Yandell, MD ;
Zhang, Q ;
Chen, LX ;
Brandon, RC ;
Rogers, YHC ;
Blazej, RG ;
Champe, M ;
Pfeiffer, BD ;
Wan, KH ;
Doyle, C ;
Baxter, EG ;
Helt, G ;
Nelson, CR ;
Miklos, GLG ;
Abril, JF ;
Agbayani, A ;
An, HJ ;
Andrews-Pfannkoch, C ;
Baldwin, D ;
Ballew, RM ;
Basu, A ;
Baxendale, J ;
Bayraktaroglu, L ;
Beasley, EM ;
Beeson, KY ;
Benos, PV ;
Berman, BP ;
Bhandari, D ;
Bolshakov, S ;
Borkova, D ;
Botchan, MR ;
Bouck, J ;
Brokstein, P .
SCIENCE, 2000, 287 (5461) :2185-2195
[2]   Analysis of the genome sequence of the flowering plant Arabidopsis thaliana [J].
Kaul, S ;
Koo, HL ;
Jenkins, J ;
Rizzo, M ;
Rooney, T ;
Tallon, LJ ;
Feldblyum, T ;
Nierman, W ;
Benito, MI ;
Lin, XY ;
Town, CD ;
Venter, JC ;
Fraser, CM ;
Tabata, S ;
Nakamura, Y ;
Kaneko, T ;
Sato, S ;
Asamizu, E ;
Kato, T ;
Kotani, H ;
Sasamoto, S ;
Ecker, JR ;
Theologis, A ;
Federspiel, NA ;
Palm, CJ ;
Osborne, BI ;
Shinn, P ;
Conway, AB ;
Vysotskaia, VS ;
Dewar, K ;
Conn, L ;
Lenz, CA ;
Kim, CJ ;
Hansen, NF ;
Liu, SX ;
Buehler, E ;
Altafi, H ;
Sakano, H ;
Dunn, P ;
Lam, B ;
Pham, PK ;
Chao, Q ;
Nguyen, M ;
Yu, GX ;
Chen, HM ;
Southwick, A ;
Lee, JM ;
Miranda, M ;
Toriumi, MJ ;
Davis, RW .
NATURE, 2000, 408 (6814) :796-815
[3]   Genome sequence of the nematode C. elegans: A platform for investigating biology [J].
Author unknown .
SCIENCE, 1998, 282 (5396) :2012-2018
[4]   The DNA sequence of human chromosome 22 [J].
Dunham, I ;
Shimizu, N ;
Roe, BA ;
Chissoe, S ;
Dunham, I ;
Hunt, AR ;
Collins, JE ;
Bruskiewich, R ;
Beare, DM ;
Clamp, M ;
Smink, LJ ;
Ainscough, R ;
Almeida, JP ;
Babbage, A ;
Bagguley, C ;
Balley, J ;
Barlow, K ;
Bates, KN ;
Beasley, O ;
Bird, CP ;
Blakey, S ;
Bridgeman, AM ;
Buck, D ;
Burgess, J ;
Burrill, WD ;
Burton, J ;
Carder, C ;
Carter, NP ;
Chen, Y ;
Clark, G ;
Clegg, SM ;
Cobley, V ;
Cole, CG ;
Collier, RE ;
Connor, RE ;
Conroy, D ;
Corby, N ;
Coville, GJ ;
Cox, AV ;
Davis, J ;
Dawson, E ;
Dhami, PD ;
Dockree, C ;
Dodsworth, SJ ;
Durbin, RM ;
Ellington, A ;
Evans, KL ;
Fey, JM ;
Fleming, K ;
French, L .
NATURE, 1999, 402 (6761) :489-495
[5]   Analysis of expressed sequence tags indicates 35,000 human genes [J].
Ewing, B ;
Green, P .
NATURE GENETICS, 2000, 25 (02) :232-234
[6]   Homology-based annotation yields 1,042 new candidate genes in the Drosophila melanogaster genome [J].
Gopal, S ;
Schroeder, M ;
Pieper, U ;
Sczyrba, A ;
Aytekin-Kurban, G ;
Bekiranov, S ;
Fajardo, JE ;
Eswar, N ;
Sanchez, R ;
Sali, A ;
Gaasterland, T .
NATURE GENETICS, 2001, 27 (03) :337-340
[7]   DNA cloning using in vitro site-specific recombination [J].
Hartley, JL ;
Temple, GF ;
Brasch, MA .
GENOME RESEARCH, 2000, 10 (11) :1788-1795
[8]   Genomic analysis of gene expression in C. elegans [J].
Hill, AA ;
Hunter, CP ;
Tsung, BT ;
Tucker-Kellogg, G ;
Brown, EL .
SCIENCE, 2000, 290 (5492) :809-812
[9]  
Liang F, 2000, NAT GENET, V26, P501
[10]   Gene Index analysis of the human genome estimates approximately 120,000 genes [J].
Liang, F ;
Holt, I ;
Pertea, G ;
Karamycheva, S ;
Salzberg, SL ;
Quackenbush, J .
NATURE GENETICS, 2000, 25 (02) :239-240