The construction of Arabidopsis expressed sequence tag assemblies - A new resource to facilitate gene identification

Cited by: 70
Authors
Rounsley, SD [1]
Glodek, A [1]
Sutton, G [1]
Adams, MD [1]
Somerville, CR [1]
Venter, JC [1]
Kerlavage, AR [1]
Affiliation
[1] Carnegie Inst Washington, Dept Plant Biol, Stanford, CA 94305
DOI
10.1104/pp.112.3.1177
Chinese Library Classification number
Q94 [Botany];
Discipline classification code
071001;
Abstract
The generation of large numbers of partial cDNA sequences, or expressed sequence tags (ESTs), has provided a method with which to sample a large number of genes from an organism. More than 25,000 Arabidopsis thaliana ESTs have been deposited in public databases, producing the largest collection of ESTs for any plant species. We describe here the application of a method of reducing redundancy and increasing information content in this collection by grouping overlapping ESTs representing the same gene into a "contig" or assembly. The increased information content of these assemblies allows more putative identifications to be assigned based on the results of similarity searches with nucleotide and protein databases. The results of this analysis indicate that sequence information is available for approximately 12,600 nonoverlapping ESTs from Arabidopsis. Comparison of the assemblies with 953 Arabidopsis coding sequences indicates that up to 57% of all Arabidopsis genes are represented by an EST. Clustering analysis of these sequences suggests that between 300 and 700 gene families are represented by between 700 and 2000 sequences in the EST database. A database of the assembled sequences, their putative identifications, and cellular roles is available through the World Wide Web.
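The idea of grouping overlapping ESTs into assemblies can be illustrated with a toy sketch. This is not the assembly method used in the paper, which the abstract does not specify; it assumes exact-match suffix/prefix overlaps on error-free, same-strand sequences, and the function names, the `min_overlap` parameter, and the example sequences are all hypothetical.

```python
def overlap_len(a, b, min_overlap):
    """Length of the longest suffix of `a` equal to a prefix of `b`.

    Returns 0 if no exact overlap of at least `min_overlap` bases exists.
    """
    for k in range(min(len(a), len(b)), min_overlap - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def group_ests(ests, min_overlap=5):
    """Group ESTs that are linked by exact overlaps (toy single-linkage).

    Returns a sorted list of groups, each a list of indices into `ests`.
    """
    parent = list(range(len(ests)))  # union-find forest, one node per EST

    def find(i):
        # Follow parent links to the root, compressing the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(ests)):
        for j in range(i + 1, len(ests)):
            a, b = ests[i], ests[j]
            # Link the pair if either end overlaps the other's start.
            if overlap_len(a, b, min_overlap) or overlap_len(b, a, min_overlap):
                parent[find(i)] = find(j)

    groups = {}
    for i in range(len(ests)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Hypothetical example: the first three reads chain together, the fourth stands alone.
example = ["ATGGCGTACG", "GTACGTTAGC", "TTAGCCATAA", "CCCCGGGGTT"]
groups = group_ests(example, min_overlap=5)
```

Real EST data contain sequencing errors, reverse-complement reads, and chimeric clones, so a practical assembler would use tolerant alignment rather than exact string matching.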
Pages: 1177-1183
Number of pages: 7