Perspectives: Sequence data base searching in the era of large-scale genomic sequencing

Cited by: 11
Author
Smith, RF [1]
Affiliation
[1] BAYLOR COLL MED, DEPT HUMAN MOL GENET, CTR HUMAN GENOME, HOUSTON, TX 77030 USA
DOI
10.1101/gr.6.8.653
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology];
Discipline classification code
071010 ; 081704 ;
Abstract
Large-scale sequencing of human and model organism genomes will have a profound impact on our ability to use sequence data base searching to predict the biochemical functions of sequences of interest. Despite the great value of more sequences in the data bases, a huge increase in data base size will also have adverse effects on data base searches. Upcoming problems will include (1) greatly increased search times, (2) an increase in background noise of high-scoring but biologically irrelevant matches, (3) inaccurate coding region prediction, leading to problems in protein data base searching, and (4) limited first-pass sequence annotation, making it difficult to determine the biological relevance of data base hits. Improved data base annotation tools and construction of smaller data bases of representative and highly-annotated sequences for first-pass analyses will be essential to deal with the impending flood of new genomic sequence.
Pages: 653-660
Number of pages: 8