Limits of homology detection by pairwise sequence comparison

被引:17
作者
Spang, R [1 ]
Vingron, M [1 ]
机构
[1] Deutsch Krebsforschungszentrum, Theroet Bioinformat, D-69120 Heidelberg, Germany
关键词
D O I
10.1093/bioinformatics/17.4.338
中图分类号
Q5 [生物化学];
学科分类号
071010 ; 081704 ;
摘要
Motivation: Noise in database searches resulting from random sequence similarities increases as the databases expand rapidly. The noise problems are not a technical shortcoming of the database search programs, but a logical consequence of the idea of homology searches. The effect can be observed in simulation experiments. Results: We have investigated noise levels in pairwise alignment based database searches. The noise levels of 38 releases of the SwissProt database, display perfect logarithmic growth with the total length of the databases. Clustering of real biological sequences reduces noise levels, but the effect is marginal.
引用
收藏
页码:338 / 342
页数:5
相关论文
共 26 条
[1]   ISSUES IN SEARCHING MOLECULAR SEQUENCE DATABASES [J].
ALTSCHUL, SF ;
BOGUSKI, MS ;
GISH, W ;
WOOTTON, JC .
NATURE GENETICS, 1994, 6 (02) :119-129
[2]   Gapped BLAST and PSI-BLAST: a new generation of protein database search programs [J].
Altschul, SF ;
Madden, TL ;
Schaffer, AA ;
Zhang, JH ;
Zhang, Z ;
Miller, W ;
Lipman, DJ .
NUCLEIC ACIDS RESEARCH, 1997, 25 (17) :3389-3402
[3]  
ALTSCHUL SF, 1990, J MOL BIOL, V215, P403, DOI 10.1006/jmbi.1990.9999
[4]  
[Anonymous], 1994, Ann. Prob
[5]  
[Anonymous], 1978, ATLAS PROTEIN SEQUEN
[6]   AN EXTREME VALUE THEORY FOR SEQUENCE MATCHING [J].
ARRATIA, R ;
GORDON, L ;
WATERMAN, M .
ANNALS OF STATISTICS, 1986, 14 (03) :971-993
[7]   The SWISS-PROT protein sequence data bank and its supplement TrEMBL in 1999 [J].
Bairoch, A ;
Apweiler, R .
NUCLEIC ACIDS RESEARCH, 1999, 27 (01) :49-54
[8]   Assessing sequence comparison methods with reliable structurally identified distant evolutionary relationships [J].
Brenner, SE ;
Chothia, C ;
Hubbard, TJP .
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 1998, 95 (11) :6073-6078
[9]  
Eddy S R, 1995, J Comput Biol, V2, P9, DOI 10.1089/cmb.1995.2.9
[10]   ON A NEW LAW OF LARGE NUMBERS [J].
ERDOS, P ;
RENYI, A .
JOURNAL D ANALYSE MATHEMATIQUE, 1970, 23 :103-&