Choosing BLAST options for better detection of orthologs as reciprocal best hits

被引:360
作者
Moreno-Hagelsieb, Gabriel [1 ]
Latimer, Kristen [1 ]
机构
[1] Wilfrid Laurier Univ, Dept Biol, Waterloo, ON N2L 3C5, Canada
基金
加拿大自然科学与工程研究理事会;
关键词
D O I
10.1093/bioinformatics/btm585
中图分类号
Q5 [生物化学];
学科分类号
071010 ; 081704 ;
摘要
Motivation: The analyses of the increasing number of genome sequences requires shortcuts for the detection of orthologs, such as Reciprocal Best Hits (RBH), where orthologs are assumed if two genes each in a different genome find each other as the best hit in the other genome. Two BLAST options seem to affect alignment scores the most, and thus the choice of a best hit: the filtering of low information sequence segments and the algorithm used to produce the final alignment. Thus, we decided to test whether such options would help better detect orthologs. Results: Using Escherichia coli K12 as an example, we compared the number and quality of orthologs detected as RBH. We tested four different conditions derived from two options: filtering of low-information segments, hard (default) versus soft; and alignment algorithm, default (based on matching words) versus SmithWaterman. All options resulted in significant differences in the number of orthologs detected, with the highest numbers obtained with the combination of soft filtering with SmithWaterman alignments. We compared these results with those of Reciprocal Shortest Distances (RSD), supposed to be superior to RBH because it uses an evolutionary measure of distance, rather than BLAST statistics, to rank homologs and thus detect orthologs. RSD barely increased the number of orthologs detected over those found with RBH. Error estimates, based on analyses of conservation of gene order, found small differences in the quality of orthologs detected using RBH. However, RSD showed the highest error rates. Thus, RSD have no advantages over RBH.
引用
收藏
页码:319 / 324
页数:6
相关论文
共 30 条
[1]   Automatic clustering of orthologs and inparalogs shared by multiple proteomes [J].
Alexeyenko, Andrey ;
Tamas, Ivica ;
Liu, Gang ;
Sonnhammer, Erik L. L. .
BIOINFORMATICS, 2006, 22 (14) :E9-E15
[2]   Gapped BLAST and PSI-BLAST: a new generation of protein database search programs [J].
Altschul, SF ;
Madden, TL ;
Schaffer, AA ;
Zhang, JH ;
Zhang, Z ;
Miller, W ;
Lipman, DJ .
NUCLEIC ACIDS RESEARCH, 1997, 25 (17) :3389-3402
[3]   BASIC LOCAL ALIGNMENT SEARCH TOOL [J].
ALTSCHUL, SF ;
GISH, W ;
MILLER, W ;
MYERS, EW ;
LIPMAN, DJ .
JOURNAL OF MOLECULAR BIOLOGY, 1990, 215 (03) :403-410
[4]   GeneMarkS: a self-training method for prediction of gene starts in microbial genomes. Implications for finding sequence motifs in regulatory regions [J].
Besemer, J ;
Lomsadze, A ;
Borodovsky, M .
NUCLEIC ACIDS RESEARCH, 2001, 29 (12) :2607-2618
[5]   The complete genome sequence of Escherichia coli K-12 [J].
Blattner, FR ;
Plunkett, G ;
Bloch, CA ;
Perna, NT ;
Burland, V ;
Riley, M ;
ColladoVides, J ;
Glasner, JD ;
Rode, CK ;
Mayhew, GF ;
Gregor, J ;
Davis, NW ;
Kirkpatrick, HA ;
Goeden, MA ;
Rose, DJ ;
Mau, B ;
Shao, Y .
SCIENCE, 1997, 277 (5331) :1453-+
[6]   The SWISS-PROT protein knowledgebase and its supplement TrEMBL in 2003 [J].
Boeckmann, B ;
Bairoch, A ;
Apweiler, R ;
Blatter, MC ;
Estreicher, A ;
Gasteiger, E ;
Martin, MJ ;
Michoud, K ;
O'Donovan, C ;
Phan, I ;
Pilbout, S ;
Schneider, M .
NUCLEIC ACIDS RESEARCH, 2003, 31 (01) :365-370
[7]   Predicting function: From genes to genomes and back [J].
Bork, P ;
Dandekar, T ;
Diaz-Lazcoz, Y ;
Eisenhaber, F ;
Huynen, M ;
Yuan, YP .
JOURNAL OF MOLECULAR BIOLOGY, 1998, 283 (04) :707-725
[8]   Assessing sequence comparison methods with reliable structurally identified distant evolutionary relationships [J].
Brenner, SE ;
Chothia, C ;
Hubbard, TJP .
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 1998, 95 (11) :6073-6078
[9]   Roundup: a multi-genome repository of orthologs and evolutionary distances [J].
DeLuca, Todd F. ;
Wu, I-Hsien ;
Pu, Jian ;
Monaghan, Thomas ;
Peshkin, Leonid ;
Singh, Saurav ;
Wall, Dennis P. .
BIOINFORMATICS, 2006, 22 (16) :2044-2046
[10]   What is dynamic programming? [J].
Eddy, SR .
NATURE BIOTECHNOLOGY, 2004, 22 (07) :909-910