A new protein linear motif benchmark for multiple sequence alignment software

Cited by: 19
Authors
Perrodou, Emmanuel [1, 2, 3, 4]
Chica, Claudia [5]
Poch, Olivier [2, 3, 4]
Gibson, Toby J. [5]
Thompson, Julie D. [2, 3, 4]
Affiliations
[1] IGBMC, Dept Struct Biol & Genom, F-67400 Illkirch Graffenstaden, France
[2] INSERM, U596, F-67400 Illkirch Graffenstaden, France
[3] CNRS, UMR7104, F-67400 Illkirch Graffenstaden, France
[4] Univ Strasbourg, F-67000 Strasbourg, France
[5] European Mol Biol Lab, D-69012 Heidelberg, Germany
DOI
10.1186/1471-2105-9-213
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Background: Linear motifs (LMs) are abundant short regulatory sites used for modulating the functions of many eukaryotic proteins. They play important roles in post-translational modification, cell compartment targeting, docking sites for regulatory complex assembly, and protein processing and cleavage. Methods for LM detection are now being developed that depend strongly on scores for motif conservation in homologous proteins. However, most LMs are found in natively disordered polypeptide segments that evolve rapidly, unhindered by structural constraints on the sequence. These regions of modular proteins are difficult to align using classical multiple sequence alignment programs, which are specifically optimised to align globular domains. As a consequence, poor motif alignment quality is hindering efforts to detect new LMs.
Results: We have developed a new benchmark, as part of the BAliBASE suite, designed to assess the ability of standard multiple alignment methods to detect and align LMs. The reference alignments are organised into different test sets representing real alignment problems and contain examples of experimentally verified functional motifs, extracted from the Eukaryotic Linear Motif (ELM) database. The benchmark has been used to evaluate and compare a number of multiple alignment programs. With distantly related proteins, the worst alignment program correctly aligns 48% of LMs, compared to 73% for the best program. However, the performance of all the programs is adversely affected by the introduction of other sequences containing false positive motifs. The ranking of the alignment programs based on LM alignment quality is similar to that observed when considering full-length protein alignments; however, little correlation was observed between LM and overall alignment quality for individual alignment test cases.
Conclusion: We have shown that none of the programs currently available is capable of reliably aligning LMs in distantly related sequences and we have highlighted a number of specific problems. The results of the tests suggest possible ways to improve program accuracy for difficult, divergent sequences.
Pages: 15