PROMPT: a protein mapping and comparison tool

Cited by: 24
Authors
Schmidt, Thorsten [1]
Frishman, Dmitrij [1]
Affiliation
[1] Technical University of Munich, Department of Genome Oriented Bioinformatics, Wissenschaftszentrum Weihenstephan, D-85350 Freising-Weihenstephan, Germany
DOI
10.1186/1471-2105-7-331
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Background: Comparison of large protein datasets has become a standard task in bioinformatics. Typically, researchers wish to know whether one group of proteins is enriched in certain annotation attributes or sequence properties compared to another group, and whether this enrichment is statistically significant. Such comparisons often require integrating molecular sequence data and experimental information from disparate, incompatible sources. While many specialized programs exist for comparisons of this kind in individual problem domains, such as expression data analysis, no generic software solution capable of addressing a wide spectrum of routine tasks in comparative proteomics is currently available.

Results: PROMPT is a comprehensive bioinformatics software environment that enables the user to compare arbitrary protein sequence sets and reveals statistically significant differences in their annotation features. It retrieves and integrates data automatically from a multitude of molecular biological databases as well as from a custom XML format. Similarity-based mapping of sequence IDs makes it possible to link experimental information obtained from different sources despite discrepancies in gene identifiers and minor sequence variation. PROMPT provides a full set of statistical procedures for the following four use cases: i) comparison of the frequencies of categorical annotations between two sets, ii) enrichment of nominal features in one set with respect to another, iii) comparison of numeric distributions, and iv) correlation of numeric variables. Analysis results can be visualized as plots and spreadsheets and exported in various formats, including Microsoft Excel.

Conclusion: PROMPT is a versatile, platform-independent, easily expandable, stand-alone application designed to be a practical workhorse for analysing and mining protein sequences and their associated annotation. Its Java Application Programming Interface (API) and scripting capabilities on the one hand, and its intuitive Graphical User Interface (GUI) with a context-sensitive help system on the other, make it equally accessible to professional bioinformaticians and biologically oriented users. PROMPT is freely available to academic users from http://webclu.bio.wzw.tum.de/prompt/.
Pages: 15