PhenoLink - a web-tool for linking phenotype to ∼omics data for bacteria: application to gene-trait matching for Lactobacillus plantarum strains

Cited by: 44
Authors
Bayjanov, Jumamurat R. [1 ,2 ]
Molenaar, Douwe [4 ,5 ]
Tzeneva, Vesela [3 ,4 ]
Siezen, Roland J. [1 ,2 ,3 ,4 ]
van Hijum, Sacha A. F. T. [1 ,2 ,3 ,4 ]
Affiliations
[1] Radboud Univ Nijmegen, Ctr Mol & Biomol Informat, Med Ctr, NL-6525 ED Nijmegen, Netherlands
[2] Netherlands Bioinformat Ctr, NL-6500 HB Nijmegen, Netherlands
[3] TI Food & Nutr, NL-6700 AN Wageningen, Netherlands
[4] NIZO Food Res, Kluyver Ctr Genom Ind Fermentat, NL-6710 BA Ede, Netherlands
[5] Free Univ Amsterdam, Syst Bioinformat IBIVU, NL-1081 HV Amsterdam, Netherlands
Source
BMC GENOMICS | 2012 / Vol. 13
Keywords
VARIABLE IMPORTANCE MEASURES; PREDICTION; NETWORKS;
DOI
10.1186/1471-2164-13-170
Chinese Library Classification number
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology];
Discipline classification code
071005 ; 0836 ; 090102 ; 100705 ;
Abstract
Background: Linking phenotypes to high-throughput molecular biology information generated by ∼omics technologies allows revealing cellular mechanisms underlying an organism's phenotype. ∼Omics datasets are often very large and noisy, with many features (e.g., genes, metabolite abundances). Thus, associating phenotypes with ∼omics data requires an approach that is robust to noise and can handle large and diverse data sets.

Results: We developed a web-tool, PhenoLink (http://bamics2.cmbi.ru.nl/websoftware/phenolink/), that links phenotype to ∼omics data sets using well-established as well as new techniques. PhenoLink imputes missing values and preprocesses input data (i) to decrease inherent noise in the data and (ii) to counterbalance pitfalls of the Random Forest algorithm, on which feature (e.g., gene) selection is based. Preprocessed data is used in feature selection to identify relations to phenotypes. We applied PhenoLink to identify gene-phenotype relations based on the presence/absence of 2847 genes in 42 Lactobacillus plantarum strains and phenotypic measurements of these strains in several experimental conditions, including growth on sugars and nitrogen-dioxide production. Genes were ranked based on their importance (predictive value) for correctly predicting the phenotype of a given strain. In addition to known gene-to-phenotype relations, we also found novel relations.

Conclusions: PhenoLink is an easily accessible web-tool that facilitates identifying relations in large and often noisy phenotype and ∼omics datasets. Visualization of links to phenotypes offered in PhenoLink allows prioritizing links, finding relations between features, finding relations between phenotypes, and identifying outliers in phenotype data. PhenoLink can be used to uncover phenotype links to a multitude of ∼omics data, e.g., gene presence/absence (determined by, e.g., CGH or next-generation sequencing), gene expression (determined by, e.g., microarrays or RNA-seq), or metabolite abundance (determined by, e.g., GC-MS).
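The gene ranking described in the abstract can be illustrated with a minimal sketch. It is not PhenoLink's implementation: it assumes a much simpler stand-in for Random Forest (bootstrapped one-split trees, or "stumps", scored by mean Gini impurity decrease), and the function names, the toy data, and the choice of 6 genes are all hypothetical.

```python
import random

def gini(labels):
    """Gini impurity of a list of 0/1 phenotype labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)

def split_gain(rows, labels, feature):
    """Impurity decrease from splitting strains on one presence/absence gene."""
    present = [y for x, y in zip(rows, labels) if x[feature] == 1]
    absent = [y for x, y in zip(rows, labels) if x[feature] == 0]
    weighted = (len(present) * gini(present) + len(absent) * gini(absent)) / len(labels)
    return gini(labels) - weighted

def rank_genes(rows, labels, n_trees=200, mtry=None, seed=0):
    """Rank genes by mean impurity decrease over bootstrapped stumps.

    Simplified stand-in for Random Forest importance: each "tree" is a
    single split chosen among `mtry` randomly drawn candidate genes.
    """
    rng = random.Random(seed)
    n, n_features = len(rows), len(rows[0])
    mtry = mtry or max(1, int(n_features ** 0.5))
    importance = [0.0] * n_features
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample of strains
        boot_rows = [rows[i] for i in idx]
        boot_labels = [labels[i] for i in idx]
        candidates = rng.sample(range(n_features), mtry)
        gains = {f: split_gain(boot_rows, boot_labels, f) for f in candidates}
        best = max(gains, key=gains.get)
        importance[best] += gains[best] / n_trees
    order = sorted(range(n_features), key=lambda f: importance[f], reverse=True)
    return order, importance

# Hypothetical toy data: 42 strains, 6 genes; gene 2 alone determines the phenotype.
data_rng = random.Random(1)
rows = [[data_rng.randint(0, 1) for _ in range(6)] for _ in range(42)]
for i, row in enumerate(rows):
    row[2] = i % 2
labels = [row[2] for row in rows]

ranking, importance = rank_genes(rows, labels)
print("most important gene index:", ranking[0])
```

In this toy setup the phenotype is copied from gene 2, so that gene accumulates far more importance than the randomly generated ones. Real data would also need the imputation and preprocessing steps the abstract mentions, which this sketch omits.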
Pages: 12