Peakbin Selection in Mass Spectrometry Data Using a Consensus Approach with Estimation of Distribution Algorithms

Cited by: 17
Authors
Armananzas, Ruben [1]
Saeys, Yvan [2]
Inza, Inaki [3]
Garcia-Torres, Miguel [4]
Bielza, Concha [1]
van de Peer, Yves [2]
Larranaga, Pedro [1]
Affiliations
[1] Univ Politecn Madrid, Computat Intelligence Grp, Dept Inteligencia Artificial, Boadilla Del Monte 28660, Spain
[2] Univ Ghent VIB, Bioinformat & Syst Biol Grp, B-9052 Ghent, Belgium
[3] Univ Basque Country, Dept Comp Sci & Artificial Intelligence, Fac Informat, Donostia San Sebastian 20018, Spain
[4] Pablo de Olavide Univ, Area Languages & Comp Syst, Seville 41013, Spain
Keywords
Mass spectrometry; EDA; feature selection; biomarker discovery; LASER-DESORPTION; CLASSIFICATION; SERUM; SPECTRA; REPRODUCIBILITY; PREDICTION; PATTERNS
DOI
10.1109/TCBB.2010.18
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Progress is continually being made in the search for stable biomarkers linked to complex diseases, and mass spectrometers are one of the tools used to tackle this problem. The data profiles they produce, however, are noisy and unstable. In these profiles, biomarkers appear as signal regions (peaks) where control and disease samples behave differently. Mass spectrometry (MS) data sets generally contain a small number of samples, each described by a large number of features. In this work, we present a novel class of evolutionary algorithms, estimation of distribution algorithms (EDAs), as efficient peak selectors in this MS domain. Because there is a trade-off between the reliability of the detected biomarkers and the small number of samples available for analysis, we introduce a consensus approach, built on the classical EDA scheme, that improves the stability and robustness of the final set of relevant peaks. A complete data-analysis workflow is designed to yield unbiased results. Four publicly available MS data sets (two MALDI-TOF and two SELDI-TOF) are analyzed, and the results are compared with those of the original studies. A new plot, the peak frequential plot, is also introduced for graphically inspecting the relevant peaks. A complete online supplementary page, available at http://www.sc.ehu.es/ccwbayes/members/ruben/ms, provides extended information and results, together with Matlab scripts and references.
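The abstract describes EDAs used as peak selectors, with a consensus layer over repeated runs to stabilize the chosen peaks. Below is a minimal Python sketch of that general idea. It rests on assumptions that do not come from the paper. The EDA is taken to be UMDA, the simplest univariate EDA. The fitness function is a hypothetical toy stand-in for whatever classifier-based score a real workflow would use. "Consensus" is assumed to mean keeping the peaks selected in at least a fixed fraction of independent runs. The names `umda_select`, `consensus_peaks`, `toy_fitness` and `INFORMATIVE` are illustrative only and do not come from the authors' Matlab scripts.

```python
import random
from typing import Callable, List, Optional, Sequence, Tuple

Mask = List[int]  # 1 = peak kept, 0 = peak discarded


def umda_select(n_peaks: int, fitness: Callable[[Sequence[int]], float],
                pop_size: int = 50, n_elite: int = 25, generations: int = 30,
                rng: Optional[random.Random] = None) -> Mask:
    """Sketch of UMDA over binary peak masks; returns the best mask seen."""
    rng = rng or random.Random()
    probs = [0.5] * n_peaks          # one independent Bernoulli marginal per peak
    bound = 1.0 / n_peaks            # keep marginals off 0/1 so sampling never freezes
    best, best_fit = [0] * n_peaks, float("-inf")
    for _ in range(generations):
        # 1. Sample candidate peak subsets from the current probabilistic model.
        pop = [[1 if rng.random() < p else 0 for p in probs] for _ in range(pop_size)]
        # 2. Truncation selection: rank masks by fitness and keep the n_elite best.
        pop.sort(key=fitness, reverse=True)
        if fitness(pop[0]) > best_fit:
            best, best_fit = pop[0], fitness(pop[0])
        elite = pop[:n_elite]
        # 3. Re-estimate each marginal as that peak's frequency among the elite.
        probs = [min(max(sum(m[j] for m in elite) / n_elite, bound), 1 - bound)
                 for j in range(n_peaks)]
    return best


def consensus_peaks(n_peaks: int, fitness: Callable[[Sequence[int]], float],
                    runs: int = 20, threshold: float = 0.5,
                    seed: int = 0) -> Tuple[List[int], List[float]]:
    """Repeat UMDA with different seeds; keep peaks chosen in >= threshold of runs."""
    master = random.Random(seed)
    counts = [0] * n_peaks
    for _ in range(runs):
        mask = umda_select(n_peaks, fitness, rng=random.Random(master.randrange(2**32)))
        counts = [c + b for c, b in zip(counts, mask)]
    freqs = [c / runs for c in counts]   # per-peak selection frequency across runs
    kept = [j for j, f in enumerate(freqs) if f >= threshold]
    return kept, freqs


# Hypothetical usage: 10 peaks, of which 2, 5 and 7 are the "true" biomarkers.
INFORMATIVE = {2, 5, 7}


def toy_fitness(mask: Sequence[int]) -> float:
    # Reward informative peaks and lightly penalize every other peak kept.
    hits = sum(mask[j] for j in INFORMATIVE)
    noise = sum(mask) - hits
    return hits - 0.1 * noise


kept, freqs = consensus_peaks(10, toy_fitness)
print(kept, freqs)
```

The per-peak frequencies in `freqs` are the kind of quantity a frequency-style plot of selected peaks could display. How the paper's peak frequential plot is actually defined is not stated in this record. Using independent seeds per run mirrors the motivation given in the abstract: a single stochastic search on few samples can be unstable, so aggregating many runs favors peaks that are selected repeatedly.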
Pages: 760-774
Page count: 15