Predicting DNA Motifs by Using Evolutionary Multiobjective Optimization

Cited by: 14
Authors
Gonzalez-Alvarez, David L. [1]
Vega-Rodriguez, Miguel A. [1]
Gomez-Pulido, Juan A. [1]
Sanchez-Perez, Juan M. [1]
Affiliation
[1] Univ Extremadura, Dept Technol Computers & Commun, Caceres 10003, Spain
Source
IEEE TRANSACTIONS ON SYSTEMS MAN AND CYBERNETICS PART C-APPLICATIONS AND REVIEWS | 2012, Vol. 42, No. 6
Keywords
Computer applications; deoxyribonucleic acid (DNA); evolutionary algorithm; motif discovery; multiobjective optimization; GENETIC ALGORITHM; REGULATORY ELEMENTS; NONCODING SEQUENCES; DISCOVERY; IDENTIFICATION; PROTEIN; PATTERNS; SITES; SOLVE;
DOI
10.1109/TSMCC.2011.2172939
Chinese Library Classification
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Bioinformatics and computational biology bring together researchers from many areas: biochemists, physicists, mathematicians, and engineers. The problems they address range in scale from small molecules to complex systems in which many organisms coexist. Among these areas, we highlight genomics, which studies the genomes of microorganisms, plants, and animals. Predicting common patterns, i.e., motifs, in a set of deoxyribonucleic acid (DNA) sequences is one of the important sequence analysis problems, and it has not yet been solved efficiently. In this work, we study the application of evolutionary multiobjective optimization to the motif discovery problem, applied to the specific task of discovering novel transcription factor binding sites in DNA sequences. For this, we have designed, adapted, configured, and evaluated several types of multiobjective metaheuristics. After a detailed study, the results indicate that these metaheuristics are appropriate for discovering motifs. To find good approximations to the Pareto front, we use the hypervolume indicator, which has been successfully integrated into evolutionary algorithms. Besides the hypervolume indicator, we also use the coverage relation to determine which Pareto front is best. New results have been obtained that significantly improve on those published in previous research works.
Pages: 913-925
Page count: 13