Measures of diversity for populations and distances between individuals with highly reorganizable genomes

被引:39
作者
Mattiussi, C [1 ]
Waibel, M [1 ]
Floreano, D [1 ]
机构
[1] Ecole Polytech Fed Lausanne, Swiss Fed Inst Technol Lausanne, Inst Syst Engn, CH-1015 Lausanne, Switzerland
关键词
evolutionary computation; genetic programming; variable length genomes; population diversity; substring diversity; Tanimoto distance; Jaccard similarity; linguistic complexity; nucleotide diversity;
D O I
10.1162/1063656043138923
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
In this paper we address the problem of defining a measure of diversity for a population of individuals whose genome can be subjected to major reorganizations during the evolutionary process. To this end, we introduce a measure of diversity for populations of strings of variable length defined on a finite alphabet, and from this measure we derive a semi-metric distance between pairs of strings. The definitions are based on counting the number of substrings of the strings, considered first separately and then collectively. This approach is related to the concept of linguistic complexity, whose definition we generalize from single strings to populations. Using the substring count approach we also define a new kind of Tanimoto distance between strings. We show how to extend the approach to representations that are not based on strings and, in particular, to the tree-based representations used in the field of genetic programming. We describe how suffix trees can allow these measures and distances to be implemented with a computational cost that is linear in both space and time relative to the length of the strings and the size of the population. The definitions were devised to assess the diversity of populations having genomes of variable length and variable structure during evolutionary computation runs, but applications in quantitative genomics, proteomics, and pattern recognition can be also envisaged.
引用
收藏
页码:495 / 515
页数:21
相关论文
共 27 条
[1]  
CROCHEMORE M, 2001, ALGORITHMIQUE TEXTE
[2]  
Crochemore M., 2002, JEWELS STRINGOLOGY
[3]  
deJong E., 2001, PROC GECCO 01, P11
[4]  
Graur D., 2000, FUNDAMENTALS MOL EVO
[5]  
GUSFIELD G, 1997, ALGORITHMS STRINGS T
[6]  
Haykin S., 1999, Neural Networks: A Comprehensive Foundation, V2nd ed
[7]  
KEIJZER M, 1996, ADV GENETIC PROGRAMM, V2, P259
[8]  
KELLER R, 1994, UNPUB EXPLICIT MAINT
[9]  
Langdon W.B., 2002, FDN GENETIC PROGRAMM, DOI DOI 10.1007/978-3-662-04726-2
[10]   Degree of population diversity - A perspective on premature convergence in genetic algorithms and its Markov chain analysis [J].
Leung, Y ;
Gao, Y ;
Xu, ZB .
IEEE TRANSACTIONS ON NEURAL NETWORKS, 1997, 8 (05) :1165-1176