Use of artificial genomes in assessing methods for atypical gene detection

Cited by: 27
Authors
Azad, RK [1]
Lawrence, JG [1]
Affiliations
[1] Univ Pittsburgh, Dept Biol Sci, Pittsburgh, PA 15260 USA
Funding
National Science Foundation (USA)
DOI
10.1371/journal.pcbi.0010056
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Parametric methods for identifying laterally transferred genes exploit the directional mutational biases unique to each genome. Yet the development of new, more robust methods (as well as the evaluation and proper implementation of existing methods) relies on an arbitrary assessment of performance using real genomes, where the evolutionary histories of genes are not known. We have used the framework of a generalized hidden Markov model to create artificial genomes modeled after genuine genomes. To model a genome, "core" genes (those displaying patterns of mutational biases shared among large numbers of genes) are identified by a novel gene clustering approach based on the Akaike information criterion. Gene models derived from multiple "core" gene clusters are used to generate an artificial genome that models the properties of a genuine genome. Chimeric artificial genomes, representing those having experienced lateral gene transfer, were created by combining genes from multiple artificial genomes, and the performance of the parametric methods for identifying "atypical" genes was assessed directly. We found that a hidden Markov model that included multiple gene models, each trained on sets of genes representing the range of genotypic variability within a genome, could produce artificial genomes that mimicked the properties of genuine genomes. Moreover, different methods for detecting foreign genes performed differently (i.e., they had different sets of strengths and weaknesses) when identifying atypical genes within chimeric artificial genomes.
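The abstract does not spell out how the Akaike information criterion (AIC) drives the gene clustering, so the following is only a toy sketch of the general idea, not the authors' method. It assumes a deliberately simplified zero-order nucleotide composition model (three free parameters per cluster) and asks whether two hypothetical gene clusters are better described by one shared model or by two separate ones, using AIC = 2k - 2 ln L. The function names and the count dictionaries are illustrative assumptions.

```python
import math

BASES = "ACGT"

def log_likelihood(counts):
    """Maximized multinomial log-likelihood of nucleotide counts (zero-order model)."""
    total = sum(counts.get(b, 0) for b in BASES)
    return sum(
        counts[b] * math.log(counts[b] / total)
        for b in BASES
        if counts.get(b, 0) > 0
    )

def aic(log_lik, n_params):
    """Akaike information criterion: 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_lik

def prefers_split(counts_a, counts_b):
    """Return True if AIC favors two separate composition models over one merged model.

    Assumption: each model has 3 free parameters (4 base frequencies summing to 1).
    """
    merged = {b: counts_a.get(b, 0) + counts_b.get(b, 0) for b in BASES}
    aic_merged = aic(log_likelihood(merged), 3)
    aic_split = aic(log_likelihood(counts_a) + log_likelihood(counts_b), 6)
    return aic_split < aic_merged

# Hypothetical clusters: identical composition vs. strongly different composition.
uniform = {"A": 100, "C": 100, "G": 100, "T": 100}
a_rich = {"A": 700, "C": 100, "G": 100, "T": 100}
t_rich = {"A": 100, "C": 100, "G": 100, "T": 700}

same_composition_split = prefers_split(uniform, uniform)
different_composition_split = prefers_split(a_rich, t_rich)
```

In this sketch, clusters with identical composition are kept merged because the extra parameters of a second model are penalized without any gain in likelihood, whereas clusters with strongly different composition are kept apart. A realistic gene model would use higher-order or codon-position-specific statistics rather than raw base counts.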
Pages: 461-473
Page count: 13