Learning multiple evolutionary pathways from cross-sectional data

Times cited: 79
Authors
Beerenwinkel, N
Rahnenführer, J
Däumer, M
Hoffmann, D
Kaiser, R
Selbig, J
Lengauer, T
Affiliations
[1] Max Planck Inst Informat, D-66123 Saarbrucken, Germany
[2] Univ Cologne, Inst Virol, D-50935 Cologne, Germany
[3] Ctr Adv European Studies & Res, D-53175 Bonn, Germany
[4] Max Planck Inst Mol Plant Physiol, D-14476 Golm, Germany
Keywords
mixture models; tree models; Bayesian networks; EM algorithm; HIV drug resistance; mutational pathways
DOI
10.1089/cmb.2005.12.584
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
We introduce a mixture model of trees to describe evolutionary processes that are characterized by the ordered accumulation of permanent genetic changes. The basic building block of the model is a directed weighted tree that generates a probability distribution on the set of all patterns of genetic events. We present an EM-like algorithm for learning a mixture model of K trees and show how to determine K with a maximum likelihood approach. As a case study, we consider the accumulation of mutations in the HIV-1 reverse transcriptase that are associated with drug resistance. The fitted model is statistically validated as a density estimator, and the stability of the model topology is analyzed. We obtain a generative probabilistic model for the development of drug resistance in HIV that agrees with biological knowledge. Further applications and extensions of the model are discussed.
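The generative model described in the abstract can be made concrete with a small sketch. It assumes an oncogenetic-tree-style reading of "directed weighted tree": a root event is always present, and every other event can occur only if its parent event has occurred, with the edge weight as the conditional probability. This is an illustration, not the authors' implementation. The names `tree_prob`, `mixture_prob`, `log_likelihood`, `responsibilities` and `m_step_fixed_topology`, the star-shaped component used to keep every pattern possible, and the toy parameters and data are all hypothetical. The M-step shown only re-estimates mixture weights and edge probabilities with each topology held fixed; the topology learning performed by the paper's EM-like algorithm is not reproduced.

```python
import math
from itertools import product

# Hypothetical representation (not the authors' data structure): a tree maps
# each non-root event 1..n to (parent, edge_probability). Event 0 is the root
# and is present in every pattern.
Tree = dict[int, tuple[int, float]]


def tree_prob(pattern: tuple[int, ...], tree: Tree) -> float:
    """Probability of a 0/1 event pattern under one tree component."""
    if pattern[0] != 1:
        return 0.0
    p = 1.0
    for child, (parent, w) in tree.items():
        if pattern[parent] == 1:
            # Parent present: the child event occurs with probability w.
            p *= w if pattern[child] == 1 else 1.0 - w
        elif pattern[child] == 1:
            # Child present without its parent: impossible under this tree.
            return 0.0
    return p


def mixture_prob(pattern, trees, weights) -> float:
    """Mixture density: sum over k of weights[k] * P_k(pattern)."""
    return sum(a * tree_prob(pattern, t) for a, t in zip(weights, trees))


def log_likelihood(data, trees, weights) -> float:
    """Log-likelihood of a list of patterns under the mixture."""
    return sum(math.log(mixture_prob(x, trees, weights)) for x in data)


def responsibilities(pattern, trees, weights) -> list[float]:
    """E-step: posterior probability that each tree generated the pattern."""
    joint = [a * tree_prob(pattern, t) for a, t in zip(weights, trees)]
    total = sum(joint)  # > 0 as long as some component (here the star) allows it
    return [j / total for j in joint]


def m_step_fixed_topology(data, resp, trees):
    """Weighted re-estimation of mixture weights and edge probabilities with
    every tree topology held fixed (topology learning is omitted)."""
    k = len(trees)
    new_weights = [sum(r[j] for r in resp) / len(data) for j in range(k)]
    new_trees = []
    for j, tree in enumerate(trees):
        updated = {}
        for child, (parent, w) in tree.items():
            # Weighted fraction of "parent present" cases where the child is present.
            den = sum(r[j] for x, r in zip(data, resp) if x[parent] == 1)
            num = sum(r[j] for x, r in zip(data, resp)
                      if x[parent] == 1 and x[child] == 1)
            updated[child] = (parent, num / den if den > 0 else w)
        new_trees.append(updated)
    return new_weights, new_trees


# Toy example with three events (made-up parameters and data).
star = {1: (0, 0.2), 2: (0, 0.2), 3: (0, 0.2)}   # every event hangs off the root
chain = {1: (0, 0.8), 2: (1, 0.5), 3: (0, 0.3)}  # event 2 requires event 1
trees, weights = [star, chain], [0.1, 0.9]

patterns = [(1, *bits) for bits in product((0, 1), repeat=3)]
data = [(1, 1, 1, 0), (1, 1, 0, 0), (1, 0, 1, 0), (1, 1, 1, 1), (1, 0, 0, 0)]

# One EM iteration with fixed topologies.
resp = [responsibilities(x, trees, weights) for x in data]
new_weights, new_trees = m_step_fixed_topology(data, resp, trees)
ll_before = log_likelihood(data, trees, weights)
ll_after = log_likelihood(data, new_trees, new_weights)
```

In this sketch the pattern (1, 0, 1, 0) is impossible under `chain`, so its responsibility falls entirely on the star component; the star therefore plays the role of a catch-all for patterns that no structured tree explains. Because the in-sample likelihood generally grows with the number of trees, choosing K by likelihood typically relies on held-out data; the exact procedure used in the paper is not shown here.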
Pages: 584-598
Number of pages: 15
References
39 entries in total
[1] [Anonymous], 1998, MONOGRAPHS STAT APPL
[2] Diversity and complexity of HIV-1 drug resistance: A bioinformatics approach to predicting phenotype from genotype [J].
Beerenwinkel, N; Schmidt, B; Walter, H; Kaiser, R; Lengauer, T; Hoffmann, D; Korn, K; Selbig, J.
Proceedings of the National Academy of Sciences of the United States of America, 2002, 99(12): 8271-8276
[3] Geno2pheno: Interpreting genotypic HIV drug resistance tests [J].
Beerenwinkel, N; Lengauer, T; Selbig, J; Schmidt, B; Walter, H; Korn, K; Kaiser, R; Hoffmann, D.
IEEE Intelligent Systems, 2001, 16(6): 35-41
[4] BEERENWINKEL N, 2003, P 11 INT C INT SYST, V19, pI16
[5] BEERENWINKEL N, 2001, P GERM C BIOINF BRAU, P126
[6] Ordered appearance of zidovudine resistance mutations during treatment of 18 human immunodeficiency virus-positive subjects [J].
Boucher, CAB; O'Sullivan, E; Mulder, JW; Ramautarsing, C; Kellam, P; Darby, G; Lange, JMA; Goudsmit, J; Larder, BA.
Journal of Infectious Diseases, 1992, 165(1): 105-110
[7] Approximating discrete probability distributions with dependence trees [J].
Chow, CK; Liu, CN.
IEEE Transactions on Information Theory, 1968, 14(3): 462-+
[8] CHU YJ, 1965, SCI SINICA, V14, P1396
[9] Maximum likelihood from incomplete data via the EM algorithm [J].
Dempster, AP; Laird, NM; Rubin, DB.
Journal of the Royal Statistical Society, Series B (Methodological), 1977, 39(1): 1-38
[10] Inferring tree models for oncogenesis from comparative genome hybridization data [J].
Desper, R; Jiang, F; Kallioniemi, OP; Moch, H; Papadimitriou, CH; Schäffer, AA.
Journal of Computational Biology, 1999, 6(1): 37-51