Robust Supervised and Unsupervised Statistical Learning for HIV Type 1 Coreceptor Usage Analysis

Cited by: 17
Authors
Prosperi, Mattia C. F. [1]
Fanti, Iuri [2]
Ulivi, Giovanni [3]
Micarelli, Alessandro [3]
De Luca, Andrea [2]
Zazzi, Maurizio [4]
Affiliations
[1] Natl Inst Infect Dis L Spallanzani, Dept Virol, I-00149 Rome, Italy
[2] Univ Cattolica Sacro Cuore, Infect Dis Clin, I-00148 Rome, Italy
[3] Univ Roma TRE, Dept Comp Sci & Automat DIA, I-00146 Rome, Italy
[4] Univ Siena, Policlin Le Scotte, Dept Mol Biol, Microbiol Sect, I-53100 Siena, Italy
Keywords
IMMUNODEFICIENCY-VIRUS TYPE-1; MULTIPLE SEQUENCE ALIGNMENT; PREDICTION; CCR5; RECEPTOR; FUSION; INFECTION; ENTRY; ASSAY
DOI
10.1089/aid.2008.0039
Chinese Library Classification
R392 [Medical Immunology]; Q939.91 [Immunology]
Discipline codes
071005 [Microbiology]; 100108 [Medical Immunology]
Abstract
Human immunodeficiency virus type 1 (HIV-1) isolates differ in their use of coreceptors to enter target cells. This has important implications both for viral pathogenicity and for susceptibility to entry inhibitors, which have recently been approved or are under development. Predicting HIV-1 coreceptor usage from sequence information is challenging because of the high variability of the envelope. The associations of genetic features of the whole HIV-1 envelope (subtype, mutations, insertions-deletions, physicochemical properties) and of clinical markers (viral RNA load, CD8+ and CD4+ T cell counts) with viral tropism were investigated using a set of 2896 sequence-tropism pairs available at the Los Alamos HIV database (659 after filtering, from 593 patients). Bootstrapped hierarchical clustering was used to assess mutational covariation. Univariate and multivariate analyses were performed to assess the relative importance of different features. Several machine learning methods (logistic regression, support vector machines, decision trees, rule bases, instance-based reasoning) and feature selection methods (filter and embedded), together with different loss functions (accuracy, area under the ROC curve [AUC], sensitivity, specificity, F-measure), were applied and compared for the classification of X4 variants. Extra-sample error was estimated via multiple cross-validation, with adjustments for multiple testing. A high-performing, compact, and interpretable logistic regression model was derived to infer HIV-1 coreceptor tropism for a given patient [accuracy = 92.76% (SD 3.07); AUC = 0.93 (SD 0.04)].
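The abstract compares classifiers under several loss functions and adjusts for multiple testing. The sketch below shows how such criteria are commonly computed for a binary X4-versus-R5 task. It is not the authors' code, and several choices are assumptions: X4-using variants are encoded as label 1 and R5 variants as label 0, AUC uses the Mann-Whitney (pairwise ranking) formulation, and the multiple-testing step uses the Benjamini-Hochberg false discovery rate procedure, which is only one possible adjustment since the abstract does not name the one used. All function names are hypothetical.

```python
"""Hypothetical sketch of the evaluation criteria named in the abstract.

Assumptions (not from the paper): label 1 = X4-using variant, label 0 = R5 variant;
AUC via the Mann-Whitney formulation; Benjamini-Hochberg as the multiple-testing adjustment.
"""
from typing import Dict, List, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Count (tp, fp, tn, fn), treating X4 (label 1) as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def _ratio(num: float, den: float) -> float:
    """Safe division: returns 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def threshold_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, sensitivity, specificity and F-measure for hard 0/1 predictions."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    sensitivity = _ratio(tp, tp + fn)  # fraction of X4 variants detected
    specificity = _ratio(tn, tn + fp)  # fraction of R5 variants correctly rejected
    precision = _ratio(tp, tp + fp)
    return {
        "accuracy": _ratio(tp + tn, tp + fp + tn + fn),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f_measure": _ratio(2 * precision * sensitivity, precision + sensitivity),
    }


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC = P(score of a random X4 > score of a random R5); ties count as 1/2."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one X4 and one R5 example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def benjamini_hochberg(p_values: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0  # enforces monotonicity and caps adjusted values at 1
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    # Toy labels and scores for illustration only; not data from the study.
    y_true = [1, 1, 0, 0, 0, 1]
    y_pred = [1, 0, 0, 0, 1, 1]
    scores = [0.9, 0.4, 0.2, 0.1, 0.6, 0.8]
    print(threshold_metrics(y_true, y_pred))
    print(roc_auc(y_true, scores))
    print(benjamini_hochberg([0.01, 0.04, 0.03, 0.20]))
```

Accuracy, sensitivity, specificity and F-measure depend on a fixed decision threshold, whereas AUC summarizes ranking quality over all thresholds, which is one reason to report both kinds of criteria when comparing classifiers.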
Pages: 305-314
Number of pages: 10