HCV genotyping using statistical classification approach

Cited by: 12
Authors
Qiu, Ping [1]
Cai, Xiao-Yan [2]
Ding, Wei [1]
Zhang, Qing [1]
Norris, Ellie D. [1]
Greene, Jonathan R. [1]
Affiliations
[1] Schering Plough Res Inst, Kenilworth, NJ 07033 USA
[2] Schering Plough Res Inst, Union, NJ 07083 USA
Keywords
HEPATITIS-C VIRUS; INTERFERON-ALPHA-2B PLUS RIBAVIRIN; 5' NONCODING REGION; SEQUENCE-ANALYSIS; WEIGHT MATRIX; PREDICTION; COMBINATION; NS5B
DOI
10.1186/1423-0127-16-62
CLC number
Q2 [Cell biology]
Subject classification codes
071009; 090102
Abstract
The genotype of a Hepatitis C Virus (HCV) strain is an important determinant of the severity and aggressiveness of liver infection, as well as of patient response to antiviral therapy. Fast and accurate determination of viral genotype could guide the clinical management of patients with chronic HCV infection. Using publicly available HCV nucleotide sequences, we built a global Position Weight Matrix (PWM) for the HCV genome. Based on the PWM, a set of genotype-specific nucleotide sequence "signatures" was selected from the 5' NCR, CORE, E1, and NS5B regions of the HCV genome. We evaluated the power of these signatures to predict the most common HCV genotypes and subtypes. Signatures selected from the NS5B and E1 regions generally showed stronger power to discriminate among major HCV genotypes and subtypes than those from the 5' NCR and CORE regions. Two discriminant methods were used to build predictive models. In 10-fold cross-validation, both support vector machine (SVM) and random forest classification achieved over 99% prediction accuracy on a dataset of 1134 sequences for NS5B and 947 sequences for E1. Prediction accuracy for each genotype is also reported.
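To make the idea of a position weight matrix concrete, the following is a minimal sketch only, not the authors' pipeline. It builds one log-odds PWM per class from aligned sequences and assigns a query to the class whose PWM scores it highest. That highest-score rule is a simplified stand-in for the SVM and random forest models named in the abstract. The function names (`build_pwm`, `score`, `predict`), the pseudocount of 1, the uniform background frequency of 0.25, and the toy sequences are all assumptions made for illustration; none of them come from the paper or from real HCV data.

```python
import math
from collections import Counter

ALPHABET = "ACGT"


def build_pwm(aligned_seqs, pseudocount=1.0, background=0.25):
    """Return one {base: log2-odds} dict per alignment column.

    Assumptions (not from the paper): add-one pseudocounts and a
    uniform background frequency of 0.25 for every base.
    """
    length = len(aligned_seqs[0])
    if any(len(s) != length for s in aligned_seqs):
        raise ValueError("sequences must be aligned to the same length")
    pwm = []
    for pos in range(length):
        counts = Counter(s[pos] for s in aligned_seqs)
        # Only A/C/G/T are counted; gaps or ambiguity codes are ignored.
        total = sum(counts[b] for b in ALPHABET) + pseudocount * len(ALPHABET)
        pwm.append({
            b: math.log2(((counts[b] + pseudocount) / total) / background)
            for b in ALPHABET
        })
    return pwm


def score(pwm, seq):
    """Sum the per-position log-odds; unknown symbols contribute 0."""
    return sum(col.get(base, 0.0) for col, base in zip(pwm, seq))


def predict(pwms, seq):
    """Pick the label whose PWM gives the highest score (hypothetical rule)."""
    return max(pwms, key=lambda label: score(pwms[label], seq))


# Toy, invented aligned sequences; these are NOT real HCV signatures.
training = {
    "type_A": ["ACGTAC", "ACGTAC", "ACGTTC"],
    "type_B": ["TGCATG", "TGCATG", "TGCTTG"],
}
pwms = {label: build_pwm(seqs) for label, seqs in training.items()}
print(predict(pwms, "ACGTAC"))
```

In a workflow closer to the one the abstract describes, per-position scores like these would more plausibly serve as features for a separate classifier evaluated by cross-validation, rather than being compared directly.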
Pages: 9