Genome scanning tests for comparing amino acid sequences between groups

Cited by: 19
Authors
Gilbert, Peter B. [1 ,2 ]
Wu, Chunyuan [1 ,2 ]
Jobes, David V. [3 ]
Affiliations
[1] Univ Washington, Fred Hutchinson Canc Res Ctr, Seattle, WA 98109 USA
[2] Univ Washington, Dept Biostat, Seattle, WA 98109 USA
[3] Presidio Pharmaceut, San Francisco, CA 94104 USA
Keywords
genetics; high-dimensional data; hypothesis testing; Kullback-Leibler; Mahalanobis; multinomial; sequence analysis; signature position; vaccine trial;
DOI
10.1111/j.1541-0420.2007.00845.x
Chinese Library Classification (CLC) number
Q [Biological sciences];
Discipline classification codes
07 ; 0710 ; 09 ;
Abstract
Consider a placebo-controlled preventive HIV vaccine efficacy trial. An HIV amino acid sequence is measured from each volunteer who acquires HIV, and these sequences are aligned together with the reference HIV sequence represented in the vaccine. We develop genome scanning methods to identify positions at which the amino acids in infected vaccine recipient sequences either (A) are more divergent from the reference amino acid than the amino acids in infected placebo recipient sequences or (B) have a different frequency distribution than the placebo sequences, irrespective of a reference amino acid. We consider t-test-type statistics for problem A and Euclidean, Mahalanobis, and Kullback-Leibler-type statistics for problem B. The test statistics incorporate weights to reflect biological information contained in different amino acid positions and mismatches. Position-specific p-values are obtained by approximating the null distribution of the statistics either by a permutation procedure or by nonparametric estimation. A permutation method is used to estimate a cut-off p-value to control the per-comparison error rate at a prespecified level. The methods are examined in simulations and are applied to two HIV examples. The methods for problem B address the general problem of comparing discrete frequency distributions between groups in a high-dimensional data setting.
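As a rough illustration of problem B at a single aligned position, the sketch below computes a Kullback-Leibler-type statistic between the amino acid frequencies of two groups and obtains a p-value by permuting group labels. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the symmetrized form of the divergence, the pseudocount, and the toy data are all hypothetical, and the biological weights described in the abstract are omitted.

```python
import math
import random
from collections import Counter


def kl_statistic(group_a, group_b, pseudocount=0.5):
    """Symmetrized KL divergence between the amino acid frequency
    distributions of two groups at one position.

    The pseudocount (an assumption of this sketch) keeps every
    estimated frequency positive so the logarithm is defined.
    """
    alphabet = sorted(set(group_a) | set(group_b))
    counts_a, counts_b = Counter(group_a), Counter(group_b)
    total_a = len(group_a) + pseudocount * len(alphabet)
    total_b = len(group_b) + pseudocount * len(alphabet)
    stat = 0.0
    for aa in alphabet:
        p = (counts_a[aa] + pseudocount) / total_a
        q = (counts_b[aa] + pseudocount) / total_b
        # (p - q) * log(p / q) summed over residues equals
        # KL(p || q) + KL(q || p), which is always non-negative.
        stat += (p - q) * math.log(p / q)
    return stat


def permutation_pvalue(group_a, group_b, n_perm=2000, seed=0):
    """Permutation p-value for one position: shuffle the pooled
    residues, re-split into groups of the original sizes, and count
    how often the permuted statistic reaches the observed one."""
    rng = random.Random(seed)
    observed = kl_statistic(group_a, group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        permuted = kl_statistic(pooled[:n_a], pooled[n_a:])
        if permuted >= observed - 1e-12:  # tolerance for float ties
            hits += 1
    # Add-one correction so the estimated p-value is never exactly 0.
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Hypothetical residues at one position for each trial arm.
    vaccine = list("KKKKKKRRTT")
    placebo = list("RRRRRRRKTT")
    print(permutation_pvalue(vaccine, placebo))
```

In a genome scan this calculation would be repeated at every aligned position, and the resulting p-values would then be compared against a cut-off chosen to control the error rate, as the abstract describes.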
Pages: 198-207
Page count: 10
Related papers
17 in total
[1]   Large-scale simultaneous hypothesis testing: The choice of a null hypothesis [J].
Efron, B .
JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 2004, 99 (465) :96-104
[2]  
EGUCHI S, 2002, INTERPRETING KULLBAC
[3]  
Flynn MN, 2005, J INFECT DIS, V191, P654, DOI 10.1086/428404
[4]   Statistical methods for assessing differential vaccine protection against human immunodeficiency virus types [J].
Gilbert, PB ;
Self, SG ;
Ashby, MA .
BIOMETRICS, 1998, 54 (03) :799-814
[5]  
*HVTN, 2005, PIP PROJ
[6]   SIGNATURE PATTERN-ANALYSIS - A METHOD FOR ASSESSING VIRAL SEQUENCE RELATEDNESS [J].
KORBER, B ;
MYERS, G .
AIDS RESEARCH AND HUMAN RETROVIRUSES, 1992, 8 (09) :1549-1560
[7]   A nonparametric test of gene region heterogeneity associated with phenotype [J].
Kowalski, J ;
Pagano, M ;
DeGruttola, V .
JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 2002, 97 (458) :398-408
[8]  
KUIKEN C, 2002, HIV SEQUENCE COMPEND
[9]  
NICKLE DC, 2007, PLOS ONE, DOI 10.1371/JOURNAL.PONE.0000503
[10]   On the use of permutation in and the performance of a class of nonparametric methods to detect differential gene expression [J].
Pan, W .
BIOINFORMATICS, 2003, 19 (11) :1333-1340