The high-dimension, low-sample-size geometric representation holds under mild conditions

Cited by: 100
Authors
Ahn, Jeongyoun [1]
Marron, J. S.
Muller, Keith M.
Chi, Yueh-Yun
Affiliations
[1] Univ Georgia, Dept Stat, Athens, GA 30602 USA
[2] Univ N Carolina, Dept Stat & Operat Res, Chapel Hill, NC 27599 USA
[3] Univ Florida, Dept Epidemiol & Hlth Policy Res, Gainesville, FL 32610 USA
[4] Univ Washington, Dept Biostat, Seattle, WA 98195 USA
Keywords
high-dimension; low-sample-size; large p small n; linear discrimination; sample covariance matrix
DOI
10.1093/biomet/asm050
Chinese Library Classification number
Q [Biological Sciences];
Subject classification code
07 ; 0710 ; 09 ;
Abstract
High-dimension, low-sample-size datasets have different geometrical properties from those of traditional low-dimensional data. In their asymptotic study of increasing dimensionality with a fixed sample size, Hall et al. (2005) showed that the data vectors are approximately located at the vertices of a regular simplex in a high-dimensional space. A perhaps unappealing aspect of their result is the underlying assumption, which requires the variables, viewed as a time series, to be almost independent. We establish an equivalent geometric representation under much milder conditions using asymptotic properties of sample covariance matrices. We discuss implications of the results, such as the use of principal component analysis in a high-dimensional space, extension to the case of nonindependent samples, and the binary classification problem.
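The simplex geometry described in the abstract can be checked numerically in the simplest setting. The sketch below is an illustration only, not the authors' method: it assumes independent standard Gaussian entries, which is a much stronger assumption than the mild conditions the paper establishes. The function names `pairwise_scaled_distances` and `relative_spread` are hypothetical. With the sample size n fixed and the dimension d growing, every pairwise distance divided by sqrt(d) should settle near sqrt(2), so the points look like the vertices of a regular simplex.

```python
import math
import random


def pairwise_scaled_distances(n, d, seed=0):
    """Draw n independent standard Gaussian vectors in R^d (an assumed toy
    model) and return all pairwise Euclidean distances divided by sqrt(d)."""
    rng = random.Random(seed)
    data = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
    dists = []
    for i in range(n):
        for j in range(i + 1, n):
            sq = sum((a - b) ** 2 for a, b in zip(data[i], data[j]))
            dists.append(math.sqrt(sq / d))
    return dists


def relative_spread(dists):
    """Range of the distances relative to their mean; a value near 0 means
    the points are close to equidistant, as on a regular simplex."""
    return (max(dists) - min(dists)) / (sum(dists) / len(dists))


if __name__ == "__main__":
    # Fixed sample size, increasing dimension.
    for d in (10, 1000, 20000):
        dists = pairwise_scaled_distances(n=5, d=d)
        print(d, round(min(dists), 3), round(max(dists), 3),
              round(relative_spread(dists), 3))
```

Running the script prints, for each dimension, the smallest and largest scaled distance and their relative spread; under the Gaussian assumption the spread should shrink as d grows while the distances concentrate around sqrt(2).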
Pages: 760-766
Number of pages: 7
Related papers
12 in total
  • [1] Bai ZD, 1998, ANN PROBAB, V26, P316
  • [2] Phase transition of the largest eigenvalue for nonnull complex sample covariance matrices
    Baik, J
    Ben Arous, G
    Péché, S
    [J]. ANNALS OF PROBABILITY, 2005, 33 (05) : 1643 - 1697
  • [3] Eigenvalues of large sample covariance matrices of spiked population models
    Baik, Jinho
    Silverstein, Jack W.
    [J]. JOURNAL OF MULTIVARIATE ANALYSIS, 2006, 97 (06) : 1382 - 1408
  • [4] Adjustment of systematic microarray data biases
    Benito, M
    Parker, J
    Du, Q
    Wu, JY
    Xiang, D
    Perou, CM
    Marron, JS
    [J]. BIOINFORMATICS, 2004, 20 (01) : 105 - 114
  • [5] Some theory for Fisher's linear discriminant function, 'naive Bayes', and some alternatives when there are many more variables than observations
    Bickel, PJ
    Levina, E
    [J]. BERNOULLI, 2004, 10 (06) : 989 - 1010
  • [6] Cristianini N., 2000, Intelligent Data Analysis: An Introduction
  • [7] Neighborliness of randomly projected simplices in high dimensions
    Donoho, DL
    Tanner, J
    [J]. PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 2005, 102 (27) : 9452 - 9457
  • [8] Support vector machine classification and validation of cancer tissue samples using microarray expression data
    Furey, TS
    Cristianini, N
    Duffy, N
    Bednarski, DW
    Schummer, M
    Haussler, D
    [J]. BIOINFORMATICS, 2000, 16 (10) : 906 - 914
  • [9] Molecular classification of cancer: Class discovery and class prediction by gene expression monitoring
    Golub, TR
    Slonim, DK
    Tamayo, P
    Huard, C
    Gaasenbeek, M
    Mesirov, JP
    Coller, H
    Loh, ML
    Downing, JR
    Caligiuri, MA
    Bloomfield, CD
    Lander, ES
    [J]. SCIENCE, 1999, 286 (5439) : 531 - 537
  • [10] Geometric representation of high dimension, low sample size data
    Hall, P
    Marron, JS
    Neeman, A
    [J]. JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES B-STATISTICAL METHODOLOGY, 2005, 67 : 427 - 444