Challenges in projecting clustering results across gene expression-profiling datasets

Cited by: 80
Authors
Lusa, Lara
McShane, Lisa M.
Reid, James F.
De Cecco, Loris
Ambrogi, Federico
Biganzoli, Elia
Gariboldi, Manuela
Pierotti, Marco A.
Affiliations
[1] IFOM Fdn Inst, FIRC Oncol Mol, Mol Genet Canc Grp, I-20139 Milan, Italy
[2] IRCCS, Inst Nazl Tumori, Dept Expt Oncol, Milan, Italy
[3] IRCCS, Inst Nazl Tumori, Unit Med Stat & Biometry, Milan, Italy
[4] NCI, Biometr Res Branch, Bethesda, MD 20892 USA
[5] Univ Milan, Inst Med Stat & Biometry, Milan, Italy
Source
JNCI-JOURNAL OF THE NATIONAL CANCER INSTITUTE | 2007 / Vol. 99 / Issue 22
DOI
10.1093/jnci/djm216
Chinese Library Classification (CLC) number
R73 [Oncology]
Discipline classification code
100214 [Oncology]
Abstract
Background: Gene expression microarray studies for several types of cancer have been reported to identify previously unknown subtypes of tumors. For breast cancer, a molecular classification consisting of five subtypes based on gene expression microarray data has been proposed. These subtypes have been reported to exist across several breast cancer microarray studies, and they have demonstrated some association with clinical outcome. A classification rule based on the method of centroids has been proposed for identifying the subtypes in new collections of breast cancer samples; the method is based on the similarity of the new profiles to the mean expression profile of the previously identified subtypes.
Methods: Previously identified centroids of five breast cancer subtypes were used to assign 99 breast cancer samples, including a subset of 65 estrogen receptor-positive (ER+) samples, to five breast cancer subtypes based on microarray data for the samples. The effect of mean centering the genes (i.e., transforming the expression of each gene so that its mean expression is equal to 0) on subtype assignment by the method of centroids was assessed. Further studies of the effect of mean centering and of class prevalence in the test set on the accuracy of method-of-centroids classifications of ER status were carried out using training and test sets for which ER status had been independently determined by ligand-binding assay and for which the proportions of ER+ and ER- samples were systematically varied.
Results: When all 99 samples were considered, mean centering before application of the method of centroids appeared to be helpful for correctly assigning samples to subtypes, as evidenced by the expression of genes that had previously been used as markers to identify the subtypes. However, when only the 65 ER+ samples were considered for classification, many samples appeared to be misclassified, as evidenced by an unexpected distribution of ER+ samples among the resultant subtypes. When genes were mean centered before classification of samples for ER status, the accuracy of the ER subgroup assignments was highly dependent on the proportion of ER+ samples in the test set; this effect of subtype prevalence was not seen when gene expression data were not mean centered.
Conclusions: Simple corrections such as mean centering of genes, aimed at correcting microarray platform or batch effects, can have undesirable consequences because patient population effects can easily be confused with these assay-related effects. Careful thought should be given to the comparability of the patient populations before attempting to force data comparability for purposes of assigning subtypes to independent subjects.
Pages: 1715-1723
Page count: 9
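The prevalence effect described in the abstract can be illustrated with a small simulation. The block below is a minimal sketch under stated assumptions, not the authors' analysis: the data are synthetic, "similarity" to a centroid is taken to be Euclidean distance (the abstract does not specify the measure), and every function name and parameter (`nearest_centroid`, `simulate`, `accuracy_by_prevalence`, the gene count, the noise level) is hypothetical. Training data are balanced between ER+ and ER-, while the fraction of ER+ samples in the test set is varied, with and without per-gene mean centering of each data set.

```python
import numpy as np


def nearest_centroid(samples, centroids):
    """Return, for each row of `samples`, the index of the closest centroid (Euclidean)."""
    # distances has shape (n_samples, n_centroids)
    distances = np.linalg.norm(samples[:, None, :] - centroids[None, :, :], axis=2)
    return distances.argmin(axis=1)


def simulate(n_pos, n_neg, baseline, shift, noise_sd, rng):
    """Draw synthetic profiles: class 1 (ER+) = baseline + shift, class 0 (ER-) = baseline."""
    labels = np.array([1] * n_pos + [0] * n_neg)
    means = baseline + np.outer(labels, shift)
    data = means + rng.normal(0.0, noise_sd, size=means.shape)
    return data, labels


def accuracy_by_prevalence(prevalence, center, n_test=200, n_genes=50, seed=0):
    """Accuracy of nearest-centroid ER calls when a fraction `prevalence` of the test set is ER+."""
    rng = np.random.default_rng(seed)
    baseline = rng.normal(8.0, 1.0, size=n_genes)  # assumed per-gene baseline expression
    shift = rng.normal(0.0, 1.0, size=n_genes)     # assumed ER+ vs ER- difference per gene
    train, y_train = simulate(100, 100, baseline, shift, 1.0, rng)  # balanced training set
    n_pos = int(round(prevalence * n_test))
    test, y_test = simulate(n_pos, n_test - n_pos, baseline, shift, 1.0, rng)
    if center:
        # Mean centering: each gene's mean becomes 0 within each data set separately.
        train = train - train.mean(axis=0)
        test = test - test.mean(axis=0)
    centroids = np.vstack([
        train[y_train == 0].mean(axis=0),  # ER- centroid (class index 0)
        train[y_train == 1].mean(axis=0),  # ER+ centroid (class index 1)
    ])
    predicted = nearest_centroid(test, centroids)
    return float((predicted == y_test).mean())


if __name__ == "__main__":
    for prevalence in (0.5, 0.7, 0.9, 1.0):
        raw = accuracy_by_prevalence(prevalence, center=False)
        centered = accuracy_by_prevalence(prevalence, center=True)
        print(f"ER+ fraction {prevalence:.1f}: raw={raw:.2f}, centered={centered:.2f}")
```

In this toy setting there is no platform or batch effect, so centering has nothing to correct. Subtracting the test-set gene means shifts every sample by an amount that depends on the class mix, which is why the centered classifier degrades as the test set becomes dominated by one class while the uncentered one does not.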