Probabilistic disease classification of expression-dependent proteomic data from mass spectrometry of human serum

Cited by: 98
Authors
Lilien, RH
Farid, H
Donald, BR [1]
Affiliations
[1] Dartmouth Comp Sci Dept, Sudikoff Lab 6211, Hanover, NH 03755 USA
[2] Dartmouth Coll Sch Med, Hanover, NH 03755 USA
[3] Dartmouth Chem Dept, Hanover, NH 03755 USA
[4] Dartmouth Dept Biol Sci, Hanover, NH 03755 USA
Keywords
mass spectrometry; proteomics; disease diagnosis; cancer diagnosis; disease classification; human serum; linear discriminant analysis; machine learning; support vector machines; complex protein mixtures; exact algorithms; biomarkers; differential expression; mass spectrometry classification algorithms; probabilistic classification
DOI
10.1089/106652703322756159
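Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010 [Biochemistry and Molecular Biology]; 081704 [Applied Chemistry];
Abstract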
We have developed an algorithm called Q5 for probabilistic classification of healthy versus disease whole serum samples using mass spectrometry. The algorithm employs principal components analysis (PCA) followed by linear discriminant analysis (LDA) on whole spectrum surface-enhanced laser desorption/ionization time of flight (SELDI-TOF) mass spectrometry (MS) data and is demonstrated on four real datasets from complete, complex SELDI spectra of human blood serum. Q5 is a closed-form, exact solution to the problem of classification of complete mass spectra of a complex protein mixture. Q5 employs a probabilistic classification algorithm built upon a dimension-reduced linear discriminant analysis. Our solution is computationally efficient; it is noniterative and computes the optimal linear discriminant using closed-form equations. The optimal discriminant is computed and verified for datasets of complete, complex SELDI spectra of human blood serum. Replicate experiments of different training/testing splits of each dataset are employed to verify robustness of the algorithm. The probabilistic classification method achieves excellent performance. We achieve sensitivity, specificity, and positive predictive values above 97% on three ovarian cancer datasets and one prostate cancer dataset. The Q5 method outperforms previous full-spectrum complex sample spectral classification techniques and can provide clues as to the molecular identities of differentially expressed proteins and peptides.
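The pipeline the abstract describes (PCA for dimension reduction, a closed-form linear discriminant, then a probabilistic class assignment) can be illustrated with a minimal sketch. This is not the authors' Q5 implementation: the function names `fit_pca_lda` and `predict_proba`, the synthetic "spectra", the choice of five components, and the one-dimensional Gaussian model used to turn discriminant scores into probabilities are all assumptions made for illustration.

```python
import numpy as np


def fit_pca_lda(X, y, n_components):
    """Fit PCA, then a two-class Fisher discriminant (labels must be 0 or 1)."""
    mean = X.mean(axis=0)
    Xc = X - mean
    # PCA via thin SVD: the rows of Vt are the principal directions.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:n_components].T                      # (n_features, n_components)
    Z = Xc @ P                                   # reduced training data
    Z0, Z1 = Z[y == 0], Z[y == 1]
    m0, m1 = Z0.mean(axis=0), Z1.mean(axis=0)
    # Within-class scatter; the Fisher direction solves Sw w = m1 - m0.
    Sw = (Z0 - m0).T @ (Z0 - m0) + (Z1 - m1).T @ (Z1 - m1)
    w = np.linalg.solve(Sw, m1 - m0)
    w /= np.linalg.norm(w)
    # Assumption: model each class's projected scores as a 1-D Gaussian.
    s0, s1 = Z0 @ w, Z1 @ w
    return {
        "mean": mean, "P": P, "w": w,
        "mu": (s0.mean(), s1.mean()),
        "sd": (s0.std(ddof=1), s1.std(ddof=1)),
        "prior": (len(s0) / len(y), len(s1) / len(y)),
    }


def predict_proba(model, X):
    """Return the posterior probability of class 1 for each row of X."""
    s = (X - model["mean"]) @ model["P"] @ model["w"]
    log_post = []
    for k in (0, 1):
        mu, sd, prior = model["mu"][k], model["sd"][k], model["prior"][k]
        log_post.append(-0.5 * ((s - mu) / sd) ** 2 - np.log(sd) + np.log(prior))
    # Normalise in log space for numerical stability.
    return np.exp(log_post[1] - np.logaddexp(log_post[0], log_post[1]))


# Synthetic stand-in for spectra: 80 samples, 200 intensity bins,
# with class 1 shifted in the first 10 bins (hypothetical data).
rng = np.random.default_rng(0)
y = np.repeat([0, 1], 40)
X = rng.normal(size=(80, 200))
X[y == 1, :10] += 3.0

# One random training/testing split.
idx = rng.permutation(80)
train, test = idx[:60], idx[60:]

model = fit_pca_lda(X[train], y[train], n_components=5)
proba = predict_proba(model, X[test])
acc = np.mean((proba > 0.5) == (y[test] == 1))
```

Both fitting steps are noniterative: the SVD gives the principal directions directly, and the discriminant comes from a single linear solve. Repeating the split with different seeds would mirror the replicate training/testing experiments mentioned in the abstract.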
Pages: 925-946
Page count: 22