Plant seed classification using pyrolysis mass spectrometry with unsupervised learning: The application of auto-associative and Kohonen artificial neural networks

Cited by: 20
Authors
Goodacre, R
Pygall, J
Kell, DB
Affiliation
[1] Institute of Biological Sciences, University of Wales, Aberystwyth, Dyfed SY23 3DA, Wales
Funding
Biotechnology and Biological Sciences Research Council (UK); Wellcome Trust (UK)
Keywords
neural networks; auto-associative neural networks; feature extraction; pyrolysis mass spectrometry; seed typing; self organising feature maps
DOI
10.1016/0169-7439(96)00021-4
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Pyrolysis mass spectrometry (PyMS) was used to gain high-dimensional (150 m/z values) biochemical fingerprints from Begonia semperflorens Summer Rainbow, Campanula carpatica White Gem, Lobelia erinus White Fountain, and Lobelia erinus White Lady plant seeds. Rather than homogenizing the seeds and analysing the extracts, the sample preparation of the seeds in this study was novel and merely involved crimping the metal foil sample carrier around the seeds. Compared to extractive procedures, the technique exploited in this study will give a fair representation of the seed, and is rapid and thus amenable to the analysis of a high volume of samples. To observe the relationship between these seeds, based on their spectral fingerprints, it was necessary to reduce the dimensionality of these data by unsupervised feature extraction methods. The neural computational pattern recognition techniques of self organising feature maps (SOFMs) and auto-associative neural networks were therefore employed, and the clusters observed were compared with the groups obtained from the more conventional statistical approaches of principal components analysis (PCA) and canonical variates analysis (CVA). When PCA was used to analyse the raw pyrolysis mass spectra, replicate samples were not recovered in discrete clusters; CVA, which minimises the within-group variance and maximises the between-group variance, therefore had to be employed. Although B. semperflorens and C. carpatica seeds were recovered separately and away from the L. erinus plant seeds, the two types of L. erinus seeds could still not be discriminated using this approach. CVA uses a priori information on which spectra are replicates; we therefore encoded this information by employing a novel preprocessing regime where the triplicate mass spectra from each of the seeds were averaged in pairs to produce three new spectra; these were then used by each of the unsupervised methods. PCA still failed to separate the two L. erinus seed types; however, auto-associative neural networks could be used successfully to discriminate them. It is likely that this was due to their ability to perform non-linear mappings and hence approximate non-linear PCA. SOFMs could also be used to separate all four seeds unequivocally. To obtain quantitative information regarding the similarity of these seeds from their pyrolysis mass spectra, SOFMs were trained with different numbers of nodes in the Kohonen output layers. The results observed from this procedure are often difficult to report in tables or visualise using topological contour maps; to simplify the graphical representation of the similarity between the seeds, we therefore performed the novel construction of a dendrogram from the various SOFM analyses. This study demonstrates the potential of PyMS for discriminating plant seeds at the genus, species and sub-species level. Moreover, the clusters observed were a true reflection of the known taxonomy of these plants. This approach will be invaluable to the plant taxonomist in representing biological relationships among plant taxa or in describing genomic relationships without the need for cultivation of the propagule.
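The pairwise-averaging preprocessing step described in the abstract can be illustrated with a minimal sketch. The abstract states only that triplicate spectra were averaged in pairs to give three new spectra; the function name `average_in_pairs`, the ordering of the pairs, the toy data, and the absence of any normalisation step are assumptions made here for illustration, not details taken from the paper.

```python
from itertools import combinations

import numpy as np


def average_in_pairs(replicates):
    """Return the mean of every pair of replicate spectra.

    replicates: array-like of shape (n_replicates, n_masses).
    For triplicates, the three possible pairs give three averaged spectra.
    (Hypothetical helper; the name and pair ordering are assumptions.)
    """
    replicates = np.asarray(replicates, dtype=float)
    pairs = combinations(range(replicates.shape[0]), 2)
    return np.array([(replicates[i] + replicates[j]) / 2.0 for i, j in pairs])


# Toy data (assumed): 3 replicate spectra over 4 m/z channels.
# The study itself used 150 m/z values per spectrum.
triplicate = np.array([
    [1.0, 2.0, 3.0, 4.0],
    [3.0, 2.0, 1.0, 0.0],
    [5.0, 6.0, 7.0, 8.0],
])

averaged = average_in_pairs(triplicate)
```

Each replicate contributes to two of the three averaged spectra, so the averaged set keeps the same overall mean as the original triplicate while sharing information between replicates. This may be how the step encodes which spectra belong together without supplying class labels to the unsupervised methods.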
Pages: 69-83
Number of pages: 15