From the statistics of data to the statistics of knowledge: Symbolic data analysis

Cited by: 223
Authors
Billard, L [1]
Diday, E [2]
Affiliations
[1] Univ Georgia, Dept Stat, Athens, GA 30602 USA
[2] Univ Paris 09, CEREMADE, F-75775 Paris 16, France
Keywords
clustering; concepts; descriptive statistics; principal components; symbolic data
DOI
10.1198/016214503000242
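Chinese Library Classification (CLC) number
O21 [Probability theory and mathematical statistics]; C8 [Statistics]
Discipline classification code
020208; 070103; 0714
Abstract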
Increasingly, datasets are so large they must be summarized in some fashion so that the resulting summary dataset is of a more manageable size, while still retaining as much knowledge inherent to the entire dataset as possible. One consequence of this situation is that the data may no longer be formatted as single values such as is the case for classical data, but rather may be represented by lists, intervals, distributions, and the like. These summarized data are examples of symbolic data. This article looks at the concept of symbolic data in general, and then attempts to review the methods currently available to analyze such data. It quickly becomes clear that the range of methodologies available draws analogies with developments before 1900 that formed a foundation for the inferential statistics of the 1900s, methods largely limited to small (by comparison) datasets and classical data formats. The scarcity of available methodologies for symbolic data also becomes clear and so draws attention to an enormous need for the development of a vast catalog (so to speak) of new symbolic methodologies along with rigorous mathematical and statistical foundational work for these methods.
Pages: 470-487
Page count: 18