Constrained linear regression models for symbolic interval-valued variables

Cited by: 163
Authors
Lima Neto, Eufrasio de A. [2]
de Carvalho, Francisco de A. T. [1]
Affiliations
[1] Univ Fed Pernambuco, Ctr Informat, BR-50740540 Recife, PE, Brazil
[2] Univ Fed Paraiba, Dept Estat, BR-58051900 Joao Pessoa, Paraiba, Brazil
Keywords
INEQUALITY RESTRICTIONS;
DOI
10.1016/j.csda.2009.08.010
Chinese Library Classification
TP39 [Computer applications];
Discipline classification code
081203 ; 0835 ;
Abstract
This paper introduces an approach to fitting a constrained linear regression model to interval-valued data. Each example of the learning set is described by a feature vector for which each feature value is an interval. The new approach fits a constrained linear regression model on the midpoints and range of the interval values assumed by the variables in the learning set. The prediction of the lower and upper boundaries of the interval value of the dependent variable is accomplished from its midpoint and range, which are estimated from the fitted linear regression models applied to the midpoint and range of each interval value of the independent variables. This new method shows the importance of range information in prediction performance as well as the use of inequality constraints to ensure mathematical coherence between the predicted values of the lower ($\hat{y}_{Li}$) and upper ($\hat{y}_{Ui}$) boundaries of the interval. The authors also propose an expression for the goodness-of-fit measure denominated determination coefficient. The assessment of the proposed prediction method is based on the estimation of the average behavior of the root-mean-square error and square of the correlation coefficient in the framework of a Monte Carlo experiment with different data set configurations. Among other aspects, the synthetic data sets take into account the dependence, or lack thereof, between the midpoint and range of the intervals. The bias produced by the use of inequality constraints over the vector of parameters is also examined in terms of the mean-square error of the parameter estimates. Finally, the approaches proposed in this paper are applied to a real data set and performances are compared. © 2009 Elsevier B.V. All rights reserved.
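The midpoint-and-range idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on several assumptions that the abstract does not state: a single predictor, "range" taken as the full interval width (upper minus lower), and the inequality constraints taken as non-negativity of the range-model coefficients, which is one way to keep the predicted lower bound from exceeding the upper bound. All function names are hypothetical. The constrained fit simply checks every active set, which is only practical for a two-parameter model.

```python
def ols_1d(x, y):
    """Ordinary least squares for one predictor: returns (intercept, slope)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def sse(coef, x, y):
    """Sum of squared residuals for coef = (intercept, slope)."""
    b0, b1 = coef
    return sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))


def nonneg_ls_1d(x, y):
    """Least squares with intercept >= 0 and slope >= 0 (assumed constraint).

    Fits the model on each active set (no parameter fixed, slope fixed at 0,
    intercept fixed at 0, both fixed at 0) and keeps the best feasible fit.
    """
    n = len(x)
    candidates = [
        ols_1d(x, y),
        (sum(y) / n, 0.0),
        (0.0, sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)),
        (0.0, 0.0),
    ]
    feasible = [c for c in candidates if c[0] >= 0 and c[1] >= 0]
    return min(feasible, key=lambda c: sse(c, x, y))


def fit_interval_model(x_int, y_int):
    """Fit one model on midpoints (unconstrained) and one on ranges (constrained)."""
    x_mid = [(lo + hi) / 2 for lo, hi in x_int]
    y_mid = [(lo + hi) / 2 for lo, hi in y_int]
    x_rng = [hi - lo for lo, hi in x_int]
    y_rng = [hi - lo for lo, hi in y_int]
    return ols_1d(x_mid, y_mid), nonneg_ls_1d(x_rng, y_rng)


def predict_interval(model, x_interval):
    """Rebuild the predicted (lower, upper) bounds from predicted midpoint and range."""
    (a0, a1), (b0, b1) = model
    lo, hi = x_interval
    mid = a0 + a1 * (lo + hi) / 2
    rng = b0 + b1 * (hi - lo)  # non-negative because b0, b1 >= 0 and hi >= lo
    return mid - rng / 2, mid + rng / 2


# Made-up toy data: each observation is a (lower, upper) interval.
x_train = [(1.0, 3.0), (2.0, 3.0), (4.0, 7.0), (5.0, 9.0)]
y_train = [(2.0, 5.0), (4.0, 5.5), (8.0, 9.0), (10.0, 10.5)]

model = fit_interval_model(x_train, y_train)
predictions = [predict_interval(model, x) for x in x_train]
```

In this toy data the unconstrained range regression would have a negative slope, so the constraint is active and the range model falls back to an intercept-only fit. Without the constraint, a sufficiently wide input interval could yield a negative predicted range, that is, a lower bound above the upper bound.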
Pages: 333-347
Page count: 15