A comparison of two scoring methods for an automated speech scoring system

Cited by: 22
Authors
Xi, Xiaoming [1]
Higgins, Derrick [1]
Zechner, Klaus [1]
Williamson, David [1]
Affiliations
[1] Educ Testing Serv, Ctr Valid Res, Res & Dev, Princeton, NJ 08541 USA
Keywords
automated speech scoring; validity; classification trees; TOEFL; practice test; NETWORKS; SCORES;
DOI
10.1177/0265532211425673
Chinese Library Classification
H0 [Linguistics];
Subject classification codes
030303 ; 0501 ; 050102 ;
Abstract
This paper compares two alternative scoring methods - multiple regression and classification trees - for an automated speech scoring system used in a practice environment. The two methods were evaluated on two criteria: construct representation and empirical performance in predicting human scores. The empirical performance of the two scoring models is reported in Zechner, Higgins, Xi, & Williamson (2009), which discusses the development of the entire automated speech scoring system; the current paper shifts the focus to the comparison of the two scoring methods, elaborating both technical and substantive considerations and providing a reasoned argument for the trade-off between them. We concluded that a multiple regression model with expert weights was superior to the classification tree model. In addition to comparing the relative performance of the two models, we also evaluated the adequacy of the regression model for the intended use. In particular, the construct representation of the model was sufficiently broad to justify its use in a low-stakes application. The correlation of the model-predicted total test scores with human scores (r = 0.7) was also deemed acceptable for practice purposes.
Pages: 371-394
Page count: 24
Related papers
33 in total
[1] [Anonymous], 2002, J. Technol. Learn. Assess
[2] [Anonymous], 2006, J TECHNOLOGY LEARNIN
[3] [Anonymous], 1984, WADSWORTH INC
[4] [Anonymous], 1979, New Developments
[5] Bennett R.E., 1998, Educational Measurement: Issues and Practice, V17, P9, DOI 10.1111/J.1745-3992.1998.TB00631.X
[6] Bernstein J., 1989, J ACOUSTIC SOC AM S1, pS77
[7] BERNSTEIN J, 1990, P INT C SPOK LANG PR, P1185
[8] Bernstein J., 1999, PHONEPASS TESTING ST
[9] Bernstein J, 2008, ROUT STUD COMP ASSIS, V4, P174
[10] Braun H., 2006, Automated scoring for complex constructed response tasks in computer based testing, P83