Transfer-Learning Methods in Programming Course Outcome Prediction

Cited by: 19
Authors
Lagus, Jarkko [1]
Longi, Krista [1]
Klami, Arto [1]
Hellas, Arto [1]
Affiliations
[1] Univ Helsinki, Dept Comp Sci, POB 68, FI-00014 Helsinki, Finland
Source
ACM TRANSACTIONS ON COMPUTING EDUCATION | 2018 / Vol. 18 / Issue 4
Funding
Academy of Finland
Keywords
Machine learning; transfer learning; source code snapshots; introductory programming; educational data mining; learning analytics; novice programmers; course outcome prediction; STUDENTS;
DOI
10.1145/3152714
Chinese Library Classification
G40 [Education]
Discipline classification codes
040101; 120403
Abstract
The computing education research literature contains a wide variety of methods that can be used to identify students who are either at risk of failing their studies or who could benefit from additional challenges. Many of these are based on machine-learning models that learn to make predictions from previously observed data. However, in educational contexts, differences between courses pose major challenges for the generalizability of these methods. For example, traditional machine-learning methods assume that all data are identically distributed; in our terms, they assume that all teaching contexts are alike. In practice, data collected from different courses can differ substantially, as a variety of factors may change, including grading, materials, teaching approach, and the students. Transfer-learning methodologies have been created to address this challenge. They relax the strict assumption of identical distribution for training and test data, although some similarity between the contexts is still needed for efficient learning. In this work, we review the concept of transfer learning, especially for the purpose of predicting the outcome of an introductory programming course, and contrast the results with those from traditional machine-learning methods. The methods are evaluated using data collected in situ from two separate introductory programming courses. We empirically show that transfer-learning methods are able to improve the predictions, especially in cases with a limited amount of training data, for example, when making early predictions for a new context. The difference in predictive power is, however, rather subtle, and traditional machine-learning models can be sufficiently accurate, assuming the contexts are closely related and the features describing the student activity are carefully chosen to be insensitive to the fine differences.
Pages: 18