Assessing Scientific Practices Using Machine-Learning Methods: How Closely Do They Match Clinical Interview Performance?

Times Cited: 43
Authors
Beggrow, Elizabeth P. [1 ]
Ha, Minsu [1 ]
Nehm, Ross H. [2 ]
Pearl, Dennis [3 ]
Boone, William J. [4 ]
Affiliations
[1] Ohio State Univ, Dept Teaching & Learning, Columbus, OH 43210 USA
[2] SUNY Stony Brook, Dept Ecol & Evolut, Ctr Sci & Math Educ, Stony Brook, NY 11794 USA
[3] Ohio State Univ, Dept Stat, Columbus, OH 43210 USA
[4] Miami Univ, Dept Educ Psychol, Oxford, OH 45056 USA
Funding
U.S. National Science Foundation
Keywords
Applications in subject areas; Evaluation methodologies; Improving classroom teaching; Pedagogical issues; Teaching/learning strategies; NATURAL-SELECTION; CONCEPTUAL INVENTORY; EXPLANATION; EVOLUTION; KNOWLEDGE; CONSTRUCTION; ARGUMENT; BIOLOGY; MODELS;
DOI
10.1007/s10956-013-9461-9
Chinese Library Classification (CLC)
G40 [Education]
Discipline Classification Codes
040101; 120403
Abstract
The landscape of science education is being transformed by the new Framework for Science Education (National Research Council, A framework for K-12 science education: Practices, crosscutting concepts, and core ideas. The National Academies Press, Washington, DC, 2012), which emphasizes the centrality of scientific practices, such as explanation, argumentation, and communication, in science teaching, learning, and assessment. A major challenge facing the field of science education is developing assessment tools that are capable of validly and efficiently evaluating these practices. Our study examined the efficacy of a free, open-source machine-learning tool for evaluating the quality of students' written explanations of the causes of evolutionary change relative to three other approaches: (1) human-scored written explanations, (2) a multiple-choice test, and (3) clinical oral interviews. A large sample of undergraduates (n = 104) exposed to varying amounts of evolution content completed all three assessments: a clinical oral interview, a written open-response assessment, and a multiple-choice test. Rasch analysis was used to compute linear person measures and linear item measures on a single logit scale. We found that the multiple-choice test displayed poor person and item fit (mean square outfit > 1.3), whereas both the oral interview measures and the computer-generated written response measures exhibited acceptable fit (average mean square outfit for interview: person 0.97, item 0.97; computer: person 1.03, item 1.06). Multiple-choice test measures were more weakly associated with interview measures (r = 0.35) than were the computer-scored explanation measures (r = 0.63). Overall, the Rasch analysis indicated that computer-scored written explanation measures (1) have the strongest correspondence to oral interview measures, (2) capture students' normative scientific and naive ideas as accurately as human-scored explanations, and (3) detect understanding more validly than the multiple-choice assessment. These findings demonstrate the great potential of machine-learning tools for assessing key scientific practices highlighted in the new Framework for Science Education.
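As background for the fit statistics quoted above, the following is a minimal sketch (standard psychometric definitions, not reproduced from the article) of the dichotomous Rasch model and the outfit mean-square statistic on which the reported person and item fit values are based:

\[
P(x_{ni} = 1 \mid \theta_n, \delta_i) = \frac{e^{\theta_n - \delta_i}}{1 + e^{\theta_n - \delta_i}},
\qquad
\text{Outfit MNSQ}_i = \frac{1}{N} \sum_{n=1}^{N} \frac{\left(x_{ni} - E[x_{ni}]\right)^2}{\operatorname{Var}(x_{ni})}
\]

Here \(\theta_n\) is the person measure and \(\delta_i\) the item measure, both expressed in logits on a single scale, and \(x_{ni}\) is person n's observed response to item i. Outfit values near 1.0 indicate responses consistent with the model; values above roughly 1.3, the threshold applied to the multiple-choice test above, signal noisy, poorly fitting data.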
Pages: 160-182
Page Count: 23
References (62 total)
[1] Abu-Mostafa, Y. S. (2012). Machines that think for themselves. Scientific American, 307(1), 78-81.
[2] Anderson, D. L., Fisher, K. M., & Norman, G. J. (2002). Development and evaluation of the conceptual inventory of natural selection. Journal of Research in Science Teaching, 39(10), 952-978.
[3] National Research Council. (2012). A framework for K-12 science education: Practices, crosscutting concepts, and core ideas. Washington, DC: The National Academies Press.
[4] National Research Council. (2007). Taking science to school: Learning and teaching science in grades K-8. Washington, DC: The National Academies Press.
[5] [Anonymous]. (2012). Journal of Research in Science Teaching. doi:10.1002/tea.20454
[6] National Research Council. (2001). Knowing what students know: The science and design of educational assessment. Washington, DC: National Academy Press.
[7] National Research Council. (1996). National science education standards. Washington, DC: National Academy Press.
[8] Battisti, B. T., Hanegan, N., Sudweeks, R., & Cates, R. (2010). Using item response theory to conduct a distracter analysis on Conceptual Inventory of Natural Selection. International Journal of Science and Mathematics Education, 8(5), 845-868.
[9] Bauerle, C. (2011). Vision and change in undergraduate biology education: A call to action.
[10] Beggrow, E. (2012). Evolution: Education and Outreach, 5, 429. doi:10.1007/s12052-012-0432-z