The evaluator effect: A chilling fact about usability evaluation methods

Cited by: 143
Authors: Hertzum, M; Jacobsen, NE
Affiliation: Riso National Laboratory, Centre for Human-Machine Interaction, Systems Analysis Department, DK-4000 Roskilde, Denmark
DOI: 10.1207/S15327590IJHC1304_05
Abstract
Computer professionals have a need for robust, easy-to-use usability evaluation methods (UEMs) to help them systematically improve the usability of computer artifacts. However, cognitive walkthrough (CW), heuristic evaluation (HE), and thinking-aloud study (TA), 3 of the most widely used UEMs, suffer from a substantial evaluator effect in that multiple evaluators evaluating the same interface with the same UEM detect markedly different sets of problems. A review of 11 studies of these 3 UEMs reveals that the evaluator effect exists for both novice and experienced evaluators, for both cosmetic and severe problems, for both problem detection and severity assessment, and for evaluations of both simple and complex systems. The average agreement between any 2 evaluators who have evaluated the same system using the same UEM ranges from 5% to 65%, and none of the 3 UEMs is consistently better than the others. Although evaluator effects of this magnitude may not be surprising for a UEM as informal as HE, it is certainly notable that a substantial evaluator effect persists for evaluators who apply the strict procedure of CW or observe users thinking out loud. Hence, it is highly questionable to use a TA with 1 evaluator as an authoritative statement about what problems an interface contains. Generally, the application of the UEMs is characterized by (a) vague goal analyses leading to variability in the task scenarios, (b) vague evaluation procedures leading to anchoring, or (c) vague problem criteria leading to anything being accepted as a usability problem, or all of these. The simplest way of coping with the evaluator effect, which cannot be completely eliminated, is to involve multiple evaluators in usability evaluations.
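The 5% to 65% figure is an average agreement between the problem sets detected by any two evaluators. As an illustrative sketch (the evaluator data below are hypothetical, and intersection-over-union is one common way to score pairwise overlap, not necessarily the exact measure used across all 11 reviewed studies), such an any-two agreement can be computed by averaging the overlap of each pair of evaluators' problem sets:

```python
from itertools import combinations

def any_two_agreement(problem_sets):
    """Average pairwise agreement between evaluators' problem sets.

    For each pair of evaluators, agreement is the size of the
    intersection of their problem sets divided by the size of the
    union; the result is averaged over all pairs.
    """
    pairs = list(combinations(problem_sets, 2))
    return sum(len(a & b) / len(a | b) for a, b in pairs) / len(pairs)

# Hypothetical problem sets found by three evaluators of one interface
evaluators = [
    {"p1", "p2", "p3", "p4"},
    {"p2", "p3", "p5"},
    {"p1", "p3", "p6", "p7"},
]
print(round(any_two_agreement(evaluators), 2))  # → 0.3
```

With this measure, three evaluators who each miss problems the others find score only 30% agreement, even though together they cover seven distinct problems, which is why pooling multiple evaluators detects far more than any single one.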
Pages: 421-443 (23 pages)