Evaluating testing methods by delivered reliability

Cited by: 98
Authors
Frankl, PG
Hamlet, RG
Littlewood, B
Strigini, L
Affiliations
[1] Polytech Univ, CIS Dept, Brooklyn, NY 11201 USA
[2] Oregon Hlth & Sci Univ, Dept Comp Sci, Portland, OR 97201 USA
[3] City Univ London, Ctr Software Reliabil, London EC1V 0HB, England
Funding
US National Science Foundation; UK Engineering and Physical Sciences Research Council
Keywords
reliability; debugging; software testing; statistical testing theory
DOI
10.1109/32.707695
Chinese Library Classification
TP31 [Computer Software]
Discipline Classification Codes
081202; 0835
Abstract
There are two main goals in testing software: 1) to achieve adequate quality (debug testing), where the objective is to probe the software for defects so that these can be removed, and 2) to assess existing quality (operational testing), where the objective is to gain confidence that the software is reliable. The names are arbitrary, and most testing techniques address both goals to some degree. However, debug methods tend to ignore random selection of test data from an operational profile, while for operational methods this selection is all-important. Debug methods are thought, without any real proof, to be good at uncovering defects so that these can be repaired, but having done so they do not provide a technically defensible assessment of the resulting reliability. Operational methods, on the other hand, provide accurate assessment, but may not be as useful for achieving reliability. This paper examines the relationship between the two testing goals using a probabilistic analysis. We define simple models of programs and their testing, and try to answer theoretically the question of how best to attain program reliability: is it better to test by probing for defects, as in debug testing, or to assess reliability directly, as in operational testing, uncovering defects by accident, so to speak? There is no simple answer, of course. Testing methods are compared in a model where program failures are detected and the software is changed to eliminate them. The "better" method delivers higher reliability after all test failures have been eliminated. This comparison extends previous work, where the measure was the probability of detecting a failure. Revealing special cases are exhibited in which each kind of testing is superior. A preliminary analysis of the distribution of the delivered reliability indicates that even simple models have unusual statistical properties, suggesting caution in interpreting theoretical comparisons.
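To make the comparison concrete, the following is a minimal Monte Carlo sketch, not the paper's model: it assumes a hypothetical input domain, an invented skewed operational profile, a random set of failure points, and a perfect fix for every failure revealed by a test, then compares equal-sized subdomain sampling (a stand-in for debug testing) against profile-driven random sampling (operational testing) by the unreliability each delivers. All names and parameters here (DOMAIN, SUBDOMAINS, BUDGET, the 1/(i+1) profile shape, the fault count) are illustrative assumptions.

import random

# Illustrative sketch: compare "debug" (subdomain) testing with
# "operational" (profile-driven random) testing by the unreliability
# delivered after every failure found in testing is perfectly fixed.

DOMAIN = 10_000          # hypothetical input domain, points 0..9999
SUBDOMAINS = 10          # debug testing samples equally from each subdomain
BUDGET = 200             # total number of test cases per method
TRIALS = 500             # Monte Carlo repetitions

def make_profile():
    """Skewed operational profile: low-numbered inputs used far more often."""
    weights = [1.0 / (i + 1) for i in range(DOMAIN)]
    total = sum(weights)
    return [w / total for w in weights]

def delivered_unreliability(tests, failures, profile):
    """Probability mass of failure points that testing did not reveal."""
    remaining = failures - set(tests)          # revealed faults are fixed
    return sum(profile[x] for x in remaining)

def debug_tests():
    """Equal-sized uniform samples from each contiguous subdomain."""
    width = DOMAIN // SUBDOMAINS
    per = BUDGET // SUBDOMAINS
    return [random.randrange(s * width, (s + 1) * width)
            for s in range(SUBDOMAINS) for _ in range(per)]

def operational_tests(profile):
    """Random tests drawn according to the operational profile."""
    return random.choices(range(DOMAIN), weights=profile, k=BUDGET)

random.seed(1)
profile = make_profile()
failures = set(random.sample(range(DOMAIN), 50))   # 50 hypothetical faults

for name, draw in [("debug", debug_tests),
                   ("operational", lambda: operational_tests(profile))]:
    avg = sum(delivered_unreliability(draw(), failures, profile)
              for _ in range(TRIALS)) / TRIALS
    print(f"{name:12s} mean delivered unreliability: {avg:.4f}")

Under a profile this skewed, operational testing typically delivers lower residual unreliability, since it concentrates tests where failures would matter most in use; flattening the profile or clustering the failure points inside a few subdomains can tip the comparison the other way, consistent with the paper's point that special cases exist in which each kind of testing is superior.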
Pages: 586-601
Page count: 16