Data from 67 6-rabbit eye irritation tests were used to generate 2-, 3- and 4-rabbit Draize scores. The 15 2-rabbit, 20 3-rabbit and 15 4-rabbit subsample scores for each of the 67 petrochemicals tested were used to establish prediction intervals for the original 6-rabbit scores. Prediction interval length shortened with increasing sample size, was widest in the middle portion of the Draize scale and was used to select the minimum number of rabbits necessary to satisfy a required level of precision. The ability of each subsample size to correctly classify the test materials according to an in-house irritation classification system was evaluated. Compared with the 6-rabbit classification, subsamples of size 2, 3, 4 and 5 were 88, 93, 95 and 96% accurate, respectively, in classifying the irritation potential of the materials tested.
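
The subsample counts of 15, 20 and 15 correspond to the number of distinct 2-, 3- and 4-rabbit subsets that can be drawn from 6 rabbits. The following Python sketch illustrates that enumeration. It is not the study's procedure: the function name `subsample_scores`, the example per-rabbit scores, and the choice to score a subsample as the mean of its per-rabbit scores are all assumptions made for illustration.

```python
from itertools import combinations
from statistics import mean


def subsample_scores(rabbit_scores, k):
    """Return one score per k-rabbit subset of a single test.

    Assumption: a subsample's score is the mean of its per-rabbit scores.
    """
    return [mean(subset) for subset in combinations(rabbit_scores, k)]


# Hypothetical per-rabbit scores for one 6-rabbit test (not data from the study).
example_scores = [10.0, 12.0, 8.0, 15.0, 11.0, 9.0]

# Number of distinct subsamples for each subsample size.
counts = {k: len(subsample_scores(example_scores, k)) for k in (2, 3, 4)}
print(counts)  # {2: 15, 3: 20, 4: 15}
```

Under these assumptions, the spread of the subsample scores around the 6-rabbit score for each material is the kind of quantity a prediction interval would summarize.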