A criticism of language-generalization tests has been that their widespread use would increase Type 2 errors. This article examines the mathematics and shows that such tests provide information about Type 2 as well as Type 1 errors and that they should therefore decrease, not increase, the prevalence of Type 2 errors. The point is illustrated by two sets of experiments that provide positive evidence for extra-experimental interference. It is argued that past failures to find such evidence were due in part to sampling variance from treatment-language interactions that were offsetting the treatment effect. It is suggested that the sensible editorial policy for handling non-random language samples is to report three Fs: F_Subjects, F_Language, and Quasi or Min Quasi F. © 1979 Academic Press, Inc.
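
The abstract does not state how Min Quasi F is obtained from the other two statistics. As a minimal sketch, assuming the lower-bound formula usually attributed to Clark (1973), it can be computed from F_Subjects and F_Language and their error degrees of freedom. The function and parameter names below are illustrative and do not come from the article.

```python
def min_quasi_f(f_subjects, f_language, df_err_subjects, df_err_language):
    """Return (min F', approximate denominator df).

    Assumes the standard lower-bound formula:
        min F' = F1 * F2 / (F1 + F2)
        df     = (F1 + F2)^2 / (F1^2 / n2 + F2^2 / n1)
    where F1 is the by-subjects F with error df n1 and
    F2 is the by-language (by-items) F with error df n2.
    The numerator df is the treatment df shared by both tests.
    """
    total = f_subjects + f_language
    min_f = f_subjects * f_language / total
    df_denom = total ** 2 / (
        f_subjects ** 2 / df_err_language + f_language ** 2 / df_err_subjects
    )
    return min_f, df_denom


# Hypothetical values chosen only to illustrate the arithmetic.
min_f, df_denom = min_quasi_f(8.0, 8.0, 10, 10)
print(min_f, df_denom)
```

Under this formula min F' can never exceed the smaller of the two input Fs, which is one reason to report all three values side by side rather than min F' alone.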