Ten studies comparing periodontal treatment modalities were re-examined to determine whether they had adequate power to detect true differences. Attachment level (AL) and pocket depth (PD) were the 2 variables assessed. The power of a statistical test is its probability of detecting a significant sample difference in treatment means, given a predetermined value for alpha (level of significance), DELTA (a clinically meaningful underlying difference), and the sample size. Studies were included if they stratified their data by initial pocket depth, reported sample size, and lasted at least 6 months. Power calculations were done for 173 treatment comparisons, using DELTA = 0.5 mm and alpha = 0.05. For shallow pockets (1-3 mm), most studies had a strong chance of detecting true differences (median power = 83%). For moderate pockets (4-6 mm), median power dropped to 38%. For deep pockets (> 6 mm), median power fell further to 14%, with 75% of the tests having less than a 20% chance of detecting a 0.5 mm difference. Many of the modalities reported as "not significantly different" from each other have not had a fair trial, especially for deep pockets. Four factors that affect a study's power are discussed: the number of compared treatments, the expected noise or random error, the patient sample size, and the average number of sites per patient for each pocket depth category.
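
As an illustration of how power depends on alpha, DELTA, random error, and sample size, the following is a minimal sketch using a normal approximation for a two-sided comparison of two independent treatment means. It is not the procedure used in the re-examination, which is not specified here. The function name `two_sample_power`, the standard deviations, and the group sizes are hypothetical values chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_power(delta, sigma, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample z-test.

    delta       -- true difference between treatment means (e.g. in mm)
    sigma       -- assumed common standard deviation of the response
    n_per_group -- number of independent units (patients) per treatment
    alpha       -- level of significance
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)   # two-sided critical value
    std_error = sigma * sqrt(2 / n_per_group)    # SE of the difference in means
    shift = delta / std_error                    # standardized true difference
    # Probability that the test statistic falls in either rejection region.
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


if __name__ == "__main__":
    # Hypothetical noise levels and sample sizes, with DELTA = 0.5 mm.
    for sigma in (0.5, 1.0, 1.5):
        for n in (10, 20, 40):
            power = two_sample_power(0.5, sigma, n)
            print(f"sigma={sigma:.1f} mm  n={n:>2} per group  power={power:.2f}")
```

Under these assumptions, power rises with the number of patients per group and falls as random error grows, which is consistent with two of the factors named above.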