Structure-based virtual screening relies on molecular docking programs. Many such programs are now available, and choosing among them is difficult without knowing the characteristics and performance of each. In this study, we evaluated the screening performance of three molecular docking programs, DOCK, AutoDock, and GOLD, on 116 target proteins. Performance was assessed using two novel standards alongside the traditional enrichment rate measurement. For the evaluations, each docking run was repeated 1000 times with three initial conformations of a ligand. Although each program had advantages over the others in some respects, DOCK showed unexpectedly better screening performance in terms of enrichment rates. Finally, based on the evaluation results, we make several recommendations for enhancing the screening performance of the docking programs.
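
As an illustration of the enrichment measurement mentioned above, the following is a minimal sketch of one commonly used definition of the enrichment factor: the fraction of known actives among the top-ranked compounds divided by the fraction of actives in the whole library. The function name, parameters, and the choice of cutoff are assumptions made for this example and are not taken from the study, which may define its enrichment rate differently.

```python
from typing import Sequence


def enrichment_factor(
    scores: Sequence[float],
    is_active: Sequence[bool],
    top_fraction: float = 0.01,
    lower_is_better: bool = True,
) -> float:
    """Enrichment factor at a given top fraction of a ranked library.

    Hypothetical helper for illustration only. Assumes one docking score
    per compound and a known active/inactive label for each compound.
    """
    if len(scores) != len(is_active):
        raise ValueError("scores and is_active must have the same length")
    if not 0 < top_fraction <= 1:
        raise ValueError("top_fraction must be in (0, 1]")

    n_total = len(scores)
    n_actives = sum(1 for flag in is_active if flag)
    if n_total == 0 or n_actives == 0:
        raise ValueError("need at least one compound and one active")

    # Number of compounds in the selected top slice (at least one).
    n_top = max(1, round(n_total * top_fraction))

    # Rank compounds from best to worst score. Many docking scores are
    # energy-like (lower is better), but this depends on the program.
    order = sorted(
        range(n_total),
        key=lambda i: scores[i],
        reverse=not lower_is_better,
    )
    hits_in_top = sum(1 for i in order[:n_top] if is_active[i])

    # Hit rate in the top slice relative to the hit rate of the library.
    return (hits_in_top / n_top) / (n_actives / n_total)
```

A value above 1 means the ranking places actives in the top slice more often than random selection would; a value of 1 corresponds to random performance at that cutoff.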