We developed a numerical model for evaluating graphical user interface (GUI) screens. The model consists of design guidelines concerning four screen factors (element size, local density, alignment, and grouping) and produces a complexity score for a given screen. The model's complexity predictions were examined in a full factorial experimental design in which GUI screens with all combinations of the factors were shown to human users. We measured participants' search times for given elements on all screens, and participants rated their pairwise preferences among those screens. Overall, very well-designed screens resulted in shorter search times and higher subjective preference. The combination of poor alignment and poor local density had the strongest adverse effect on search time. Alignment and grouping were found to have more influence on subjective preference. Weights derived from the subjective judgments were introduced into the model, and a significant correlation was found between model predictions and search times. We discuss the findings in terms of screen-design implications and in terms of the development and use of numerical models in GUI design and evaluation.
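
To make the structure of such a model concrete, the following is a minimal sketch of a factorial set of screens and a weighted complexity score. It is not the model reported here: the two-level coding of each factor (0 = guideline met, 1 = guideline violated), the weighted-sum form of the score, the weight values, and the names `full_factorial`, `complexity_score`, and `HYPOTHETICAL_WEIGHTS` are all assumptions introduced for illustration. Only the four factor names are taken from the text.

```python
from itertools import product

# Factor names come from the abstract. The two-level coding (0 = good, 1 = poor)
# and the weights below are illustrative assumptions, not values from the study.
FACTORS = ("element_size", "local_density", "alignment", "grouping")
HYPOTHETICAL_WEIGHTS = {
    "element_size": 1.0,
    "local_density": 1.0,
    "alignment": 2.0,
    "grouping": 2.0,
}


def full_factorial(factors, levels=(0, 1)):
    """Return every combination of factor levels, one dict per screen."""
    return [
        dict(zip(factors, combo))
        for combo in product(levels, repeat=len(factors))
    ]


def complexity_score(screen, weights):
    """Weighted sum of violated guidelines; a higher score means more complex."""
    return sum(weights[name] * level for name, level in screen.items())


# Enumerate all screens in the assumed two-level design and score each one.
screens = full_factorial(FACTORS)
scores = [complexity_score(screen, HYPOTHETICAL_WEIGHTS) for screen in screens]
```

Under these assumptions the design contains one screen per combination of factor levels, and replacing the placeholder weights with values estimated from preference data would change the scores without changing the structure of the computation.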