Studies of referees' assessments of manuscripts submitted to scientific journals assume that a "merit" or "publishability" dimension underlies these assessments. To measure the level of agreement between referees' recommendations, researchers have used coefficients that require the assignment of arbitrary scores to recommendation categories ("accept", "revise and resubmit", "reject", etc.) or to the distances between categories. Using data on referee evaluations of manuscripts submitted to five journals, we show how an extension of recently developed methods for analyzing crosstabulations with ordered categories allows researchers (1) to test the assumption that a single dimension underlies referees' assessments and (2) to derive scale values for the recommendation categories. For four of the five journals, our results are consistent with the hypothesis that a latent publishability dimension underlies referees' assessments. The results also show that the greatest distance between adjacent recommendation categories is between the lowest and second-lowest categories, suggesting that recommendations that a paper be rejected are more reliable than more favorable recommendations. We show how these results can be used in attempts to measure the level of agreement between referees' assessments for a given scientific journal. Our results also point to analytic difficulties that confront researchers who wish to compare levels of referee agreement exhibited by different journals.
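The abstract does not give computational details, but as a minimal sketch of the final step it describes, derived scale values can replace arbitrary category scores when computing a weighted agreement coefficient such as weighted kappa. The crosstabulation, category ordering, and scale values below are hypothetical placeholders for illustration only, not figures from the study.

```python
import numpy as np

# Hypothetical 4x4 crosstab of first vs. second referee recommendations
# (categories ordered: reject, revise & resubmit, accept with revisions, accept).
# Counts are illustrative, not data from the five journals studied.
table = np.array([
    [30, 12,  5,  2],
    [10, 18, 11,  4],
    [ 4,  9, 15,  8],
    [ 1,  3,  7, 12],
], dtype=float)

# Hypothetical scale values for the four categories, standing in for the
# category scores that a fitted ordered-category association model would supply.
scores = np.array([0.0, 1.4, 1.9, 2.3])


def weighted_kappa(table, scores):
    """Weighted kappa with disagreement weights built from category scale values.

    Weights are squared differences between the scores of the two referees'
    categories, so disagreements are penalized in proportion to the estimated
    distance between recommendation categories rather than an arbitrary one.
    """
    n = table.sum()
    p_obs = table / n                  # observed joint proportions
    p_row = p_obs.sum(axis=1)          # marginal distribution, referee 1
    p_col = p_obs.sum(axis=0)          # marginal distribution, referee 2
    p_exp = np.outer(p_row, p_col)     # expected proportions under independence

    # Disagreement weight matrix from squared score differences.
    diff = scores[:, None] - scores[None, :]
    w = diff ** 2

    obs_disagreement = (w * p_obs).sum()
    exp_disagreement = (w * p_exp).sum()
    return 1.0 - obs_disagreement / exp_disagreement


print(f"weighted kappa: {weighted_kappa(table, scores):.3f}")
```

With scale values of this shape (a large gap between the lowest and second-lowest categories), disagreements that straddle the reject boundary are weighted most heavily, which is one way the estimated category distances feed back into the agreement measure.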