Purpose: Computerized databases can be an efficient resource for studying the epidemiology of peptic ulcer (PU) and upper gastrointestinal complications (UGIC), provided that the outcome definitions achieve a high positive predictive value (PPV). We assessed the PPV of diagnosis codes in THIN, a primary-care medical-record database, for ascertaining individuals with uncomplicated PU and for identifying UGIC and Helicobacter pylori infection status (HPIS) among these patients.

Methods: We identified (1) patients with codes suggesting a first episode of uncomplicated PU and (2) episodes of UGIC among these patients. The computerized profiles of these individuals, including free-text comments, were reviewed, and each individual was classified as a definite, possible, or excluded case. Dates and HPIS were also ascertained. For a sample of definite and possible PU cases, and for all UGIC cases, a questionnaire was sent to the primary care physicians to confirm the diagnosis.

Results: Of the 5296 individuals with codes suggesting PU, 49% were classified as definite, 25% as possible, and 26% as excluded cases. The PPV for definite or possible PU was 94% (99% for definite and 84% for possible cases). Among the questionnaires that contained information on HPIS (62%), both the PPV and the negative predictive value (NPV) were 100%. Of the 97 individuals with codes suggesting UGIC, 48% were classified as definite, 27% as possible, and 22% as excluded cases; the PPV for definite or possible UGIC was 95% (100% for definite and 88% for possible cases). The dates of the diagnosis codes were generally later than the corresponding dates in the medical records.

Conclusion: Identifying PU cases, their HPIS, and UGIC requires careful review of the computerized clinical information, including free-text comments. Validation of a sample is needed to confirm the accuracy of the diagnoses.

Copyright (C) 2009 John Wiley & Sons, Ltd.
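The PPV and NPV figures above follow the standard definitions: PPV = TP / (TP + FP) and NPV = TN / (TN + FN). The sketch below illustrates these definitions and one plausible way to pool a PPV across the "definite" and "possible" categories. It is not the authors' analysis code: the stratum names and all counts are hypothetical, and the assumption that the pooled PPV is obtained by summing confirmed and unconfirmed cases across strata (rather than averaging the stratum PPVs) is ours.

```python
from dataclasses import dataclass


@dataclass
class ValidationStratum:
    """Hypothetical questionnaire results for one case category."""
    name: str
    confirmed: int      # physician confirmed the coded diagnosis (true positives)
    not_confirmed: int  # physician did not confirm it (false positives)


def ppv(true_positives: int, false_positives: int) -> float:
    """Positive predictive value: TP / (TP + FP)."""
    total = true_positives + false_positives
    if total == 0:
        raise ValueError("no validated positive cases")
    return true_positives / total


def npv(true_negatives: int, false_negatives: int) -> float:
    """Negative predictive value: TN / (TN + FN)."""
    total = true_negatives + false_negatives
    if total == 0:
        raise ValueError("no validated negative cases")
    return true_negatives / total


def pooled_ppv(strata: list[ValidationStratum]) -> float:
    """Assumed pooling: total confirmed over total validated across strata.

    This weights each stratum by its number of validated cases, so it is
    generally NOT the simple mean of the stratum PPVs.
    """
    tp = sum(s.confirmed for s in strata)
    fp = sum(s.not_confirmed for s in strata)
    return ppv(tp, fp)


if __name__ == "__main__":
    # Hypothetical counts, chosen only to illustrate the arithmetic.
    strata = [
        ValidationStratum("definite", confirmed=99, not_confirmed=1),
        ValidationStratum("possible", confirmed=42, not_confirmed=8),
    ]
    for s in strata:
        print(f"{s.name}: PPV = {ppv(s.confirmed, s.not_confirmed):.0%}")
    print(f"definite or possible: PPV = {pooled_ppv(strata):.0%}")
```

Because the pooled value weights each category by its number of validated cases, a category with few questionnaires moves the combined PPV less than a simple average of the category PPVs would suggest.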