Modeling reporting delays and reporting corrections in cancer registry data

Cited by: 51
Authors
Midthune, DN [1]
Fay, MP
Clegg, LX
Feuer, EJ
Affiliations
[1] NCI, Biometry Res Grp, Div Canc Prevent, Bethesda, MD 20892 USA
[2] Natl Inst Allergy & Infect Dis, Biostat Res Branch, Bethesda, MD 20892 USA
[3] Natl Canc Inst, Div Canc Control & Populat Sci, Stat Res & Appl Branch, Bethesda, MD 20892 USA
Keywords
cancer surveillance; delay-adjusted rates; incurred but not reported; random effects; Surveillance, Epidemiology, and End Results program; truncated data
DOI
10.1198/016214504000001899
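Chinese Library Classification (CLC) number
O21 [Probability and Mathematical Statistics]; C8 [Statistics]
Subject classification codes
020208; 070103; 0714
Abstract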
The Surveillance, Epidemiology, and End Results (SEER) program of the National Cancer Institute is an authoritative source of cancer incidence statistics in the United States. The SEER program is a consortium of population-based cancer registries from different areas of the country. Each registry is charged with collecting data on all cancers that occur within its geographic area. As with any disease registry, there is a delay between the time that the disease (cancer) is first diagnosed and the time that it is reported to the registry. The SEER program has allowed for reporting delays of up to 19 months before releasing data for public use. Nevertheless, additional cases are discovered after the 19-month delay, and these cases are added in subsequent releases of the data. Further, any errors discovered are corrected in subsequent releases. Such reporting delays and corrections typically lead to underestimation of the cancer incidence rates in recent diagnosis years, making it difficult to monitor trends. In this article we study models that account for reporting delays and corrections in predicting eventual cancer counts for a diagnosis year from the preliminary counts. Previous models of this type have been studied, especially as applied to AIDS registries. We offer several additions to existing models. First, we explicitly model the reporting corrections. Second, we model the delay distribution with very general models, combining aspects of previous nonparametric-like models (i.e., models that have a separate parameter for each delay time) with more parametric models. Third, we allow random reporting-year effects in the model. Practical issues of model selection and how the data are classified are also discussed, particularly how the definition of a reporting correction may change depending on how subpopulations are defined. An example with SEER melanoma data is studied in detail.
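The basic idea of predicting an eventual count from a preliminary count can be illustrated with a minimal sketch. This is not the authors' model: it ignores reporting corrections and random reporting-year effects, and it assumes the delay distribution is already known. The function name `delay_adjusted_count` and the probabilities used are hypothetical and chosen only for illustration.

```python
def delay_adjusted_count(preliminary_count, reporting_probs, periods_elapsed):
    """Scale a preliminary case count up to an estimated eventual count.

    reporting_probs[d] is the ASSUMED probability that a case is first
    reported in delay period d (hypothetical values, not estimated from data).
    periods_elapsed is the number of delay periods observed so far.
    """
    if not 0 < periods_elapsed <= len(reporting_probs):
        raise ValueError("periods_elapsed must be between 1 and len(reporting_probs)")
    # Fraction of eventual cases expected to have been reported by now.
    cumulative = sum(reporting_probs[:periods_elapsed])
    if cumulative <= 0:
        raise ValueError("cumulative reporting probability must be positive")
    # Delay-adjusted estimate: observed count divided by the reported fraction.
    return preliminary_count / cumulative


# Hypothetical delay distribution over four reporting periods (sums to 1).
probs = [0.5, 0.25, 0.125, 0.125]
# 750 cases observed after two periods, when 75% are assumed to be reported.
estimate = delay_adjusted_count(750, probs, 2)
print(estimate)
```

With these assumed values, the preliminary count of 750 is divided by a cumulative reporting probability of 0.75, so the estimate is about 1000 eventual cases. A full treatment would estimate the delay distribution from successive data releases rather than fixing it in advance.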
Pages: 61-70
Number of pages: 10