MULTIPLE IMPUTATION OF INDUSTRY AND OCCUPATION CODES IN CENSUS PUBLIC-USE SAMPLES USING BAYESIAN LOGISTIC-REGRESSION

Cited by: 115

Authors:

CLOGG, CC

RUBIN, DB

SCHENKER, N

SCHULTZ, B

WEIDMAN, L

Affiliations:

[1] PENN STATE UNIV, DEPT SOCIOL, UNIVERSITY PK, PA 16802 USA

[2] HARVARD UNIV, DEPT STAT, CAMBRIDGE, MA 02138 USA

[3] UNIV CALIF LOS ANGELES, SCH PUBL HLTH, DEPT BIOSTAT, LOS ANGELES, CA 90024 USA

[4] US EPA, WASHINGTON, DC 20460 USA

[5] US BUR CENSUS, DIV STAT RES, WASHINGTON, DC 20233 USA

Source:

JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION | 1991 / Vol. 86 / No. 413

Keywords:

MAXIMUM LIKELIHOOD; MISSING DATA; MULTIPLE IMPUTATION; PRIOR DISTRIBUTION; SPARSE DATA; STRUCTURAL ZEROS;

DOI:

10.2307/2289716

Chinese Library Classification:

O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics];

Discipline classification codes:

020208 ; 070103 ; 0714 ;

Abstract:

We describe methods used to create a new Census data base that can be used to study comparability of industry and occupation classification systems. This project represents the most extensive application of multiple imputation to date, and the modeling effort was considerable as well: hundreds of logistic regressions were estimated. One goal of this article is to summarize the strategies used in the project so that researchers can better understand how the new data bases were created. Another goal is to show how modifications of maximum likelihood methods were made for the modeling and imputation phases of the project. To multiply-impute 1980 census-comparable codes for industries and occupations in two 1970 census public-use samples, logistic regression models were estimated with flattening constants. For many of the regression models considered, the data were too sparse to support conventional maximum likelihood analysis, so some alternative had to be employed. These methods solve existence and related computational problems often encountered with maximum likelihood methods. Inferences pertaining to effects of predictor variables and inferences regarding predictions from logit models are also more satisfactory. The Bayesian strategy used in this project can be applied in other sparse-data settings where logistic regression is used because the approach can be implemented easily with any standard computer program for logit regression or log-linear analysis.
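The idea of estimating a logit model with flattening constants and then multiply-imputing from it can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the abstract: the function names (`add_flattening_constants`, `fit_grouped_logit`, `draw_imputations`) are hypothetical, the rule for allocating pseudo-observations (a total equal to the number of parameters, spread evenly over covariate patterns and split by the overall success rate) is one plausible choice, the imputation step uses a normal approximation to the posterior, and the data are a toy example in which ordinary maximum likelihood estimates would not exist.

```python
import numpy as np


def add_flattening_constants(successes, trials, n_params):
    """Augment grouped binomial counts with fractional pseudo-observations.

    Assumed allocation (illustrative only): n_params pseudo-observations in
    total, spread evenly over the g covariate patterns and split between
    successes and failures by the overall observed success rate.
    """
    successes = np.asarray(successes, dtype=float)
    trials = np.asarray(trials, dtype=float)
    g = len(trials)
    overall_rate = successes.sum() / trials.sum()
    extra = n_params / g
    return successes + extra * overall_rate, trials + extra


def fit_grouped_logit(X, successes, trials, n_iter=50):
    """Fit a grouped-data logit model by Newton-Raphson (IRLS).

    Fractional counts are allowed, so augmented data can be passed directly.
    Returns the coefficient estimates and the inverse information matrix.
    """
    X = np.asarray(X, dtype=float)
    successes = np.asarray(successes, dtype=float)
    trials = np.asarray(trials, dtype=float)
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        mu = 1.0 / (1.0 + np.exp(-(X @ beta)))
        weights = trials * mu * (1.0 - mu)
        score = X.T @ (successes - trials * mu)
        info = X.T @ (X * weights[:, None])
        step = np.linalg.solve(info, score)
        beta = beta + step
        if np.max(np.abs(step)) < 1e-10:
            break
    # Recompute the information matrix at the final estimate.
    mu = 1.0 / (1.0 + np.exp(-(X @ beta)))
    weights = trials * mu * (1.0 - mu)
    cov = np.linalg.inv(X.T @ (X * weights[:, None]))
    return beta, cov


def draw_imputations(X_missing, beta_hat, cov, n_imputations, seed=0):
    """Draw multiple binary imputations for records with a missing code.

    Assumption: parameter uncertainty is approximated by a normal
    distribution centred at beta_hat with covariance cov.
    """
    rng = np.random.default_rng(seed)
    X_missing = np.asarray(X_missing, dtype=float)
    draws = []
    for _ in range(n_imputations):
        beta_star = rng.multivariate_normal(beta_hat, cov)
        p = 1.0 / (1.0 + np.exp(-(X_missing @ beta_star)))
        draws.append(rng.binomial(1, p))
    return np.array(draws)


# Toy data with complete separation: 0 of 5 successes when x = 0,
# 5 of 5 when x = 1, so the unaugmented slope estimate would diverge.
X = [[1, 0], [1, 1]]
s_aug, t_aug = add_flattening_constants([0, 5], [5, 5], n_params=2)
beta_hat, cov = fit_grouped_logit(X, s_aug, t_aug)
imputed = draw_imputations([[1, 0], [1, 1]], beta_hat, cov, n_imputations=5)
print(beta_hat)   # finite estimates after augmentation
print(imputed)    # one row per imputation, one column per record
```

Because the augmentation only changes the counts passed to the fitting routine, the same step could in principle be placed in front of any standard logit program, which is in the spirit of the ease-of-implementation point made in the abstract.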


Pages: 68-78

Page count: 11
