Split Samples and Design Sensitivity in Observational Studies

Citations: 46
Authors
Heller, Ruth [1]
Rosenbaum, Paul R. [2]
Small, Dylan S. [2]
Affiliations
[1] Technion Israel Inst Technol, Fac Ind Engn & Management, IL-32000 Haifa, Israel
[2] Univ Penn, Wharton Sch, Dept Stat, Philadelphia, PA 19104 USA
Funding
National Science Foundation (USA)
Keywords
Coherence; Multiple comparisons; Permutation test; Sensitivity analysis; INSTRUMENTAL VARIABLES; PROGRAM-EVALUATION; BIAS; CONFIDENCE;
DOI
10.1198/jasa.2009.tm08338
Chinese Library Classification
O21 [Probability theory and mathematical statistics]; C8 [Statistics]
Subject classification codes
020208; 070103; 0714
Abstract
An observational or nonrandomized study of treatment effects may be biased by failure to control for some relevant covariate that was not measured. The design of an observational study is known to strongly affect its sensitivity to biases from covariates that were not observed. For instance, the choice of an outcome to study, or the decision to combine several outcomes in a test for coherence, can materially affect the sensitivity to unobserved biases. Decisions that shape the design are therefore critically important, but they are also difficult to make in the absence of data. We consider the possibility of randomly splitting the data from an observational study into a smaller planning sample and a larger analysis sample, where the planning sample is used to guide decisions about design. After reviewing the concept of design sensitivity, we evaluate sample splitting in theory, by numerical computation, and by simulation, comparing it to several methods that use all of the data. Sample splitting is remarkably effective, much more so in observational studies than in randomized experiments: splitting 1,000 matched pairs into 100 planning pairs and 900 analysis pairs often materially improves the design sensitivity. An example from genetic toxicology is used to illustrate the method.
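The splitting idea in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' procedure: the function names, the simulated outcomes, and the rule "pick the outcome with the largest share of positive pair differences in the planning sample" are all assumptions made for the example. The sensitivity bound uses the sign test, swapped in here for simplicity: for matched pairs without ties, if hidden bias changes the odds of treatment within a pair by at most a factor gamma, the count of positive differences is stochastically no larger than a Binomial(n, gamma / (1 + gamma)) variable.

```python
import math
import random


def split_pairs(n_pairs, n_planning, seed=0):
    """Randomly partition pair indices into a planning set and an analysis set."""
    rng = random.Random(seed)
    idx = list(range(n_pairs))
    rng.shuffle(idx)
    return sorted(idx[:n_planning]), sorted(idx[n_planning:])


def choose_outcome(diffs_by_outcome, planning):
    """Hypothetical planning rule: pick the outcome whose treated-minus-control
    differences are most often positive among the planning pairs."""
    def share_positive(name):
        d = diffs_by_outcome[name]
        return sum(1 for i in planning if d[i] > 0) / len(planning)
    return max(diffs_by_outcome, key=share_positive)


def sign_test_upper_p(n_positive, n_pairs, gamma):
    """Upper bound on the one-sided sign-test p-value when hidden bias is at
    most gamma (gamma = 1 gives the usual randomization p-value)."""
    p = gamma / (1.0 + gamma)
    total = 0.0
    for k in range(n_positive, n_pairs + 1):
        # Binomial pmf computed on the log scale to avoid overflow.
        log_pmf = (math.lgamma(n_pairs + 1) - math.lgamma(k + 1)
                   - math.lgamma(n_pairs - k + 1)
                   + k * math.log(p) + (n_pairs - k) * math.log(1.0 - p))
        total += math.exp(log_pmf)
    return min(total, 1.0)


# Simulated pair differences for two made-up outcomes (assumed data).
rng = random.Random(42)
diffs = {
    "outcome_a": [rng.gauss(0.1, 1.0) for _ in range(1000)],
    "outcome_b": [rng.gauss(0.6, 1.0) for _ in range(1000)],
}

planning, analysis = split_pairs(1000, 100, seed=1)
chosen = choose_outcome(diffs, planning)            # design decision: planning pairs only
n_pos = sum(1 for i in analysis if diffs[chosen][i] > 0)
for gamma in (1.0, 1.5, 2.0):                       # inference: analysis pairs only
    print(chosen, gamma, sign_test_upper_p(n_pos, len(analysis), gamma))
```

Because the outcome is chosen using only the 100 planning pairs, the test on the 900 analysis pairs needs no multiplicity correction for that choice; the price is the loss of the planning pairs from the final analysis.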
Pages: 1090-1101
Number of pages: 12