Who Wrote Ronald Reagan's Radio Addresses?

Cited by: 27
Authors
Airoldi, Edoardo M. [1]
Anderson, Annelise G. [2]
Fienberg, Stephen E. [1,3]
Skinner, Kiron K. [4,5]
Affiliations
[1] Carnegie Mellon Univ, Sch Comp Sci, Pittsburgh, PA 15213 USA
[2] Stanford Univ, Hoover Inst, Stanford, CA 94305 USA
[3] Carnegie Mellon Univ, Dept Stat, Pittsburgh, PA 15213 USA
[4] Carnegie Mellon Univ, Dept Hist, Pittsburgh, PA 15213 USA
[5] Carnegie Mellon Univ, Dept Social & Decis Sci, Pittsburgh, PA 15213 USA
Source
BAYESIAN ANALYSIS | 2006 / Vol. 1 / Issue 02
Keywords
Ronald Reagan; Radio Addresses; Authorship; Stylometry; Data Mining; Classification; Function Words; Semantic Analysis; Naive Bayes; Full Bayes; Poisson; Negative-Binomial; Modal Approximation; Mean Approximation
DOI
10.1214/06-BA110
Chinese Library Classification number
O1 [Mathematics]
Discipline classification code
0701; 070101
Abstract
In his campaign for the U.S. presidency from 1975 to 1979, Ronald Reagan delivered over 1000 radio broadcasts. For over 600 of these we have direct evidence of Reagan's authorship. The aim of this study was to determine the authorship of 312 of the broadcasts for which no direct evidence is available. We addressed the prediction problem for speeches delivered in different epochs and we explored a wide range of off-the-shelf classification methods and fully Bayesian generative models. Eventually we produced separate sets of predictions using the most accurate classifiers, based on non-contextual words as well as on semantic features, for the 312 speeches of uncertain authorship. All the predictions agree on 135 of the "unknown" speeches, whereas the fully Bayesian models agree on an additional 154 of them. The magnitude of the posterior odds of authorship led us to conclude that Ronald Reagan drafted 167 speeches and was aided in the preparation of the remaining 145. Our inferences were not sensitive to "reasonable" variations in the sets of constants underlying the prior distributions, and the cross-validated accuracy of our best fully Bayesian model was above 90 percent in all cases. The agreement of multiple methods for predicting the authorship for the "unknown" speeches reinforced our confidence in the accuracy of our classifications.
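The abstract mentions posterior odds of authorship computed from word-based features, and the keywords list Naive Bayes and Poisson models. As a rough illustration of that general idea only, and not the authors' actual models, the sketch below computes log posterior odds between two candidate authors under a Poisson naive Bayes assumption. The function names, the example words, and all rates and counts are hypothetical and chosen purely for illustration.

```python
import math

def poisson_log_pmf(count, rate, length):
    """Log Poisson probability of `count` occurrences in a text of
    `length` words, given a per-word usage `rate`."""
    mean = rate * length
    return count * math.log(mean) - mean - math.lgamma(count + 1)

def log_posterior_odds(counts, length, rates_a, rates_b, prior_odds=1.0):
    """Log odds that author A rather than author B wrote the text,
    treating the word counts as independent (the naive Bayes assumption)."""
    log_odds = math.log(prior_odds)
    for word, count in counts.items():
        log_odds += poisson_log_pmf(count, rates_a[word], length)
        log_odds -= poisson_log_pmf(count, rates_b[word], length)
    return log_odds

# Hypothetical per-word usage rates for two authors (not from the paper).
rates_a = {"upon": 0.003, "whilst": 0.0005}
rates_b = {"upon": 0.0005, "whilst": 0.002}

# Hypothetical counts observed in a 1000-word text of unknown authorship.
counts = {"upon": 3, "whilst": 0}

odds = log_posterior_odds(counts, 1000, rates_a, rates_b)
print(f"log posterior odds (A vs. B): {odds:.3f}")
```

A positive value favors author A and a negative value favors author B. With equal prior odds, the result is simply the sum of per-word log likelihood ratios.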
Pages: 289-319
Page count: 31