Text as Data: The Promise and Pitfalls of Automatic Content Analysis Methods for Political Texts

Cited by: 1694
Authors
Grimmer, Justin [1]
Stewart, Brandon M. [2, 3]
Affiliations
[1] Stanford Univ, Dept Polit Sci, Stanford, CA 94305 USA
[2] Harvard Univ, Dept Govt, Cambridge, MA 02138 USA
[3] Harvard Univ, Inst Quantitat Social Sci, Cambridge, MA 02138 USA
Funding
National Science Foundation (USA)
Keywords
POLICY POSITIONS; CLASSIFICATION; INFERENCE; WORDS; PREFERENCES; MODEL;
DOI
10.1093/pan/mps028
Chinese Library Classification (CLC) number
D0 [Political science, political theory]
Discipline classification code
0302; 030201
Abstract
Politics and political conflict often occur in the written and spoken word. Scholars have long recognized this, but the massive costs of analyzing even moderately sized collections of texts have hindered their use in political science research. Here lies the promise of automated text analysis: it substantially reduces the costs of analyzing large collections of text. We provide a guide to this exciting new area of research and show how, in many instances, the methods have already obtained part of their promise. But there are pitfalls to using automated methods: they are no substitute for careful thought and close reading, and they require extensive and problem-specific validation. We survey a wide range of new methods, provide guidance on how to validate the output of the models, and clarify misconceptions and errors in the literature. To conclude, we argue that for automated text methods to become a standard tool for political scientists, methodologists must contribute new methods and new methods of validation.
Pages: 267-297
Page count: 31