Text Mining the Contributors to Rail Accidents

Cited by: 102
Author
Brown, Donald E. [1]
Affiliation
[1] Univ Virginia, Data Sci Inst, Charlottesville, VA 22904 USA
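Keywords
Rail safety; safety engineering; latent Dirichlet allocation; partial least squares; random forests; DRIVER; MODEL
DOI
10.1109/TITS.2015.2472580
Chinese Library Classification
TU [Architectural Science]
Discipline classification code
081407 [Building Environment and Energy Engineering]
Abstract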
Rail accidents represent an important safety concern for the transportation industry in many countries. In the 11 years from 2001 to 2012, the U.S. had more than 40 000 rail accidents that cost more than $45 million. While most of the accidents during this period had very little cost, about 5200 had damages in excess of $141 500. To better understand the contributors to these extreme accidents, the Federal Railroad Administration has required the railroads involved in accidents to submit reports that contain both fixed field entries and narratives that describe the characteristics of the accident. While a number of studies have looked at the fixed fields, none have done an extensive analysis of the narratives. This paper describes the use of text mining with a combination of techniques to automatically discover accident characteristics that can inform a better understanding of the contributors to the accidents. The study evaluates the efficacy of text mining of accident narratives by assessing predictive performance for the costs of extreme accidents. The results show that predictive accuracy for accident costs significantly improves through the use of features found by text mining and predictive accuracy further improves through the use of modern ensemble methods. Importantly, this study also shows through case examples how the findings from text mining of the narratives can improve understanding of the contributors to rail accidents in ways not possible through only fixed field analysis of the accident reports.
Pages: 346-355
Page count: 10
Related papers
30 in total
[1] Akin D, 2010, SCI RES ESSAYS, V5, P2837
[2] [Anonymous], 2011, RAILR SAF STAT 2009
[3] Probabilistic Topic Models [J]. Blei, David M. COMMUNICATIONS OF THE ACM, 2012, 55 (04): 77-84
[4] Latent Dirichlet allocation [J]. Blei, DM; Ng, AY; Jordan, MI. JOURNAL OF MACHINE LEARNING RESEARCH, 2003, 3 (4-5): 993-1022
[5] SmcHD1, containing a structural-maintenance-of-chromosomes hinge domain, has a critical role in X inactivation [J]. Blewitt, Marnie E.; Gendrel, Anne-Valerie; Pang, Zhenyi; Sparrow, Duncan B.; Whitelaw, Nadia; Craig, Jeffrey M.; Apedaile, Anwyn; Hilton, Douglas J.; Dunwoodie, Sally L.; Brockdorff, Neil; Kay, Graham F.; Whitelaw, Emma. NATURE GENETICS, 2008, 40 (05): 663-669
[6] Random forests [J]. Breiman, L. MACHINE LEARNING, 2001, 45 (01): 5-32
[7] Detecting Concealment of Intent in Transportation Screening: A Proof of Concept [J]. Burgoon, Judee K.; Twitchell, Douglas P.; Jensen, Matthew L.; Meservy, Thomas O.; Adkins, Mark; Kruse, John; Deokar, Amit V.; Tsechpenakis, Gabriel; Lu, Shan; Metaxas, Dimitris N.; Nunamaker, Jay F., Jr.; Younger, Robert E. IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS, 2009, 10 (01): 103-112
[8] Web-Based Traffic Sentiment Analysis: Methods and Applications [J]. Cao, Jianping; Zeng, Ke; Wang, Hui; Cheng, Jiajun; Qiao, Fengcai; Wen, Ding; Gao, Yanqing. IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS, 2014, 15 (02): 844-853
[9] Decision support model for prioritizing railway level crossings for safety improvements: Application of the adaptive neuro-fuzzy system [J]. Cirovic, Goran; Pamucar, Dragan. EXPERT SYSTEMS WITH APPLICATIONS, 2013, 40 (06): 2208-2223
[10] Real-Time Detection of Traffic From Twitter Stream Analysis [J]. D'Andrea, Eleonora; Ducange, Pietro; Lazzerini, Beatrice; Marcelloni, Francesco. IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS, 2015, 16 (04): 2269-2283