Selecting Attributes for Sentiment Classification Using Feature Relation Networks

Cited by: 113
Authors
Abbasi, Ahmed [1]
France, Stephen [1]
Zhang, Zhu
Chen, Hsinchun [2]
Affiliations
[1] Univ Wisconsin, Sheldon B Lubar Sch Business, Milwaukee, WI 53201 USA
[2] Univ Arizona, Eller Coll Management, Dept Management Informat Syst, Artificial Intelligence Lab, Tucson, AZ 85721 USA
Keywords
Natural language processing; machine learning; text mining; subspace selection; affective computing; TEXT ANALYSIS; ALGORITHMS;
DOI
10.1109/TKDE.2010.110
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
A major concern when incorporating large sets of diverse n-gram features for sentiment classification is the presence of noisy, irrelevant, and redundant attributes. These concerns can often make it difficult to harness the augmented discriminatory potential of extended feature sets. We propose a rule-based multivariate text feature selection method called Feature Relation Network (FRN) that considers semantic information and also leverages the syntactic relationships between n-gram features. FRN is intended to efficiently enable the inclusion of extended sets of heterogeneous n-gram features for enhanced sentiment classification. Experiments were conducted on three online review testbeds in comparison with methods used in prior sentiment classification research. FRN outperformed the comparison univariate, multivariate, and hybrid feature selection methods; it was able to select attributes resulting in significantly better classification accuracy irrespective of the feature subset sizes. Furthermore, by incorporating syntactic information about n-gram relations, FRN is able to select features in a more computationally efficient manner than many multivariate and hybrid techniques.
Pages: 447-462
Page count: 16