A Study of Two-Stage Variable Selection Based on the Random Forest Algorithm

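Cited by: 16
Authors
Feng Panfeng (冯盼峰)
Wen Yongxian (温永仙)
Affiliation
[1] College of Computer and Information Sciences, Fujian Agriculture and Forestry University
Keywords
random forest; variable selection; variable importance; QTL mapping
DOI
Not available
Chinese Library Classification number
TP301.6 [Algorithm theory]
Subject classification code
Abstract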
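Variable selection is especially important in high-dimensional data analysis, and rating the importance of variables is a key problem. This paper proposes a two-stage stepwise variable selection algorithm based on random forests. The first stage introduces an improved method for ranking variable importance, with the aim of further sharpening the separation between important variables and noise variables. The second stage performs stepwise variable selection based on random forests. The effectiveness and feasibility of the method are verified on simulated data. In an empirical study of QTL mapping on rice data, the results of the two-stage random forest stepwise variable selection algorithm are compared with those of SCAD, Elastic Net, and the conventional QTL mapping software WinQTLcart2.5; the comparison shows that the two-stage algorithm selects variables effectively.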
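The abstract does not give the details of either stage, so the following is only a minimal sketch of what a two-stage procedure of this kind might look like, not the authors' algorithm. It assumes scikit-learn's `RandomForestRegressor`, assumes that stage one averages impurity-based importances over several forests to stabilise the ranking, and assumes that stage two adds variables in rank order whenever the out-of-bag error drops by more than a threshold. The function names, the threshold `min_gain`, and the simulated data are all hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor


def rank_variables(X, y, n_repeats=5, seed=0):
    """Stage 1 (assumed form): average importances over several forests."""
    scores = np.zeros(X.shape[1])
    for r in range(n_repeats):
        forest = RandomForestRegressor(n_estimators=100, random_state=seed + r)
        forest.fit(X, y)
        scores += forest.feature_importances_
    scores /= n_repeats
    # Indices sorted from most to least important.
    return np.argsort(scores)[::-1], scores


def oob_error(X, y, columns, seed=0):
    """Out-of-bag error (1 - OOB R^2) of a forest fitted on the given columns."""
    forest = RandomForestRegressor(
        n_estimators=100, oob_score=True, random_state=seed
    )
    forest.fit(X[:, columns], y)
    return 1.0 - forest.oob_score_


def stepwise_select(X, y, ranking, max_vars=10, min_gain=0.01):
    """Stage 2 (assumed form): forward selection in rank order."""
    selected = []
    best = np.inf
    for j in ranking[:max_vars]:
        candidate = selected + [int(j)]
        err = oob_error(X, y, candidate)
        # Keep the variable only if it lowers the OOB error by min_gain.
        if best - err > min_gain:
            selected = candidate
            best = err
    return selected


# Simulated data (hypothetical): 2 informative variables, 28 noise variables.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 30))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.5, size=200)

ranking, scores = rank_variables(X, y)
selected = stepwise_select(X, y, ranking)
print("top-ranked variables:", ranking[:5])
print("selected variables:", selected)
```

Averaging over several forests is one simple way to reduce the run-to-run variability of importance rankings; the paper's own stage-one improvement may differ.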
Pages: 119-130
Page count: 12