MiPred: classification of real and pseudo microRNA precursors using random forest prediction model with combined features

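Cited by: 365
Authors
Jiang, Peng [1]
Wu, Haonan [1]
Wang, Wenkai [1]
Ma, Wei [1]
Sun, Xiao [1]
Lu, Zuhong [1]
Affiliation
[1] Southeast University, Department of Biological Science and Medical Engineering, State Key Laboratory of Bioelectronics, Nanjing 210096, People's Republic of China
Funding
National Natural Science Foundation of China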
DOI
10.1093/nar/gkm368
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010; 081704
Abstract
To distinguish real pre-miRNAs from other hairpin sequences with similar stem-loops (pseudo pre-miRNAs), a hybrid feature set is used, consisting of the local contiguous structure-sequence composition, the minimum free energy (MFE) of the secondary structure and the P-value of a randomization test. In addition, a novel machine-learning algorithm, random forest (RF), is introduced. The results suggest that our method achieves 98.21% specificity and 95.09% sensitivity. Compared with a previous method, Triplet-SVM-classifier, our RF method was nearly 10% higher in total accuracy. Further analysis indicated that the improvement was due to both the combined features and the RF algorithm. The MiPred web server is available at http://www.bioinf.seu.edu.cn/miRNA/. Given a sequence, MiPred first decides whether it is a pre-miRNA-like hairpin sequence; if it is, the RF classifier then predicts whether it is a real pre-miRNA or a pseudo one.
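The abstract does not spell out how the "local contiguous structure-sequence composition" is encoded. The sketch below assumes the 32-feature triplet encoding associated with Triplet-SVM-classifier: for each nucleotide (A, C, G, U), the paired/unpaired state of that position and its two neighbours in dot-bracket notation, with "(" and ")" both counted as paired. The dot-bracket structure and the MFE are assumed to come from an external folding tool such as RNAfold. The function names are hypothetical, and appending MFE and P-value to give 34 values is an illustration of "combined features", not a confirmed description of MiPred's input vector.

```python
from itertools import product

NUCLEOTIDES = "ACGU"
# Each position in a 3-nt window is either paired "(" or unpaired ".": 8 patterns.
STRUCT_TRIPLETS = ["".join(p) for p in product("(.", repeat=3)]
# 4 middle nucleotides x 8 structure patterns = 32 features (assumed encoding).
FEATURE_NAMES = [n + s for n in NUCLEOTIDES for s in STRUCT_TRIPLETS]


def triplet_features(seq, structure):
    """Return 32 normalized triplet frequencies for a hairpin (hypothetical helper)."""
    if len(seq) != len(structure):
        raise ValueError("sequence and structure must have the same length")
    seq = seq.upper().replace("T", "U")
    # Pairing direction is ignored: both brackets simply mean "paired".
    struct = structure.replace(")", "(")
    counts = dict.fromkeys(FEATURE_NAMES, 0)
    for i in range(1, len(seq) - 1):
        key = seq[i] + struct[i - 1 : i + 2]
        if key in counts:  # skip windows centred on non-ACGU characters
            counts[key] += 1
    total = sum(counts.values())
    return [counts[name] / total if total else 0.0 for name in FEATURE_NAMES]


def combined_features(seq, structure, mfe, p_value):
    """Hypothetical hybrid vector: 32 triplet features + MFE + randomization P-value."""
    return triplet_features(seq, structure) + [mfe, p_value]
```

Normalizing by the number of windows keeps hairpins of different lengths on the same scale, so the classifier sees composition rather than raw counts.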
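The randomization-test P-value can be read as "how often does a shuffled version of the same sequence fold at least as stably as the original?". The sketch below assumes a plain mononucleotide shuffle and an add-one empirical estimate (k + 1) / (N + 1); the abstract does not state MiPred's exact shuffling scheme or formula. The `mfe_fn` argument is a placeholder for any folding routine that returns an MFE in kcal/mol; all names here are hypothetical.

```python
import random


def shuffle_seq(seq, rng):
    """Mononucleotide shuffle (an assumption; other schemes preserve dinucleotides)."""
    chars = list(seq)
    rng.shuffle(chars)
    return "".join(chars)


def randomization_p_value(seq, mfe_fn, n_shuffles=1000, seed=0):
    """Empirical P-value: fraction of shuffles folding at least as stably as `seq`.

    mfe_fn: callable returning the minimum free energy of a sequence;
    lower (more negative) values mean a more stable structure.
    """
    rng = random.Random(seed)
    observed = mfe_fn(seq)
    as_stable = sum(
        1 for _ in range(n_shuffles) if mfe_fn(shuffle_seq(seq, rng)) <= observed
    )
    # Add-one correction keeps the estimate strictly above zero.
    return (as_stable + 1) / (n_shuffles + 1)
```

In practice `mfe_fn` could wrap ViennaRNA's Python bindings, for example `lambda s: RNA.fold(s)[1]` if the `RNA` module is installed. A small P-value suggests the hairpin is more stable than expected from its composition alone.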
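The abstract names random forest but gives no parameters. The standard-library sketch below shows the generic algorithm: each tree is grown on a bootstrap sample, each split considers only a random subset of `mtry` features (Gini impurity), and predictions are a majority vote. The class and parameter names (`RandomForest`, `n_trees`, `mtry`, `max_depth`) and their defaults are illustrative assumptions, not MiPred's settings; trees here are depth-limited for speed, whereas classic random forests grow trees fully. A production setup would more likely use a library implementation such as scikit-learn's `RandomForestClassifier`.

```python
import random
from collections import Counter


def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values()) if n else 0.0


def best_split(X, y, features):
    """Return the (feature, threshold) with the lowest weighted Gini, or None."""
    best, best_score = None, gini(y)
    for f in features:
        for t in sorted(set(row[f] for row in X)):
            left = [y[i] for i, row in enumerate(X) if row[f] <= t]
            right = [y[i] for i, row in enumerate(X) if row[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if score < best_score:
                best, best_score = (f, t), score
    return best


def build_tree(X, y, rng, mtry, depth, max_depth):
    majority = Counter(y).most_common(1)[0][0]
    if depth >= max_depth or len(set(y)) == 1:
        return majority  # leaf: a class label (assumed int or str, not a tuple)
    # Random forests consider only a random subset of features at each split.
    split = best_split(X, y, rng.sample(range(len(X[0])), mtry))
    if split is None:
        return majority
    f, t = split
    li = [i for i, row in enumerate(X) if row[f] <= t]
    ri = [i for i, row in enumerate(X) if row[f] > t]
    grow = lambda idx: build_tree([X[i] for i in idx], [y[i] for i in idx],
                                  rng, mtry, depth + 1, max_depth)
    return (f, t, grow(li), grow(ri))


def predict_tree(node, x):
    while isinstance(node, tuple):  # internal node: (feature, threshold, left, right)
        f, t, left, right = node
        node = left if x[f] <= t else right
    return node


class RandomForest:
    def __init__(self, n_trees=50, mtry=None, max_depth=8, seed=0):
        self.n_trees, self.mtry, self.max_depth = n_trees, mtry, max_depth
        self.rng = random.Random(seed)
        self.trees = []

    def fit(self, X, y):
        n, d = len(X), len(X[0])
        # Common default for classification: about sqrt(number of features).
        mtry = min(d, self.mtry or max(1, int(d ** 0.5)))
        self.trees = []
        for _ in range(self.n_trees):
            idx = [self.rng.randrange(n) for _ in range(n)]  # bootstrap sample
            self.trees.append(build_tree([X[i] for i in idx], [y[i] for i in idx],
                                         self.rng, mtry, 0, self.max_depth))
        return self

    def predict(self, X):
        # Majority vote across all trees.
        return [Counter(predict_tree(t, x) for t in self.trees).most_common(1)[0][0]
                for x in X]
```

With feature vectors like the hypothetical 34-value hybrid vectors as `X` and labels 1 (real) / 0 (pseudo) as `y`, `RandomForest().fit(X, y).predict(new_X)` returns one label per new hairpin.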
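The specificity and sensitivity figures follow the usual confusion-matrix definitions with real pre-miRNAs as the positive class: sensitivity = TP / (TP + FN) and specificity = TN / (TN + FP). The abstract does not define "total accuracy"; the sketch assumes the common (TP + TN) / (TP + TN + FP + FN). The `evaluate` helper is hypothetical and assumes both classes occur in `y_true`.

```python
def evaluate(y_true, y_pred, positive=1):
    """Sensitivity, specificity and total accuracy for a binary classifier."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    return {
        "sensitivity": tp / (tp + fn),  # share of real pre-miRNAs recovered
        "specificity": tn / (tn + fp),  # share of pseudo hairpins rejected
        "accuracy": (tp + tn) / len(pairs),
    }
```

For example, `evaluate([1, 1, 1, 1, 0, 0, 0, 0], [1, 1, 1, 1, 0, 0, 0, 1])` gives sensitivity 1.0, specificity 0.75 and accuracy 0.875: one pseudo hairpin is misclassified as real.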
Pages: W339–W344
Number of pages: 6