MiPred: classification of real and pseudo microRNA precursors using random forest prediction model with combined features

Cited by: 365
Authors
Jiang, Peng [1 ]
Wu, Haonan [1 ]
Wang, Wenkai [1 ]
Ma, Wei [1 ]
Sun, Xiao [1 ]
Lu, Zuhong [1 ]
Affiliation
[1] SE Univ, Dept Biol Sci & Med Engn, State Key Lab Bioelect, Nanjing 210096, Peoples R China
Funding
National Natural Science Foundation of China
DOI
10.1093/nar/gkm368
Chinese Library Classification (CLC) numbers
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification codes
071010; 081704
Abstract
To distinguish real pre-miRNAs from other hairpin sequences with similar stem-loops (pseudo pre-miRNAs), we use a hybrid feature set that combines three components: the local contiguous structure-sequence composition, the minimum free energy (MFE) of the secondary structure, and the P-value of a randomization test. In addition, we introduce the random forest (RF) machine-learning algorithm for this task. Our method achieves 98.21% specificity and 95.09% sensitivity. Compared with a previous method, Triplet-SVM-classifier, our RF method achieved nearly 10% higher total accuracy. Further analysis indicated that the improvement came from both the combined features and the RF algorithm. The MiPred web server is available at http://www.bioinf.seu.edu.cn/miRNA/. Given a sequence, MiPred first decides whether it is a pre-miRNA-like hairpin. If it is, the RF classifier then predicts whether it is a real or a pseudo pre-miRNA.
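The abstract names three feature groups but does not spell out how they are computed. The Python sketch below shows one plausible way to assemble such a combined feature vector for a single hairpin. It rests on several assumptions that are not taken from this record.

- The "local contiguous structure-sequence composition" is assumed to be the 32 triplet elements associated with Triplet-SVM-classifier. Each element is a middle nucleotide plus the paired/unpaired status of three adjacent bases, as in `"A((("`.
- `mfe_fn` is a hypothetical stand-in for a real folding routine such as ViennaRNA.
- The mononucleotide shuffle and the `(k + 1) / (N + 1)` P-value are simplifications, not necessarily what MiPred itself does.

```python
import random
from itertools import product
from typing import Callable, Dict, List

NUCLEOTIDES = "ACGU"
# Paired "(" / unpaired "." status for three adjacent bases: 2**3 = 8 patterns.
STRUCT_TRIPLETS = ["".join(p) for p in product("(.", repeat=3)]
# 4 middle nucleotides x 8 structure triplets = 32 feature names, e.g. "A(((".
FEATURE_NAMES = [n + s for n in NUCLEOTIDES for s in STRUCT_TRIPLETS]


def triplet_features(seq: str, structure: str) -> Dict[str, float]:
    """Normalized frequencies of (middle nucleotide, structure triplet) elements.

    `structure` is dot-bracket notation. ")" is mapped to "(" so that only
    paired/unpaired status is kept (an assumed encoding).
    """
    if len(seq) != len(structure):
        raise ValueError("sequence and structure must have the same length")
    seq = seq.upper().replace("T", "U")
    status = structure.replace(")", "(")
    counts = dict.fromkeys(FEATURE_NAMES, 0)
    # Slide a width-3 window; the nucleotide at the centre labels the element.
    for i in range(1, len(seq) - 1):
        key = seq[i] + status[i - 1:i + 2]
        if key in counts:  # skip windows whose centre is not A/C/G/U
            counts[key] += 1
    total = sum(counts.values())
    return {k: (v / total if total else 0.0) for k, v in counts.items()}


def randomization_p_value(seq: str, mfe_fn: Callable[[str], float],
                          n_shuffles: int = 1000, seed: int = 0) -> float:
    """Empirical P-value that a shuffled sequence folds at least as stably.

    Simplification: plain mononucleotide shuffle, P = (k + 1) / (N + 1),
    where k counts shuffles whose MFE is <= the native MFE.
    """
    rng = random.Random(seed)
    native = mfe_fn(seq)
    letters = list(seq)
    k = 0
    for _ in range(n_shuffles):
        rng.shuffle(letters)
        if mfe_fn("".join(letters)) <= native:
            k += 1
    return (k + 1) / (n_shuffles + 1)


def combined_features(seq: str, structure: str,
                      mfe_fn: Callable[[str], float],
                      n_shuffles: int = 1000, seed: int = 0) -> List[float]:
    """32 triplet frequencies, then the MFE, then the randomization P-value."""
    trip = triplet_features(seq, structure)
    return [trip[name] for name in FEATURE_NAMES] + [
        mfe_fn(seq),
        randomization_p_value(seq, mfe_fn, n_shuffles, seed),
    ]


if __name__ == "__main__":
    # Hypothetical toy energy, for illustration only; substitute a real folder.
    toy_mfe = lambda s: -float(s.count("G") + s.count("C"))
    vec = combined_features("GGGAAACCC", "(((...)))", toy_mfe, n_shuffles=50)
    print(len(vec), vec[-2], vec[-1])
```

Each call yields a 34-value vector in this sketch. Such vectors could then be passed to any random forest implementation, for example scikit-learn's `RandomForestClassifier`. This record does not state which implementation or hyperparameters MiPred uses.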
Pages: W339-W344
Number of pages: 6