Combining multi-species genomic data for microRNA identification using a Naive Bayes classifier

Cited by: 139
Authors
Yousef, Malik
Nebozhyn, Michael
Shatkay, Hagit
Kanterakis, Stathis
Showe, Louise C.
Showe, Michael K. [1]
Affiliations
[1] Wistar Inst Anat & Biol, Philadelphia, PA 19104 USA
[2] Queens Univ, Sch Comp, Kingston, ON, Canada
DOI
10.1093/bioinformatics/btl094
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Motivation: Most computational methods for microRNA (miRNA) gene prediction rely on sequence conservation and/or structural similarity. In this study we describe a new technique for predicting miRNA genes that is applicable across several species. The technique is based on machine learning, using the Naive Bayes classifier. It automatically generates a model from training data consisting of sequence and structure information of known miRNAs from a variety of species.
Results: Our study shows that applying machine learning techniques, together with integrating data from multiple species, is a useful and general approach for miRNA gene prediction. Based on our experiments, we believe that this technique is applicable to a wide range of eukaryotic genomes. Specific structure and sequence features are first used to identify miRNAs, followed by a comparative analysis to decrease the number of false positives (FPs). The resulting algorithm exhibits higher specificity and similar sensitivity compared to currently used algorithms that rely on conserved genomic regions to decrease the rate of FPs.
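The classification step described in the abstract can be illustrated with a minimal sketch. It assumes a multinomial Naive Bayes model with Laplace smoothing over overlapping 3-mers taken from the RNA sequence and from a dot-bracket secondary-structure string. The feature choice, the function and class names (`kmers`, `features`, `NaiveBayes`), the label names, and the toy strings are all hypothetical and are not taken from the paper. The comparative-analysis filter for false positives is not shown.

```python
import math
from collections import Counter, defaultdict


def kmers(text, k=3):
    """Return the overlapping k-mers of a string."""
    return [text[i:i + k] for i in range(len(text) - k + 1)]


def features(seq, struct, k=3):
    """Tokenize a candidate into sequence k-mers and structure k-mers.

    Assumption: `struct` is a dot-bracket string of the same length as `seq`.
    """
    return (["seq:" + t for t in kmers(seq, k)]
            + ["str:" + t for t in kmers(struct, k)])


class NaiveBayes:
    """Multinomial Naive Bayes with Laplace (add-alpha) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha
        self.counts = defaultdict(Counter)  # label -> token counts
        self.docs = Counter()               # label -> number of examples
        self.vocab = set()

    def fit(self, examples):
        for tokens, label in examples:
            self.docs[label] += 1
            self.counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def log_scores(self, tokens):
        total_docs = sum(self.docs.values())
        scores = {}
        for label in self.docs:
            denom = (sum(self.counts[label].values())
                     + self.alpha * len(self.vocab))
            score = math.log(self.docs[label] / total_docs)  # log prior
            for t in tokens:
                if t in self.vocab:  # ignore tokens never seen in training
                    score += math.log((self.counts[label][t] + self.alpha) / denom)
            scores[label] = score
        return scores

    def predict(self, tokens):
        scores = self.log_scores(tokens)
        return max(scores, key=scores.get)


# Toy training data, invented for illustration only (not real miRNAs).
train = [
    (features("GCGCAUAUGCGC", "((((....))))"), "hairpin"),
    (features("GGCCUUAAGGCC", "((((....))))"), "hairpin"),
    (features("AUAUAUAUAUAU", "............"), "background"),
    (features("UUUUAAAAUUUU", "............"), "background"),
]
model = NaiveBayes().fit(train)

print(model.predict(features("GGGCAAAAGCCC", "((((....))))")))
print(model.predict(features("AUAUUUUAAAUA", "............")))
```

In this sketch, pooling training examples from several species would amount to adding their labeled candidates to `train`, which mirrors the multi-species training idea only loosely.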
Pages: 1325-1334
Page count: 10