Accurate identification of alternatively spliced exons using support vector machine

Cited by: 84
Authors
Dror, G [1]
Sorek, R
Shamir, R
Affiliations
[1] Acad Coll Tel Aviv Yaffo, IL-4044 Tel Aviv, Israel
[2] Tel Aviv Univ, Sackler Fac Med, Dept Human Genet, IL-69978 Tel Aviv, Israel
[3] Compugen, IL-69512 Tel Aviv, Israel
[4] Tel Aviv Univ, Sch Comp Sci, IL-69073 Tel Aviv, Israel
DOI
10.1093/bioinformatics/bti132
Chinese Library Classification number
Q5 [Biochemistry];
Subject classification code
071010 ; 081704 ;
Abstract
Motivation: Alternative splicing is a major component of the regulatory action on mammalian transcriptomes. It is estimated that over half of all human genes have more than one splice variant. Previous studies have shown that alternatively spliced exons possess several features that distinguish them from constitutively spliced ones. Recently, we have demonstrated that such features can be used to distinguish alternative from constitutive exons. In the current study, we used advanced machine learning methods to generate a robust classifier of alternative exons.
Results: We extracted several hundred local sequence features of constitutive as well as alternative exons. Using feature selection methods, we found seven attributes that are dominant for the classification task. Several less informative features slightly increase the performance of the classifier. The classifier achieves a true positive rate of 50% at a false positive rate of 0.5%. This result enables one to reliably identify alternatively spliced exons in exon databases that are believed to be dominated by constitutive exons.
Pages: 897-901
Page count: 5