Systematically Differentiating Functions for Alternatively Spliced Isoforms through Integrating RNA-seq Data

Cited by: 67
Authors
Eksi, Ridvan [1]
Li, Hong-Dong [1]
Menon, Rajasree [1]
Wen, Yuchen [1]
Omenn, Gilbert S. [1,2]
Kretzler, Matthias [1,2]
Guan, Yuanfang [1,2,3]
Affiliations
[1] Univ Michigan, Dept Computat Med & Bioinformat, Ann Arbor, MI 48109 USA
[2] Univ Michigan, Dept Internal Med, Ann Arbor, MI 48109 USA
[3] Univ Michigan, Dept Elect Engn & Comp Sci, Ann Arbor, MI 48109 USA
Keywords
CANCER BIOMARKER CANDIDATES; FUNCTION PREDICTION; PROTEIN FUNCTION; GENE-EXPRESSION; MESSENGER-RNA; BREAST-CANCER; TRANSCRIPT; GENOMICS; RESIDUES; INFORMATICS;
DOI
10.1371/journal.pcbi.1003314
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Integrating large-scale functional genomic data has significantly accelerated our understanding of gene functions. However, no algorithm has been developed to differentiate functions for isoforms of the same gene using high-throughput genomic data. This is because standard supervised learning requires 'ground-truth' functional annotations, which are lacking at the isoform level. To address this challenge, we developed a generic framework that interrogates public RNA-seq data at the transcript level to differentiate functions for alternatively spliced isoforms. For a specific function, our algorithm identifies the 'responsible' isoform(s) of a gene and generates classifying models at the isoform level instead of at the gene level. Through cross-validation, we demonstrated that our algorithm is effective in assigning functions to genes, especially the ones with multiple isoforms, and robust to gene expression levels and removal of homologous gene pairs. We identified genes in the mouse whose isoforms are predicted to have disparate functionalities and experimentally validated the 'responsible' isoforms using data from mammary tissue. With protein structure modeling and experimental evidence, we further validated the predicted isoform functional differences for the genes Cdkn2a and Anxa6. Our generic framework is the first to predict and differentiate functions for alternatively spliced isoforms, instead of genes, using genomic data. It is extendable to any base machine learner and other species with alternatively spliced isoforms, and shifts the current gene-centered function prediction to isoform-level predictions.
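The step in which the algorithm identifies the "responsible" isoform(s) of a gene can be illustrated with a small sketch in the spirit of multiple-instance learning: each gene is a bag of isoforms, the function label is known only at the gene level, and the model alternates between fitting an isoform-level scorer and re-selecting the best-scoring isoform of each annotated gene. This is not the authors' implementation. The centroid-difference scorer, the function name `select_responsible_isoforms`, the gene names, and the two-dimensional feature vectors are all assumptions made for illustration; the abstract only states that the framework can use any base machine learner.

```python
from typing import Dict, List, Sequence

Vector = Sequence[float]
Bags = Dict[str, List[Vector]]  # gene name -> feature vectors of its isoforms


def centroid(vectors: List[Vector]) -> List[float]:
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def score(x: Vector, pos_c: Vector, neg_c: Vector) -> float:
    """Linear score: projection of x onto (pos_c - neg_c), centered at the midpoint."""
    return sum((p - q) * (xi - (p + q) / 2.0) for xi, p, q in zip(x, pos_c, neg_c))


def select_responsible_isoforms(
    positive_genes: Bags, negative_genes: Bags, max_iter: int = 20
) -> Dict[str, int]:
    """Return, for each annotated (positive) gene, the index of its top-scoring isoform.

    Hypothetical stand-in for a base learner: a centroid-difference classifier.
    """
    # All isoforms of unannotated (negative) genes are treated as negative examples.
    neg_c = centroid([x for isos in negative_genes.values() for x in isos])
    # Initially, every isoform of a positive gene counts as a positive example.
    pos_examples = [x for isos in positive_genes.values() for x in isos]
    chosen: Dict[str, int] = {}
    for _ in range(max_iter):
        pos_c = centroid(pos_examples)
        # Re-select the highest-scoring isoform of each positive gene.
        new_chosen = {
            gene: max(range(len(isos)), key=lambda i: score(isos[i], pos_c, neg_c))
            for gene, isos in positive_genes.items()
        }
        if new_chosen == chosen:  # selection is stable, stop iterating
            break
        chosen = new_chosen
        # Refit using only the selected isoforms as positive examples.
        pos_examples = [positive_genes[g][i] for g, i in chosen.items()]
    return chosen


# Toy data (made up): two annotated genes, each with one isoform that
# resembles the other annotated gene's isoform and one that does not.
positive = {"GeneA": [[5.0, 5.0], [0.0, 0.1]], "GeneB": [[0.2, 0.0], [4.5, 5.5]]}
negative = {"GeneC": [[0.1, 0.2], [0.0, 0.0]], "GeneD": [[0.3, 0.1]]}
print(select_responsible_isoforms(positive, negative))
```

In this toy setting, the selection settles on the isoform of each annotated gene that lies far from the unannotated genes. A real analysis would replace the scorer with a proper classifier and the toy vectors with transcript-level RNA-seq features.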
Pages: 16