Inference of Isoforms from Short Sequence Reads

Times cited: 37
Authors
Feng, Jianxing [1]
Li, Wei [2]
Jiang, Tao [2,3]
Affiliations
[1] Tongji Univ, Sch Life Sci & Technol, Shanghai 200092, Peoples R China
[2] Univ Calif Riverside, Dept Comp Sci, Riverside, CA 92521 USA
[3] Tsinghua Univ, Beijing 100084, Peoples R China
Keywords
alternative splicing; convex quadratic programming; deep sequencing; isoform inference; RNA-Seq; GENE-EXPRESSION; RNA-SEQ; TRANSCRIPTIONAL LANDSCAPE; EUKARYOTIC TRANSCRIPTOME; GENOME ANNOTATION; CAP ANALYSIS; IDENTIFICATION; GENERATION; DISCOVERY; ARRAYS;
DOI
10.1089/cmb.2010.0243
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification code
071010; 081704
Abstract
Because of alternative splicing in eukaryotic species, identifying mRNA isoforms (or splicing variants) is a difficult problem. Traditional experimental methods for this purpose are time-consuming and costly. The emerging RNA-Seq technology offers a potentially effective way to address this problem. Although many studies have confirmed the advantages of RNA-Seq over traditional methods in transcriptome analysis, inferring isoforms from millions of short sequence reads (e.g., Illumina/Solexa reads) remains computationally challenging. In this work, we propose a method that calculates the expression levels of isoforms and infers isoforms from short RNA-Seq reads using exon-intron boundary, transcription start site (TSS), and poly-A site (PAS) information. We first formulate the relationship among exons, isoforms, and single-end reads as a convex quadratic program, and then use an efficient algorithm (called IsoInfer) to search for isoforms. IsoInfer can calculate the expression levels of isoforms accurately when all the isoforms are known, and it can also infer novel isoforms from scratch. Our experimental tests on known mouse isoforms, with both simulated expression levels and simulated reads, demonstrate that IsoInfer calculates isoform expression levels with accuracy comparable to the state-of-the-art statistical method while running 60 times faster. Moreover, our tests on both simulated and real reads show that it achieves good precision and sensitivity in inferring isoforms when given accurate exon-intron boundary, TSS, and PAS information, especially for isoforms with significantly high expression levels. The software is freely available at http://www.cs.ucr.edu/~jianxing/IsoInfer.html.
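The abstract states that the relationship among exons, isoforms, and single-end reads is formulated as a convex quadratic program. The sketch below illustrates one such program under simplifying assumptions that are not taken from the paper: reads are assumed to fall uniformly along each isoform, so the expected read count in an exon segment is the segment length times the summed expression of the isoforms containing that segment, and expression levels are fitted by nonnegative least squares using cyclic coordinate descent. The function `estimate_expression`, its parameters, and the toy data are hypothetical; this is not IsoInfer's actual objective or solver.

```python
from typing import List


def estimate_expression(lengths: List[float],
                        membership: List[List[int]],
                        counts: List[float],
                        sweeps: int = 1000,
                        tol: float = 1e-12) -> List[float]:
    """Hypothetical sketch: fit isoform expression levels x >= 0.

    Assumption (not from the paper): uniform read coverage, so the
    expected count in segment j is lengths[j] * sum_i membership[j][i] * x[i].
    Minimizes sum_j (counts[j] - expected_j)**2 subject to x >= 0,
    a convex quadratic program (nonnegative least squares).
    """
    n_seg = len(lengths)
    n_iso = len(membership[0])
    # A[j][i]: expected reads in segment j per unit expression of isoform i.
    A = [[lengths[j] * membership[j][i] for i in range(n_iso)]
         for j in range(n_seg)]
    x = [0.0] * n_iso
    r = [float(c) for c in counts]  # residual counts - A x (x starts at 0)
    col_sq = [sum(A[j][i] ** 2 for j in range(n_seg)) for i in range(n_iso)]

    for _ in range(sweeps):
        max_step = 0.0
        for i in range(n_iso):
            if col_sq[i] == 0.0:
                continue  # isoform covers no segment; leave it at 0
            # Exact 1-D minimizer along coordinate i, clamped at zero.
            grad = sum(A[j][i] * r[j] for j in range(n_seg))
            new_xi = max(0.0, x[i] + grad / col_sq[i])
            step = new_xi - x[i]
            if step != 0.0:
                for j in range(n_seg):
                    r[j] -= A[j][i] * step  # keep residual in sync with x
                x[i] = new_xi
            max_step = max(max_step, abs(step))
        if max_step < tol:
            break  # converged: no coordinate moved appreciably
    return x


if __name__ == "__main__":
    # Toy gene (hypothetical data): three 100-bp exon segments;
    # isoform 0 uses all three, isoform 1 skips the middle exon.
    lengths = [100, 100, 100]
    membership = [[1, 1], [1, 0], [1, 1]]
    counts = [70, 50, 70]
    print(estimate_expression(lengths, membership, counts))  # approximately [0.5, 0.2]
```

Because each coordinate update is the exact minimizer of a convex one-dimensional quadratic projected onto x >= 0, no update increases the objective; a real tool would more likely call a dedicated QP or NNLS solver than this plain-Python loop.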
Pages: 305-321
Number of pages: 17