RNA-Seq gene expression estimation with read mapping uncertainty

Cited by: 809
Authors
Li, Bo [1]
Ruotti, Victor [2]
Stewart, Ron M. [2]
Thomson, James A. [2]
Dewey, Colin N. [1, 3]
Affiliations
[1] Univ Wisconsin, Dept Comp Sci, Madison, WI 53706 USA
[2] Morgridge Inst Res, Madison, WI 53707 USA
[3] Univ Wisconsin, Dept Biostat & Med Informat, Madison, WI 53706 USA
Keywords
STRATEGY; ARRAYS; GENOME
DOI
10.1093/bioinformatics/btp692
Chinese Library Classification number
Q5 [Biochemistry]
Subject classification code
071010; 081704
Abstract
Motivation: RNA-Seq is a promising new technology for accurately measuring gene expression levels. Expression estimation with RNA-Seq requires the mapping of relatively short sequencing reads to a reference genome or transcript set. Because reads are generally shorter than the transcripts from which they are derived, a single read may map to multiple genes and isoforms, complicating expression analyses. Previous computational methods either discard reads that map to multiple locations or allocate them to genes heuristically.
Results: We present a generative statistical model and associated inference methods that handle read mapping uncertainty in a principled manner. Through simulations parameterized by real RNA-Seq data, we show that our method is more accurate than previous methods. Our improved accuracy is the result of handling read mapping uncertainty with a statistical model and the estimation of gene expression levels as the sum of isoform expression levels. Unlike previous methods, our method is capable of modeling non-uniform read distributions. Simulations with our method indicate that a read length of 20-25 bases is optimal for gene-level expression estimation from mouse and maize RNA-Seq data when sequencing throughput is fixed.
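The idea of allocating multi-mapping reads with a statistical model, rather than discarding them or splitting them heuristically, can be illustrated with a small expectation-maximization (EM) sketch. This is not the model from the paper: it is a deliberately simplified illustration that assumes every alignment of a read is equally plausible and ignores transcript length, sequencing errors, and non-uniform read start positions. The function names `em_read_fractions` and `gene_fractions` and the toy data are hypothetical.

```python
def em_read_fractions(read_alignments, n_isoforms, max_iter=1000, tol=1e-12):
    """Estimate the fraction of reads that originate from each isoform.

    read_alignments[i] lists the (distinct) isoform indices read i aligns to.
    Simplifying assumption: all alignments of a read are equally good, so a
    read is shared among its candidate isoforms in proportion to the current
    abundance estimates only.
    """
    theta = [1.0 / n_isoforms] * n_isoforms
    reads = [r for r in read_alignments if r]  # skip unaligned reads
    if not reads:
        return theta
    for _ in range(max_iter):
        # E-step: fractionally assign each read to its candidate isoforms.
        expected = [0.0] * n_isoforms
        for isoforms in reads:
            total = sum(theta[k] for k in isoforms)
            for k in isoforms:
                expected[k] += theta[k] / total
        # M-step: re-estimate the fractions from the expected read counts.
        new_theta = [c / len(reads) for c in expected]
        converged = max(abs(a - b) for a, b in zip(theta, new_theta)) < tol
        theta = new_theta
        if converged:
            break
    return theta


def gene_fractions(theta, gene_of):
    """Gene-level estimate as the sum of its isoform-level estimates."""
    totals = {}
    for isoform, fraction in enumerate(theta):
        gene = gene_of[isoform]
        totals[gene] = totals.get(gene, 0.0) + fraction
    return totals


# Toy data: isoforms 0 and 1 belong to gene "A", isoform 2 to gene "B".
# Three reads map uniquely to isoform 0, one uniquely to isoform 2, and one
# read maps to both isoform 0 and isoform 2 (a multiread).
toy_reads = [[0], [0], [0], [2], [0, 2]]
toy_theta = em_read_fractions(toy_reads, n_isoforms=3)
toy_genes = gene_fractions(toy_theta, {0: "A", 1: "A", 2: "B"})
```

In this toy example the multiread is shared between isoforms 0 and 2 in proportion to their estimated abundances, and the gene-level values are obtained by summing over isoforms, mirroring the two ingredients the abstract credits for improved accuracy.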
Pages: 493 - 500
Page count: 8
Related papers
17 in total
  • [1] Statistical modeling of sequencing errors in SAGE libraries
    Beissbarth, Tim
    Hyde, Lavinia
    Smyth, Gordon K.
    Job, Chris
    Boon, Wee-Ming
    Tan, Seong-Seng
    Scott, Hamish S.
    Speed, Terence P.
    [J]. BIOINFORMATICS, 2004, 20 : 31 - 39
  • [2] Stem cell transcriptome profiling via massive-scale mRNA sequencing
    Cloonan, Nicole
    Forrest, Alistair R. R.
    Kolle, Gabriel
    Gardiner, Brooke B. A.
    Faulkner, Geoffrey J.
    Brown, Mellissa K.
    Taylor, Darrin F.
    Steptoe, Anita L.
    Wani, Shivangi
    Bethel, Graeme
    Robertson, Alan J.
    Perkins, Andrew C.
    Bruce, Stephen J.
    Lee, Clarence C.
    Ranade, Swati S.
    Peckham, Heather E.
    Manning, Jonathan M.
    McKernan, Kevin J.
    Grimmond, Sean M.
    [J]. NATURE METHODS, 2008, 5 (07) : 613 - 619
  • [3] MAXIMUM LIKELIHOOD FROM INCOMPLETE DATA VIA EM ALGORITHM
    DEMPSTER, AP
    LAIRD, NM
    RUBIN, DB
    [J]. JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES B-METHODOLOGICAL, 1977, 39 (01) : 1 - 38
  • [4] Substantial biases in ultra-short read data sets from high-throughput DNA sequencing
    Dohm, Juliane C.
    Lottaz, Claudio
    Borodina, Tatiana
    Himmelbauer, Heinz
    [J]. NUCLEIC ACIDS RESEARCH, 2008, 36 (16)
  • [5] A rescue strategy for multimapping short sequence tags refines surveys of transcriptional activity by CAGE
    Faulkner, Geoffrey J.
    Forrest, Alistair R. R.
    Chalk, Alistair M.
    Schroder, Kate
    Hayashizaki, Yoshihide
    Carninci, Piero
    Hume, David A.
    Grimmond, Sean M.
    [J]. GENOMICS, 2008, 91 (03) : 281 - 288
  • [6] The UCSC Known Genes
    Hsu, F
    Kent, WJ
    Clawson, H
    Kuhn, RM
    Diekhans, M
    Haussler, D
    [J]. BIOINFORMATICS, 2006, 22 (09) : 1036 - 1046
  • [7] Statistical inferences for isoform expression in RNA-Seq
    Jiang, Hui
    Wong, Wing Hung
    [J]. BIOINFORMATICS, 2009, 25 (08) : 1026 - 1032
  • [8] Cross-hybridization modeling on Affymetrix exon arrays
    Kapur, Karen
    Jiang, Hui
    Xing, Yi
    Wong, Wing Hung
    [J]. BIOINFORMATICS, 2008, 24 (24) : 2887 - 2893
  • [9] Lacroix V, 2008, LECT N BIOINFORMAT, V5251, P50, DOI 10.1007/978-3-540-87361-7_5
  • [10] Ultrafast and memory-efficient alignment of short DNA sequences to the human genome
    Langmead, Ben
    Trapnell, Cole
    Pop, Mihai
    Salzberg, Steven L.
    [J]. GENOME BIOLOGY, 2009, 10 (03):