Modeling non-uniformity in short-read rates in RNA-Seq data

Cited by: 134
Authors
Li, Jun [1]
Jiang, Hui [1, 2]
Wong, Wing Hung [1, 3]
Affiliations
[1] Stanford Univ, Dept Stat, Stanford, CA 94305 USA
[2] Stanford Genome Technol Ctr, Palo Alto, CA 94304 USA
[3] Stanford Univ, Dept Hlth Res & Policy, Stanford, CA 94305 USA
Source
GENOME BIOLOGY | 2010 / Vol. 11 / Issue 05
Funding
US National Science Foundation;
Keywords
GENE-EXPRESSION; TRANSCRIPTOME; MICROARRAY; GENOME; ARRAYS; CHIP;
DOI
10.1186/gb-2010-11-5-r50
Chinese Library Classification (CLC) number
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology];
Discipline classification code
071005 ; 0836 ; 090102 ; 100705 ;
Abstract
After mapping, RNA-Seq data can be summarized by a sequence of read counts commonly modeled as Poisson variables with constant rates along each transcript, which actually fit data poorly. We suggest using variable rates for different positions, and propose two models to predict these rates based on local sequences. These models explain more than 50% of the variations and can lead to improved estimates of gene and isoform expressions for both Illumina and Applied Biosystems data.
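To make the contrast in the abstract concrete, here is a minimal sketch comparing a constant-rate Poisson fit with a position-specific fit along one transcript. It is not the authors' method: this record does not describe their two models, so the sketch substitutes the simplest possible sequence-dependent model, in which the rate at each position depends only on the nucleotide at that position. The sequence, the counts, and the function names are all hypothetical, and the "fraction of deviance explained" is only one possible way to quantify how much variation a model accounts for.

```python
import math
from collections import defaultdict


def poisson_deviance(counts, rates):
    """Poisson deviance: 2 * sum(y * log(y / mu) - (y - mu)), with 0 * log(0) = 0."""
    total = 0.0
    for y, mu in zip(counts, rates):
        term = mu - y
        if y > 0:
            term += y * math.log(y / mu)
        total += term
    return 2.0 * total


def fit_constant_rate(counts):
    """Constant-rate model: the MLE is the mean count, shared by every position."""
    mean = sum(counts) / len(counts)
    return [mean] * len(counts)


def fit_base_specific_rate(seq, counts):
    """Toy variable-rate model (an assumption, not the paper's model):
    one Poisson rate per nucleotide, estimated as the mean count over
    all positions carrying that nucleotide."""
    sums = defaultdict(float)
    sizes = defaultdict(int)
    for base, y in zip(seq, counts):
        sums[base] += y
        sizes[base] += 1
    return [sums[base] / sizes[base] for base in seq]


# Hypothetical transcript sequence and per-position read-start counts.
seq = "ACGTACGTAC"
counts = [9, 1, 4, 2, 11, 1, 6, 2, 10, 2]

null_rates = fit_constant_rate(counts)
seq_rates = fit_base_specific_rate(seq, counts)

d_null = poisson_deviance(counts, null_rates)
d_model = poisson_deviance(counts, seq_rates)

# Fraction of the constant-rate deviance removed by the variable-rate model.
explained = 1.0 - d_model / d_null
print(f"null deviance = {d_null:.3f}, model deviance = {d_model:.3f}")
print(f"fraction of deviance explained = {explained:.3f}")
```

A realistic model would use a window of surrounding nucleotides rather than a single base and would be evaluated on held-out transcripts; the single-base version is used here only because its maximum-likelihood fit has a closed form.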
Pages: 11