Effect of read-mapping biases on detecting allele-specific expression from RNA-sequencing data

Cited by: 354
Authors
Degner, Jacob F. [1, 2]
Marioni, John C. [1]
Pai, Athma A. [1]
Pickrell, Joseph K. [1]
Nkadori, Everlyne [1, 3]
Gilad, Yoav [1]
Pritchard, Jonathan K. [1, 3]
Affiliations
[1] Univ Chicago, Dept Human Genet, Chicago, IL 60637 USA
[2] Univ Chicago, Comm Genet Genom & Syst Biol, Chicago, IL 60637 USA
[3] Univ Chicago, Howard Hughes Med Inst, Chicago, IL 60637 USA
Keywords
GENE-EXPRESSION; HAPLOTYPE MAP; HUMAN GENOME
DOI
10.1093/bioinformatics/btp579
CLC number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Motivation: Next-generation sequencing has become an important tool for genome-wide quantification of DNA and RNA. However, a major technical hurdle lies in the need to map short sequence reads back to their correct locations in a reference genome. Here, we investigate the impact of SNP variation on the reliability of read-mapping in the context of detecting allele-specific expression (ASE).
Results: We generated 16 million 35 bp reads from mRNA of each of two HapMap Yoruba individuals. When we mapped these reads to the human genome we found that, at heterozygous SNPs, there was a significant bias toward higher mapping rates of the allele in the reference sequence, compared with the alternative allele. Masking known SNP positions in the genome sequence eliminated the reference bias but, surprisingly, did not lead to more reliable results overall. We find that even after masking, approximately 5-10% of SNPs still have an inherent bias toward more effective mapping of one allele. Filtering out inherently biased SNPs removes 40% of the top signals of ASE. The remaining SNPs showing ASE are enriched in genes previously known to harbor cis-regulatory variation or known to show uniparental imprinting. Our results have implications for a variety of applications involving detection of alternate alleles from short-read sequence data.
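To make the idea of allelic imbalance at a heterozygous SNP concrete, the following is a minimal sketch, not the authors' method. The abstract does not state which statistical test was used; an exact two-sided binomial test is assumed here for illustration. The function names and the `p_ref` parameter are hypothetical. `p_ref` is the expected share of reads carrying the reference allele when there is no ASE: 0.5 for an unbiased SNP, or some other value if the SNP is known to map one allele more effectively.

```python
from math import comb


def reference_fraction(ref_reads: int, alt_reads: int) -> float:
    """Fraction of reads at a heterozygous SNP that carry the reference allele."""
    total = ref_reads + alt_reads
    return ref_reads / total if total else float("nan")


def binomial_two_sided_p(ref_reads: int, alt_reads: int, p_ref: float = 0.5) -> float:
    """Exact two-sided binomial p-value for allelic imbalance.

    Assumption (not from the paper): under no ASE, each read carries the
    reference allele independently with probability p_ref.
    """
    n = ref_reads + alt_reads
    if n == 0:
        return 1.0  # no reads, so no evidence either way

    def pmf(k: int) -> float:
        return comb(n, k) * p_ref**k * (1 - p_ref) ** (n - k)

    observed = pmf(ref_reads)
    # Sum the probability of every outcome no more likely than the observed one.
    # The small tolerance guards against floating-point ties.
    total = sum(pmf(k) for k in range(n + 1) if pmf(k) <= observed * (1 + 1e-9))
    return min(1.0, total)


if __name__ == "__main__":
    # Hypothetical read counts, for illustration only.
    print(reference_fraction(30, 10))
    print(binomial_two_sided_p(30, 10))
```

A test of this kind cannot by itself distinguish true ASE from mapping bias, which is why the abstract's point about filtering inherently biased SNPs matters: a SNP whose reads map unequally will look imbalanced even when expression is equal, unless `p_ref` (or an equivalent correction) reflects that bias.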
Pages: 3207-3212
Page count: 6