Umap and Bismap: quantifying genome and methylome mappability

Cited by: 93
Authors
Karimzadeh, Mehran [1, 2, 3]
Ernst, Carl [4]
Kundaje, Anshul [5, 6]
Hoffman, Michael M. [1, 2, 3, 7]
Affiliations
[1] Princess Margaret Canc Ctr, Toronto, ON M5G 1L7, Canada
[2] Univ Toronto, Dept Med Biophys, Toronto, ON M5G 1L7, Canada
[3] Vector Inst, Toronto, ON M5G 1M1, Canada
[4] McGill Univ, Dept Human Genet, Montreal, PQ H3A 0C7, Canada
[5] Stanford Univ, Dept Genet, Stanford, CA 94305 USA
[6] Stanford Univ, Dept Comp Sci, Stanford, CA 94305 USA
[7] Univ Toronto, Dept Comp Sci, Toronto, ON M5S 2E4, Canada
Funding
Natural Sciences and Engineering Research Council of Canada
Keywords
READ ALIGNMENT; DNA; DISCOVERY; BROWSER; PROBES; BIAS; SITE
DOI
10.1093/nar/gky677
Chinese Library Classification
Q5 [Biochemistry]; Q7 [Molecular Biology]
Subject Classification Code
071010; 081704
Abstract
Short-read sequencing enables assessment of genetic and biochemical traits of individual genomic regions, such as the location of genetic variation, protein binding and chemical modifications. Every region in a genome assembly has a property called 'mappability', which measures the extent to which it can be uniquely mapped by sequence reads. In regions of lower mappability, estimates of genomic and epigenomic characteristics from sequencing assays are less reliable. These regions are more susceptible to spurious mapping of reads that originate from other regions of the genome and carry sequencing errors or unexpected genetic variation. Bisulfite sequencing approaches used to identify DNA methylation exacerbate these problems by introducing large numbers of reads that map to multiple regions. Both to correct assumptions of uniformity in downstream analysis and to identify regions where the analysis is less reliable, it is necessary to know the mappability of both ordinary and bisulfite-converted genomes. We introduce the Umap software for identifying uniquely mappable regions of any genome. Its Bismap extension identifies mappability of the bisulfite-converted genome. A Umap and Bismap track hub for human genome assemblies GRCh37/hg19 and GRCh38/hg38, and mouse assemblies GRCm37/mm9 and GRCm38/mm10 is available at https://bismap.hoffmanlab.org for use with genome browsers.
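To make the idea of mappability concrete, the following Python sketch treats a k-mer as uniquely mappable when it occurs exactly once across both strands of a toy sequence. It then repeats the check after simulating complete bisulfite conversion. This is an illustration under simplifying assumptions, not the Umap or Bismap implementation. It uses exact string matching instead of a read aligner, and it assumes no cytosine is methylated, so every C reads as T. The function names `unique_starts`, `per_base_mappability` and `bisulfite_ct`, the toy genome and k = 4 are all hypothetical.

```python
from collections import Counter

# Complement table for uppercase DNA; assumes input contains only A, C, G, T.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def bisulfite_ct(seq: str) -> str:
    """Simulate complete bisulfite conversion of one strand: every C reads as T.
    Assumption (hypothetical helper): no cytosine is methylated, so none is protected."""
    return seq.replace("C", "T")


def unique_starts(strands: list[str], k: int) -> list[bool]:
    """For each start position on strands[0], report whether its k-mer occurs
    exactly once across all sequences in `strands` (exact matching only)."""
    counts = Counter(s[i:i + k] for s in strands for i in range(len(s) - k + 1))
    forward = strands[0]
    return [counts[forward[i:i + k]] == 1 for i in range(len(forward) - k + 1)]


def per_base_mappability(unique: list[bool], k: int) -> list[float]:
    """Fraction of the k-mers overlapping each base that are unique.
    Simplified per-position score; bases near the ends are covered by fewer k-mers."""
    n = len(unique) + k - 1  # length of the sequence the k-mers came from
    scores = []
    for p in range(n):
        # k-mers starting at i cover bases i..i+k-1, so i ranges over [p-k+1, p].
        starts = range(max(0, p - k + 1), min(p, len(unique) - 1) + 1)
        scores.append(sum(unique[i] for i in starts) / len(starts))
    return scores


genome = "ACGTTGCAACGTTGCATTTACCGG"  # hypothetical toy sequence
k = 4

# Ordinary genome: a read can come from either strand.
plain = unique_starts([genome, revcomp(genome)], k)

# Bisulfite-converted genome: convert each strand C->T, then test uniqueness
# of forward-strand k-mers within the converted sequence space.
converted = unique_starts([bisulfite_ct(genome), bisulfite_ct(revcomp(genome))], k)

print(sum(plain), "of", len(plain), "4-mers unique in the ordinary genome")
print(sum(converted), "of", len(converted), "4-mers unique after C->T conversion")
print([round(x, 2) for x in per_base_mappability(plain, k)])
```

Converting C to T leaves each strand with a three-letter alphabet, so converted k-mers collide more often. This is the intuition behind bisulfite sequencing producing more multi-mapping reads. A realistic pipeline would also tolerate mismatches and consider every converted read orientation rather than only the forward strand.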
Pages: 13