Characterizing regions in the human genome unmappable by next-generation-sequencing at the read length of 1000 bases

Cited by: 11
Authors
Li, Wentian [1]
Freudenberg, Jan [1]
Affiliations
[1] North Shore LIJ Hlth Syst, Robert S Boas Ctr Genom & Human Genet, Feinstein Inst Med Res, Manhasset, NY 11030 USA
Funding
National Institutes of Health (USA);
Keywords
SEGMENTAL DUPLICATIONS; HOMOLOGOUS RECOMBINATION; DNA; MAPPABILITY; COMPLEXITY; GENE; ORGANIZATION; ANNOTATION; DATABASE; ALU;
DOI
10.1016/j.compbiolchem.2014.08.015
Chinese Library Classification
Q [Biological Sciences];
Discipline classification codes
07 ; 0710 ; 09 ;
Abstract
Repetitive and redundant regions of a genome are particularly problematic for mapping sequencing reads. In the present paper, we compile a list of the unmappable regions in the human genome based on the following definition: a region is unmappable if hypothetical 1 kb reads drawn from it cannot be uniquely mapped by zero-mismatch alignment, considering both the forward and reverse strands. The resulting collection of unmappable regions covers 0.77% of the sequence of human autosomes and 8.25% of the sex chromosomes in the reference genome GRCh37/hg19 (1.23% overall). Not surprisingly, our unmappable regions overlap greatly with segmental duplications, transposable elements, and structural variants. About 99.8% of bases in our unmappable regions are part of either segmental duplications or transposable elements, and 98.3% overlap structural variant annotations. Notably, some of these regions overlap units with important biological functions, including 4% of protein-coding genes. In contrast, these regions have zero intersection with the ultraconserved elements and very low overlap with microRNAs, tRNAs, pseudogenes, CpG islands, tandem repeats, microsatellites, sensitive non-coding regions, and the mapping blacklist regions from the ENCODE project. © 2014 Elsevier Ltd. All rights reserved.
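The unmappability definition in the abstract can be illustrated with a toy-scale sketch. This is not the authors' pipeline: the function names `reverse_complement` and `unmappable_starts` are hypothetical, the read length `k` stands in for the 1 kb used in the paper, and the dictionary-based index shown here would not scale to a whole genome. It also assumes that a read identical to its own reverse complement still counts as unique when it occurs at a single locus, which the abstract does not specify.

```python
from collections import defaultdict

# Translation table for complementing DNA bases (assumes an A/C/G/T alphabet).
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def unmappable_starts(genome, k):
    """Return sorted start positions of length-k reads that are not unique.

    A read is treated as non-unique if its exact sequence occurs at more
    than one position on the forward strand, or if its reverse complement
    occurs anywhere on the forward strand (an exact hit on the reverse
    strand). Assumption: a reverse-palindromic read found at one locus
    only is still considered unique.
    """
    occurrences = defaultdict(list)
    for i in range(len(genome) - k + 1):
        occurrences[genome[i:i + k]].append(i)

    bad = []
    for kmer, positions in occurrences.items():
        rc = reverse_complement(kmer)
        repeated_forward = len(positions) > 1
        hit_on_reverse = rc != kmer and rc in occurrences
        if repeated_forward or hit_on_reverse:
            bad.extend(positions)
    return sorted(bad)


if __name__ == "__main__":
    # Toy example with k = 4 instead of 1000.
    print(unmappable_starts("ACGTACGA", 4))
```

In the toy call, the reads starting at positions 1 (`CGTA`) and 3 (`TACG`) are reverse complements of each other, so both are flagged, while the remaining reads map uniquely.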
Pages: 108-117
Number of pages: 10