Repetitive DNA and next-generation sequencing: computational challenges and solutions

Cited by: 1114
Authors
Treangen, Todd J. [1]
Salzberg, Steven L. [1, 2]
Affiliations
[1] Johns Hopkins Univ, Sch Med, McKusick Nathans Inst Genet Med, Baltimore, MD 21205 USA
[2] Johns Hopkins Bloomberg Sch Publ Hlth, Dept Biostat, Baltimore, MD 21205 USA
Keywords
RNA-SEQ DATA; COPY NUMBER; COMBINATORIAL ALGORITHMS; SEGMENTAL DUPLICATION; STRUCTURAL VARIATION; SPLICE JUNCTIONS; SNP DISCOVERY; GENOME; ALIGNMENT; REVEALS;
DOI
10.1038/nrg3117
Chinese Library Classification
Q3 [Genetics];
Subject classification codes
071007 ; 090102 ;
Abstract
Repetitive DNA sequences are abundant in a broad range of species, from bacteria to mammals, and they cover nearly half of the human genome. Repeats have always presented technical challenges for sequence alignment and assembly programs. Next-generation sequencing projects, with their short read lengths and high data volumes, have made these challenges more difficult. From a computational perspective, repeats create ambiguities in alignment and assembly, which, in turn, can produce biases and errors when interpreting results. Simply ignoring repeats is not an option, as this creates problems of its own and may mean that important biological phenomena are missed. We discuss the computational problems surrounding repeats and describe strategies used by current bioinformatics systems to solve them.
Pages: 36-46
Page count: 11
Related papers
77 in total
[1]   Applications of next-generation sequencing: Genome structural variation discovery and genotyping [J].
Alkan, Can ;
Coe, Bradley P. ;
Eichler, Evan E. .
NATURE REVIEWS GENETICS, 2011, 12 (05) :363-375
[2]   Limitations of next-generation genome sequence assembly [J].
Alkan, Can ;
Sajjadian, Saba ;
Eichler, Evan E. .
NATURE METHODS, 2011, 8 (01) :61-65
[3]   Personalized copy number and segmental duplication maps using next-generation sequencing [J].
Alkan, Can ;
Kidd, Jeffrey M. ;
Marques-Bonet, Tomas ;
Aksay, Gozde ;
Antonacci, Francesca ;
Hormozdiari, Fereydoun ;
Kitzman, Jacob O. ;
Baker, Carl ;
Malig, Maika ;
Mutlu, Onur ;
Sahinalp, S. Cenk ;
Gibbs, Richard A. ;
Eichler, Evan E. .
NATURE GENETICS, 2009, 41 (10) :1061-U29
[4]   A map of human genome variation from population-scale sequencing [J].
Altshuler, David ;
Durbin, Richard M. ;
Abecasis, Goncalo R. ;
Bentley, David R. ;
Chakravarti, Aravinda ;
Clark, Andrew G. ;
Collins, Francis S. ;
De la Vega, Francisco M. ;
Donnelly, Peter ;
Egholm, Michael ;
Flicek, Paul ;
Gabriel, Stacey B. ;
Gibbs, Richard A. ;
Knoppers, Bartha M. ;
Lander, Eric S. ;
Lehrach, Hans ;
Mardis, Elaine R. ;
McVean, Gil A. ;
Nickerson, Debbie A. ;
Peltonen, Leena ;
Schafer, Alan J. ;
Sherry, Stephen T. ;
Wang, Jun ;
Wilson, Richard K. ;
Gibbs, Richard A. ;
Deiros, David ;
Metzker, Mike ;
Muzny, Donna ;
Reid, Jeff ;
Wheeler, David ;
Wang, Jun ;
Li, Jingxiang ;
Jian, Min ;
Li, Guoqing ;
Li, Ruiqiang ;
Liang, Huiqing ;
Tian, Geng ;
Wang, Bo ;
Wang, Jian ;
Wang, Wei ;
Yang, Huanming ;
Zhang, Xiuqing ;
Zheng, Huisong ;
Lander, Eric S. ;
Altshuler, David L. ;
Ambrogio, Lauren ;
Bloom, Toby ;
Cibulskis, Kristian ;
Fennell, Tim J. ;
Gabriel, Stacey B. .
NATURE, 2010, 467 (7319) :1061-1073
[5]   Detection of splice junctions from paired-end RNA-seq data by SpliceMap [J].
Au, Kin Fai ;
Jiang, Hui ;
Lin, Lan ;
Xing, Yi ;
Wong, Wing Hung .
NUCLEIC ACIDS RESEARCH, 2010, 38 (14) :4570-4578
[6]   Alu repeats and human genomic diversity [J].
Batzer, MA ;
Deininger, PL .
NATURE REVIEWS GENETICS, 2002, 3 (05) :370-379
[7]   Transposable element insertions have strongly affected human evolution [J].
Britten, Roy J. .
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 2010, 107 (46) :19945-19948
[8]   Distinct DNA methylation patterns characterize differentiated human embryonic stem cells and developing human fetal liver [J].
Brunner, Alayne L. ;
Johnson, David S. ;
Kim, Si Wan ;
Valouev, Anton ;
Reddy, Timothy E. ;
Neff, Norma F. ;
Anton, Elizabeth ;
Medina, Catherine ;
Nguyen, Loan ;
Chiao, Eric ;
Oyolu, Chuba B. ;
Schroth, Gary P. ;
Absher, Devin M. ;
Baker, Julie C. ;
Myers, Richard M. .
GENOME RESEARCH, 2009, 19 (06) :1044-1056
[9]   The Orientia tsutsugamushi genome reveals massive proliferation of conjugative type IV secretion system and host-cell interaction genes [J].
Cho, Nam-Hyuk ;
Kim, Hang-Rae ;
Lee, Jung-Hee ;
Kim, Se-Yoon ;
Kim, Jaejong ;
Cha, Sunho ;
Kim, Sang-Yoon ;
Darby, Alistair C. ;
Fuxelius, Hans-Henrik ;
Yin, Jun ;
Kim, Ju Han ;
Kim, Jihun ;
Lee, Sang Joo ;
Koh, Young-Sang ;
Jang, Won-Jong ;
Park, Kyung-Hee ;
Andersson, Siv G. E. ;
Choi, Myung-Sik ;
Kim, Ik-Sang .
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 2007, 104 (19) :7981-7986
[10]   Discovering Transcription Factor Binding Sites in Highly Repetitive Regions of Genomes with Multi-Read Analysis of ChIP-Seq Data [J].
Chung, Dongjun ;
Kuan, Pei Fen ;
Li, Bo ;
Sanalkumar, Rajendran ;
Liang, Kun ;
Bresnick, Emery H. ;
Dewey, Colin ;
Keles, Suenduez .
PLOS COMPUTATIONAL BIOLOGY, 2011, 7 (07)