Fast and SNP-tolerant detection of complex variants and splicing in short reads

Times cited: 1465
Authors
Wu, Thomas D. [1]
Nacu, Serban [1]
Affiliation
[1] Genentech Inc, Dept Bioinformat, San Francisco, CA 94080 USA
Keywords
ALIGNMENT PROGRAM; GENOME; METHYLATION; RNA; POLYMORPHISMS; ULTRAFAST; SEQUENCES; GENES; TOOL
DOI
10.1093/bioinformatics/btq057
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification code
070307 [Chemical Biology]
Abstract
Motivation: Next-generation sequencing captures sequence differences in reads relative to a reference genome or transcriptome, including splicing events and complex variants involving multiple mismatches and long indels. We present computational methods for fast detection of complex variants and splicing in short reads, based on a successively constrained search process of merging and filtering position lists from a genomic index. Our methods are implemented in GSNAP (Genomic Short-read Nucleotide Alignment Program), which can align both single- and paired-end reads as short as 14 nt and of arbitrarily long length. It can detect short- and long-distance splicing, including interchromosomal splicing, in individual reads, using probabilistic models or a database of known splice sites. Our program also permits SNP-tolerant alignment to a reference space of all possible combinations of major and minor alleles, and can align reads from bisulfite-treated DNA for the study of methylation state.

Results: In comparison testing, GSNAP has speeds comparable to existing programs, especially for reads of ≥70 nt, and is fastest in detecting complex variants with four or more mismatches, insertions of 1-9 nt or deletions of 1-30 nt. Although SNP tolerance does not increase alignment yield substantially, it affects alignment results in 7-8% of transcriptional reads, typically by revealing alternate genomic mappings for a read. Simulations of bisulfite-converted DNA show that a unique genomic position can no longer be identified for 6% of 36 nt reads and 3% of 70 nt reads.

Availability: Source code in C and utility programs in Perl are freely available for download as part of the GMAP package at http://share.gene.com/gmap.
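The abstract describes finding candidate alignments by merging and filtering position lists from a genomic index. The following is a minimal Python sketch of that general idea, not GSNAP's implementation: it assumes a toy in-memory k-mer index, and the names `build_index` and `candidate_diagonals` and the parameters `k` and `min_support` are hypothetical. Each k-mer at read offset q contributes the genomic positions p where it occurs; shifting every position to the diagonal p - q keeps each list sorted, so all lists can be merged in one pass and diagonals supported by too few k-mers filtered out. GSNAP's successively constrained search is more elaborate than this single merge.

```python
from __future__ import annotations

import heapq
from itertools import groupby


def build_index(genome: str, k: int) -> dict[str, list[int]]:
    """Hypothetical toy index: map each k-mer to its sorted genomic positions."""
    index: dict[str, list[int]] = {}
    for pos in range(len(genome) - k + 1):
        index.setdefault(genome[pos:pos + k], []).append(pos)
    return index


def candidate_diagonals(read: str, index: dict[str, list[int]], k: int,
                        min_support: int) -> list[tuple[int, int]]:
    """Merge per-k-mer position lists and keep well-supported diagonals.

    Returns (diagonal, supporting k-mer count) pairs, where a diagonal is the
    genomic position at which the read would start in an ungapped alignment.
    """
    shifted = []
    for q in range(len(read) - k + 1):
        positions = index.get(read[q:q + k], [])
        # Subtracting the constant offset q keeps each list sorted.
        shifted.append([p - q for p in positions])
    merged = heapq.merge(*shifted)  # one sorted stream of diagonals
    result = []
    for diagonal, group in groupby(merged):
        count = sum(1 for _ in group)
        if count >= min_support:  # filter weakly supported candidates
            result.append((diagonal, count))
    return result


genome = "TTGACCATGCAGTCCGATAG"
index = build_index(genome, k=4)
print(candidate_diagonals("CATGCAGTCC", index, k=4, min_support=3))  # [(5, 7)]
print(candidate_diagonals("CATGAAGTCC", index, k=4, min_support=3))  # [(5, 3)]
```

In the second call a single mismatch at read offset 4 removes only the four k-mers that overlap it, so the true diagonal still passes the filter; this is one way a merge-and-filter search can tolerate mismatches.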
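SNP-tolerant alignment, as described in the abstract, treats a known minor allele as a match rather than a mismatch. The sketch below illustrates only that scoring rule under simple assumptions: ungapped placement, and a hypothetical SNP table given as a dict from 0-based genomic position to one alternate base. It does not reproduce how GSNAP builds its SNP-tolerant reference space.

```python
from __future__ import annotations


def count_mismatches(read: str, genome: str, start: int,
                     snps: dict[int, str] | None = None) -> int:
    """Count ungapped mismatches of `read` placed at `start` in `genome`.

    A read base also counts as a match when it equals the alternate allele
    recorded in `snps` (hypothetical format: position -> alternate base).
    """
    if start < 0 or start + len(read) > len(genome):
        raise ValueError("read does not fit inside the genome at this start")
    snps = snps or {}
    mismatches = 0
    for offset, base in enumerate(read):
        pos = start + offset
        allowed = {genome[pos]}        # reference (major) allele
        if pos in snps:
            allowed.add(snps[pos])     # known minor allele is tolerated
        if base not in allowed:
            mismatches += 1
    return mismatches


genome = "ACGTACGT"
print(count_mismatches("ACGAAC", genome, 0))            # 1
print(count_mismatches("ACGAAC", genome, 0, {3: "A"}))  # 0
```

Because a tolerated allele can lower the mismatch count at one locus relative to another, SNP tolerance can change which genomic position scores best for a read, in line with the abstract's observation that it mainly reveals alternate mappings.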
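Bisulfite treatment converts unmethylated cytosines to uracil, which is sequenced as thymine, while methylated cytosines remain C, so an aligner must let a read T match a genomic C. The sketch below shows why this tolerance lowers specificity: a read can match more genomic positions, which is the effect the abstract reports as reduced unique mapping. It assumes ungapped exact matching and plus-strand conversion only, and the helper names `matches` and `exact_hits` are hypothetical; a full bisulfite aligner would also need to handle the opposite strand, where the conversion appears as G to A.

```python
def matches(read_base: str, ref_base: str, bisulfite: bool) -> bool:
    """Return True if a read base is compatible with a reference base."""
    if read_base == ref_base:
        return True
    # Plus-strand bisulfite conversion: an unmethylated genomic C is read as T.
    return bisulfite and ref_base == "C" and read_base == "T"


def exact_hits(read: str, genome: str, bisulfite: bool = False) -> list[int]:
    """List every start position where the read matches with no mismatches."""
    hits = []
    for start in range(len(genome) - len(read) + 1):
        if all(matches(b, genome[start + i], bisulfite) for i, b in enumerate(read)):
            hits.append(start)
    return hits


genome = "ACTGTTTGA"
print(exact_hits("TTG", genome))                  # [5]
print(exact_hits("TTG", genome, bisulfite=True))  # [1, 5]
```

With bisulfite tolerance the read no longer maps uniquely, because the genomic "CTG" at position 1 becomes indistinguishable from the "TTG" at position 5.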
Pages: 873-881
Number of pages: 9