GMAP: a genomic mapping and alignment program for mRNA and EST sequences

Cited by: 1624
Authors
Wu, TD [1]
Watanabe, CK
Affiliations
[1] Genentech Inc, Dept Bioinformat, San Francisco, CA 94080 USA
[2] Genentech Inc, Dept Corp Informat Technol, San Francisco, CA 94080 USA
DOI
10.1093/bioinformatics/bti310
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010; 081704;
Abstract
Motivation: We introduce GMAP, a standalone program for mapping and aligning cDNA sequences to a genome. The program maps and aligns a single sequence with minimal startup time and memory requirements, and provides fast batch processing of large sequence sets. The program generates accurate gene structures, even in the presence of substantial polymorphisms and sequence errors, without using probabilistic splice site models. The methodology underlying the program includes a minimal sampling strategy for genomic mapping, oligomer chaining for approximate alignment, sandwich DP for splice site detection, and microexon identification with statistical significance testing.
Results: On a set of human messenger RNAs with random mutations at rates of 1% and 3%, GMAP identified all splice sites accurately in over 99.3% of the sequences, which was one-tenth the error rate of existing programs. On a large set of human expressed sequence tags, GMAP provided higher-quality alignments more often than BLAT did. On a set of Arabidopsis cDNAs, GMAP performed comparably with GeneSeqer. In these experiments, GMAP demonstrated a several-fold increase in speed over existing programs.
Pages: 1859-1875
Page count: 17