Using the miraEST assembler for reliable and automated mRNA transcript assembly and SNP detection in sequenced ESTs

Cited by: 847
Authors
Chevreux, B [1]
Pfisterer, T
Drescher, B
Driesel, AJ
Müller, WEG
Wetter, T
Suhai, S
Affiliations
[1] German Canc Res Ctr, Dept Mol Biophys, D-69120 Heidelberg, Germany
[2] Heidelberg Univ, Inst Med Biometry & Informat, D-69120 Heidelberg, Germany
[3] Johannes Gutenberg Univ Mainz, Inst Physiol Chem, Angew Mol Biol Abt, D-55099 Mainz, Germany
[4] VitiGen AG, D-76833 Siebeldingen, Germany
[5] RZPD German Resource Ctr Genome Res, D-14059 Berlin, Germany
[6] MWG Biotech AG, D-85560 Ebersberg, Germany
Keywords
DOI
10.1101/gr.1917404
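Chinese Library Classification
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010 [Biochemistry and Molecular Biology]; 081704 [Applied Chemistry]
Abstract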
We present an EST sequence assembler that specializes in reconstruction of pristine mRNA transcripts, while at the same time detecting and classifying single nucleotide polymorphisms (SNPs) occurring in different variations thereof. The assembler uses iterative multipass strategies centered on high-confidence regions within sequences and has a fallback strategy for using low-confidence regions when needed. It features special functions to assemble high numbers of highly similar sequences without prior masking, an automatic editor that edits and analyzes alignments by inspecting the underlying traces, and detection and classification of sequence properties like SNPs with a high specificity and a sensitivity down to one mutation per sequence. In addition, it includes possibilities to use incorrectly preprocessed sequences, routines to make use of additional sequencing information such as base-error probabilities, template insert sizes, strain information, etc., and functions to detect and resolve possible misassemblies. The assembler is routinely used for such various tasks as mutation detection in different cell types, similarity analysis of transcripts between organisms, and pristine assembly of sequences from various sources for oligo design in clinical microarray experiments.
Pages: 1147-1159
Number of pages: 13
Related papers
42 in total
[1]
ALLEX CF, 1996, INTELL SYTEMS MOL BI, V4, P3
[2]
[Anonymous], 1997, ALGORITHMS STRINGS T, DOI 10.1017/CBO9780511574931
[3]
A new approach to sequence comparison: normalized sequence alignment [J].
Arslan, AN ;
Egecioglu, Ö ;
Pevzner, PA .
BIOINFORMATICS, 2001, 17 (04) :327-337
[4]
A NEW APPROACH TO TEXT SEARCHING [J].
BAEZAYATES, R ;
GONNET, GH .
COMMUNICATIONS OF THE ACM, 1992, 35 (10) :74-82
[5]
Redundancy based detection of sequence polymorphisms in expressed sequence tag data using autoSNP [J].
Barker, G ;
Batley, J ;
O'Sullivan, H ;
Edwards, KJ ;
Edwards, D .
BIOINFORMATICS, 2003, 19 (03) :421-422
[6]
Automated detection of point mutations using fluorescent sequence trace subtraction [J].
Bonfield, JK ;
Rada, C ;
Staden, R .
NUCLEIC ACIDS RESEARCH, 1998, 26 (14) :3404-3409
[7]
A new DNA sequence assembly program [J].
Bonfield, JK ;
Smith, KF ;
Staden, R .
NUCLEIC ACIDS RESEARCH, 1995, 23 (24) :4992-4999
[8]
Experiment files and their application during large-scale sequencing projects [J].
Bonfield, JK ;
Staden, R .
DNA SEQUENCE, 1996, 6 (02) :109-117
[9]
The contribution of 700,000 ORF sequence tags to the definition of the human transcriptome [J].
Camargo, AA ;
Samaia, HPB ;
Dias-Neto, E ;
Simao, DF ;
Migotto, IA ;
Briones, MRS ;
Costa, FF ;
Nagai, MA ;
Verjovski-Almeida, S ;
Zago, MA ;
Andrade, LEC ;
Carrer, H ;
El-Dorry, HFA ;
Espreafico, EM ;
Habr-Gama, A ;
Giannella-Neto, D ;
Goldman, GH ;
Gruber, A ;
Hackel, C ;
Kimura, ET ;
Maciel, RMB ;
Marie, SKN ;
Martins, EAL ;
Nóbrega, MP ;
Paçó-Larson, ML ;
Pardini, MIMC ;
Pereira, GG ;
Pesquero, JB ;
Rodrigues, V ;
Rogatto, SR ;
da Silva, IDCG ;
Sogayar, MC ;
Sonati, MDF ;
Tajara, EH ;
Valentini, SR ;
Alberto, FL ;
Amaral, MEJ ;
Aneas, I ;
Arnaldi, LAT ;
de Assis, AM ;
Bengtson, MH ;
Bergamo, NA ;
Bombonato, V ;
de Camargo, MER ;
Canevari, RA ;
Carraro, DM ;
Cerutti, JM ;
Corrêa, MLC ;
Corrêa, RFR ;
Costa, MCR .
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 2001, 98 (21) :12103-12108
[10]
A SURVEY OF MULTIPLE SEQUENCE COMPARISON METHODS [J].
CHAN, SC ;
WONG, AKC ;
CHIU, DKY .
BULLETIN OF MATHEMATICAL BIOLOGY, 1992, 54 (04) :563-598