preAssemble: a tool for automatic sequencer trace data processing

Cited by: 7
Authors
Adzhubei, AA
Laerdahl, JK
Vlasova, AV
Affiliations
[1] Norwegian Sch Vet Sci, NO-0033 Oslo, Norway
[2] Univ Oslo, Rikshosp, Ctr Mol Biol & Neurosci CMBN, NO-0027 Oslo, Norway
[3] VA Engelhardt Mol Biol Inst, Moscow 117984, Russia
[4] Univ Oslo, Ctr Biotechnol, NO-0317 Oslo, Norway
Keywords
Pipelines
DOI
10.1186/1471-2105-7-22
Chinese Library Classification
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Background: Trace or chromatogram files (raw data) are produced by automatic nucleic acid sequencing equipment, or sequencers. Each file contains information that specialised software can interpret to reveal the sequence (base calling). This is done either by the sequencer's proprietary software or by publicly available programs. Depending on the size of a sequencing project, the number of trace files can range from just a few to thousands. Assessing sequencing quality on various criteria is important at the stage preceding clustering and contig assembly. preAssemble uses two major publicly available packages, Phred and Staden, to perform sequence quality processing.

Results: The preAssemble pre-assembly sequence processing pipeline has been developed for small- to large-scale automatic processing of DNA sequencer chromatogram (trace) data. The pipeline uses the Staden Package Pregap4 module and the base-calling program Phred, and produces detailed, self-explanatory output that can be displayed in a web browser. preAssemble can be used successfully with very little previous experience, while options for parameter tuning are provided for advanced users. preAssemble runs under UNIX and Linux operating systems. It is available for download and runs as stand-alone software. It can also be accessed on the Norwegian Salmon Genome Project web site, where preAssemble jobs can be run on the project server.

Conclusion: preAssemble is a tool for quality assessment of sequences generated by automatic sequencing equipment. It is flexible, since both interactive jobs on the preAssemble server and a stand-alone downloadable version are available. Virtually no previous experience is needed to run a default preAssemble job, yet options for parameter tuning are provided. Consequently, preAssemble can be used as efficiently for just a few trace files as for large-scale sequence processing.
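To make the quality-assessment step more concrete, the following is a minimal Python sketch of the kind of per-read checks that Phred quality values allow. It uses the standard Phred relation Q = -10 log10(P) between a quality value Q and the estimated error probability P (the subject of the Ewing and Green 1998 "Error probabilities" paper in the reference list). It assumes quality values arrive as FASTA-style text (a `>name` header followed by integers), and it applies a Mott-style trimming rule with an illustrative error cutoff of 0.05 and a Q20 threshold. The function names, thresholds and example read are all hypothetical; this is not the actual implementation used by preAssemble, Pregap4 or Phred.

```python
from __future__ import annotations

import math


# Phred relationship between quality Q and error probability P: Q = -10 * log10(P).
def phred_to_error_prob(q: int) -> float:
    """Error probability implied by a Phred quality score."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p: float) -> float:
    """Phred quality score implied by an error probability (0 < p <= 1)."""
    return -10 * math.log10(p)


def parse_qual(text: str) -> dict[str, list[int]]:
    """Parse FASTA-style quality text: a '>name ...' header line followed by
    whitespace-separated integer quality values (possibly over several lines)."""
    records: dict[str, list[int]] = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            records[name] = []
        elif name is not None:
            records[name].extend(int(tok) for tok in line.split())
    return records


def mott_trim(quals: list[int], cutoff: float = 0.05) -> tuple[int, int]:
    """Mott-style trimming (assumed here for illustration): score each base as
    (cutoff - error probability) and return the half-open range [start, end)
    of the maximum-scoring segment. Returns (0, 0) if no segment scores > 0."""
    best_start, best_end, best_score = 0, 0, 0.0
    start, running = 0, 0.0
    for i, q in enumerate(quals):
        running += cutoff - phred_to_error_prob(q)
        if running <= 0:
            # A non-positive running sum cannot help any later segment; restart.
            start, running = i + 1, 0.0
        elif running > best_score:
            best_start, best_end, best_score = start, i + 1, running
    return best_start, best_end


def summarize(quals: list[int], min_q: int = 20, cutoff: float = 0.05) -> dict:
    """Per-read quality summary; min_q and cutoff are hypothetical defaults."""
    start, end = mott_trim(quals, cutoff)
    return {
        "length": len(quals),
        "bases_at_or_above_min_q": sum(q >= min_q for q in quals),
        "expected_errors": sum(phred_to_error_prob(q) for q in quals),
        "clip_range": (start, end),
        "trimmed_length": end - start,
    }


# Hypothetical example record (read name and values invented for illustration).
EXAMPLE = """>read1 trace1
5 8 12 25 30 35 40 38
30 22 10 6
"""

if __name__ == "__main__":
    for name, quals in parse_qual(EXAMPLE).items():
        print(name, summarize(quals))
```

For the invented example read, the low-quality bases at both ends are clipped and the kept segment is the half-open range (3, 10). Scoring each base as cutoff minus error probability means the segment only grows while its bases are, on balance, more reliable than the cutoff, which is why a single mediocre base inside a good stretch does not end it.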
Pages: 5
References (7 entries)
[1] Adzhubei AA. Norwegian Salmon Gen. 2002.
[2] Bonfield J. Staden Package. 1995.
[3] Ewing B, Green P. Base-calling of automated sequencer traces using phred. II. Error probabilities. Genome Research, 1998, 8(3):186-194.
[4] Ewing B, Hillier L, Wendl MC, Green P. Base-calling of automated sequencer traces using phred. I. Accuracy assessment. Genome Research, 1998, 8(3):175-185.
[5] Gordon D, Desmarais C, Green P. Automated finishing with Autofinish. Genome Research, 2001, 11(4):614-625.
[6] Paquola ACM, Nishyiama MY, Reis EM, da Silva AM, Verjovski-Almeida S. ESTWeb: bioinformatics services for EST sequencing projects. Bioinformatics, 2003, 19(12):1587-1588.
[7] Staden R. Methods Mol Biol, 2000, 132:115.