Experiment files and their application during large-scale sequencing projects

Cited by: 72
Authors
Bonfield, JK [1]
Staden, R [1]
Affiliation
[1] MRC, MOLEC BIOL LAB, CAMBRIDGE CB2 2QH, ENGLAND
Source
DNA SEQUENCE | 1996 / Vol. 6 / Issue 02
Funding
UK Medical Research Council;
Keywords
data processing; DNA sequencing; experiment files; file format; sequence assembly;
DOI
10.3109/10425179609010197
Chinese Library Classification number
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology];
Discipline classification codes
071005 ; 0836 ; 090102 ; 100705 ;
Abstract
The data for large scale sequencing projects are passed through several processing steps prior to assembly, and post-assembly processing generally requires knowledge of more than just the sequence of each reading. We address here the problem of providing data to individual programs and of combining all the tasks into a single process. The solution comprises two components: a file format (experiment file format) that stores information about readings, and a script (PREGAP) that controls the creation and use of experiment files by the processing programs. PREGAP can take a batch of data from a variety of sequencing instruments, gather information about each reading, and then scan the reading to select the 3' end of the good quality data, mark sequencing vector, other cloning vector sequences, and Alu segments. The results of all these operations are added to the experiment file for each reading, ready for processing by the assembly program. Experiment files also provide a mechanism for using alternative assembly engines with our package.
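The idea of a per-reading file that accumulates the results of each processing step can be illustrated with a minimal sketch. It assumes, without confirmation from this record, that an experiment file is plain text with one record per line, each consisting of a short tag followed by spaces and a value. The tags `ID`, `QR`, and `SL`, the sample values, and the helper names `parse_records` and `append_record` are hypothetical and chosen only for illustration; this is not the PREGAP implementation.

```python
from collections import defaultdict


def parse_records(text):
    """Parse lines of the assumed form 'TAG   value' into {tag: [values]}."""
    records = defaultdict(list)
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        tag, _, value = line.partition(" ")
        records[tag].append(value.strip())
    return dict(records)


def append_record(text, tag, value):
    """Return the file text with one more record added at the end.

    This mirrors the idea that each processing step adds its result
    to the reading's file rather than rewriting earlier records.
    """
    if text and not text.endswith("\n"):
        text += "\n"
    return f"{text}{tag}   {value}\n"


# Hypothetical sample: a reading name and a quality clip position.
SAMPLE = "ID   reading001\nQR   412\n"

# A later (hypothetical) vector-screening step records its own result.
UPDATED = append_record(SAMPLE, "SL", 35)
PARSED = parse_records(UPDATED)
```

Under these assumptions, a downstream program such as an assembler would only need `parse_records` to read everything the earlier steps recorded about a reading.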
Pages: 109-117
Page count: 9