AN EXPERIMENTALLY DERIVED DATA SET CONSTRUCTED FOR TESTING LARGE-SCALE DNA-SEQUENCE ASSEMBLY ALGORITHMS

Cited by: 16
Authors
SETO, D [1]
KOOP, BF [1]
HOOD, L [1]
Affiliation
[1] CALTECH, DIV BIOL 14775, PASADENA, CA 91125
DOI
10.1006/geno.1993.1123
Chinese Library Classification (CLC) number
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology]
Discipline classification codes
071005; 0836; 090102; 100705
Abstract
A data set consisting of DNA sequences from a large-scale shotgun DNA cloning and sequencing project has been collected and posted for public release. The purpose is to propose a standard genomic DNA sequencing data set by which various algorithms and implementations can be tested. This set of data is divided into two subsets, one containing raw DNA sequence data (1023 clones) and the other consisting of the corresponding partially refined or edited DNA sequence data (820 clones). Suggested criteria or guidelines for this data refinement are presented so that algorithms for preprocessing and screening raw sequences may be developed. Development of such preprocessing, screening, aligning, and assembling algorithms will expedite large-scale DNA sequencing projects so that the complete unambiguous consensus DNA sequences will be made available to the general research community in a quicker manner. Smaller scale routine DNA sequencing projects will also be greatly aided by such computational efforts. © 1993 Academic Press, Inc.
Pages: 673-676
Number of pages: 4