CAP3: A DNA sequence assembly program

Cited by: 4104
Authors
Huang, XQ [1]
Madan, A
Affiliations
[1] Michigan Technol Univ, Dept Comp Sci, Houghton, MI 49931 USA
[2] Univ Washington, Sch Med, Dept Mol Biotechnol, Seattle, WA 98195 USA
Keywords
DOI
10.1101/gr.9.9.868
Chinese Library Classification
Q5 [Biochemistry]; Q7 [Molecular Biology];
Discipline classification code
071010 ; 081704 ;
Abstract
We describe the third generation of the CAP sequence assembly program. The CAP3 program includes a number of improvements and new features. The program has a capability to clip 5' and 3' low-quality regions of reads. It uses base quality values in computation of overlaps between reads, construction of multiple sequence alignments of reads, and generation of consensus sequences. The program also uses forward-reverse constraints to correct assembly errors and link contigs. Results of CAP3 on four BAC data sets are presented. The performance of CAP3 was compared with that of PHRAP on a number of BAC data sets. PHRAP often produces longer contigs than CAP3, whereas CAP3 often produces fewer errors in consensus sequences than PHRAP. It is easier to construct scaffolds with CAP3 than with PHRAP on low-pass data with forward-reverse constraints.
Pages: 868-877
Page count: 10
Related papers
22 in total
[1]   BASIC LOCAL ALIGNMENT SEARCH TOOL [J].
ALTSCHUL, SF ;
GISH, W ;
MILLER, W ;
MYERS, EW ;
LIPMAN, DJ .
JOURNAL OF MOLECULAR BIOLOGY, 1990, 215 (03) :403-410
[2]   A new DNA sequence assembly program [J].
Bonfield, JK ;
Smith, KF ;
Staden, R .
NUCLEIC ACIDS RESEARCH, 1995, 23 (24) :4992-4999
[3]  
CHAO KM, 1992, COMPUT APPL BIOSCI, V8, P481
[4]   ARTIFICIALLY GENERATED DATA SETS FOR TESTING DNA-SEQUENCE ASSEMBLY ALGORITHMS [J].
ENGLE, ML ;
BURKS, C .
GENOMICS, 1993, 16 (01) :286-288
[5]   Base-calling of automated sequencer traces using phred. II. Error probabilities [J].
Ewing, B ;
Green, P .
GENOME RESEARCH, 1998, 8 (03) :186-194
[6]   Base-calling of automated sequencer traces using phred. I. Accuracy assessment [J].
Ewing, B ;
Hillier, L ;
Wendl, MC ;
Green, P .
GENOME RESEARCH, 1998, 8 (03) :175-185
[7]  
GLEIZES A, 1994, COMPUT APPL BIOSCI, V10, P401
[8]   Consed: A graphical tool for sequence finishing [J].
Gordon, D ;
Abajian, C ;
Green, P .
GENOME RESEARCH, 1998, 8 (03) :195-202
[9]   LINEAR SPACE ALGORITHM FOR COMPUTING MAXIMAL COMMON SUBSEQUENCES [J].
HIRSCHBERG, DS .
COMMUNICATIONS OF THE ACM, 1975, 18 (06) :341-343
[10]  
Huang X, 1996, Microb Comp Genomics, V1, P281