DNA-SEQUENCE CONFIDENCE ESTIMATION

Cited by: 16
Authors
LIPSHUTZ, RJ
TAVERNER, F
HENNESSY, K
HARTZELL, C
DAVIS, R
Affiliations
[1] DANIEL H WAGNER ASSOCIATES,SUNNYVALE,CA 94089
[2] APPL BIOSYST INC,FOSTER CITY,CA 94404
[3] UNIV CALIF BERKELEY,DEPT COMP SCI,BERKELEY,CA 94720
[4] STANFORD UNIV,DEPT GENET,STANFORD,CA 94305
DOI
10.1006/geno.1994.1089
Chinese Library Classification (CLC) number
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology];
Discipline classification codes
071005 ; 0836 ; 090102 ; 100705 ;
Abstract
A significant bottleneck in the current DNA sequencing process is the manual editing of trace data generated by automated DNA sequencers. This step is used to correct base calls and to assign a confidence level to each base call. The confidence levels are used in the assembly process to determine overlaps and to resolve discrepancies when determining the consensus sequence. This single step may cost as much as 4 to 8 cents per finished base. We report an approach to automated trace editing that uses classification trees to detect and exploit context-based patterns in trace peak heights. Local base composition and nearby peak heights account for 80% of the variation in peak heights. Classification algorithms were developed that identify 37% of the automated base calls that differ from the consensus sequence. With these algorithms, 12% of the base calls had confidence levels below 90%. © 1994 Academic Press, Inc.
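The abstract does not give the authors' features or tree structure, so the following is only a minimal sketch of the general idea: a one-split classification tree (a stump) that maps a context-based peak-height feature to a confidence level. The feature `relative_height`, the `Stump` type, the Gini split criterion, and the toy samples are all assumptions made for illustration; none of them comes from the paper.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Stump:
    """One-split classification tree; each leaf stores an empirical confidence."""
    threshold: float
    conf_low: float   # fraction of correct calls with feature <= threshold
    conf_high: float  # fraction of correct calls with feature > threshold

    def confidence(self, x: float) -> float:
        return self.conf_low if x <= self.threshold else self.conf_high


def relative_height(peaks: List[float], i: int, window: int = 2) -> float:
    """Hypothetical feature: peak i divided by the mean height of nearby peaks."""
    lo, hi = max(0, i - window), min(len(peaks), i + window + 1)
    neighbours = [peaks[j] for j in range(lo, hi) if j != i]
    return peaks[i] / (sum(neighbours) / len(neighbours))


def gini(labels: List[bool]) -> float:
    """Gini impurity of a set of correct/incorrect labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def fit_stump(samples: List[Tuple[float, bool]]) -> Stump:
    """Choose the threshold that minimises the weighted Gini impurity."""
    xs = sorted({x for x, _ in samples})
    best = None
    for a, b in zip(xs, xs[1:]):
        t = (a + b) / 2.0
        left = [ok for x, ok in samples if x <= t]
        right = [ok for x, ok in samples if x > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(samples)
        if best is None or score < best[0]:
            best = (score, t, sum(left) / len(left), sum(right) / len(right))
    if best is None:
        raise ValueError("need at least two distinct feature values")
    _, t, conf_low, conf_high = best
    return Stump(t, conf_low, conf_high)


# Toy training data (invented): (relative peak height, call matched consensus?)
samples = [
    (0.30, False), (0.35, True), (0.40, False), (0.45, False),
    (0.50, True), (0.90, True), (1.00, True), (1.10, True), (1.20, True),
]
stump = fit_stump(samples)

# Score a base call from a made-up run of peak heights.
feature = relative_height([100.0, 200.0, 50.0, 200.0, 100.0], 2)
low_confidence = stump.confidence(feature) < 0.90
```

A call whose leaf confidence falls below a chosen cutoff (0.90 here, echoing the 90% level mentioned in the abstract) would be flagged for review. A real tree would use several features, such as local base composition, and more than one split.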
Pages: 417-424
Page count: 8