DNA-SEQUENCE CONFIDENCE ESTIMATION

Cited by: 16
Authors
LIPSHUTZ, RJ
TAVERNER, F
HENNESSY, K
HARTZELL, C
DAVIS, R
Affiliations
[1] DANIEL H WAGNER ASSOCIATES,SUNNYVALE,CA 94089
[2] APPL BIOSYST INC,FOSTER CITY,CA 94404
[3] UNIV CALIF BERKELEY,DEPT COMP SCI,BERKELEY,CA 94720
[4] STANFORD UNIV,DEPT GENET,STANFORD,CA 94305
DOI
10.1006/geno.1994.1089
Chinese Library Classification (CLC) number
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology]
Discipline classification codes
071005; 0836; 090102; 100705
Abstract
A significant bottleneck in the current DNA sequencing process is the manual editing of trace data generated by automated DNA sequencers. This step is used to correct base calls and to associate to each base call a confidence level. The confidence levels are used in the assembly process to determine overlaps and to resolve discrepancies in determining the consensus sequence. This single step may cost as much as 4 to 8 cents per finished base. We report an approach to automated trace editing using classification trees to detect and exploit context-based patterns in trace peak heights. Local base composition and nearby peak heights account for 80% of the variations in peak heights. Classification algorithms were developed to identify 37% of automated base calls that differ from the consensus sequence. With these algorithms, 12% of the base calls had confidence levels less than 90%. (C) 1994 Academic Press, Inc.
Pages: 417-424
Page count: 8
Related papers
16 in total
  • [11] A SEQUENCING REALITY CHECK
    ROBERTS, L
    [J]. SCIENCE, 1988, 242 (4883) : 1245 - 1245
  • [12] DNA SEQUENCING WITH CHAIN-TERMINATING INHIBITORS
    SANGER, F
    NICKLEN, S
    COULSON, AR
    [J]. PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 1977, 74 (12) : 5463 - 5467
  • [13] FLUORESCENCE DETECTION IN AUTOMATED DNA-SEQUENCE ANALYSIS
    SMITH, LM
    SANDERS, JZ
    KAISER, RJ
    HUGHES, P
    DODD, C
    CONNELL, CR
    HEINER, C
    KENT, SBH
    HOOD, LE
    [J]. NATURE, 1986, 321 (6071) : 674 - 679
  • [14] TIBBETTS C, 1992, COMMUNICATION
  • [15] WALKER M, 1992, 9201 STANF U TECHN R
  • [16] 1991, CYCLE SEQUENCING DNA