The Accuracy of DNA Sequences: Estimating Sequence Quality

Cited by: 46
Authors
CHURCHILL, GA [1]
WATERMAN, MS [1]
Affiliation
[1] UNIV SO CALIF, DEPT MATH & MOLEC BIOL, LOS ANGELES, CA 90089 USA
DOI
10.1016/S0888-7543(05)80288-5
Chinese Library Classification
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology]
Subject classification codes
071005; 0836; 090102; 100705
Abstract
In this paper we describe a method for the statistical reconstruction of a large DNA sequence from a set of sequenced fragments. We assume that the fragments have been assembled and address the problem of determining the degree to which the reconstructed sequence is free from errors, i.e., its accuracy. A consensus distribution is derived from the assembled fragment configuration based upon the rates of sequencing errors in the individual fragments. The consensus distribution can be used to find a minimally redundant consensus sequence that meets a prespecified confidence level, either base by base or across any region of the sequence. A likelihood-based procedure for the estimation of the sequencing error rates, which utilizes an iterative EM algorithm, is described. Prior knowledge of the error rates is easily incorporated into the estimation procedure. The methods are applied to a set of assembled sequence fragments from the human G6PD locus. We close the paper with a brief discussion of the relevance and practical implications of this work. © 1992 Academic Press, Inc. All rights reserved.
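The estimation scheme summarized in the abstract can be illustrated with a small Python sketch. This is not the authors' implementation. It works under several stated assumptions:

- The error model is symmetric. Fragment i reports the true base with probability 1 - e_i and each of the other three bases with probability e_i/3.
- The prior on the true base is uniform.
- The assembled alignment is taken as given.

The function names column_posterior, consensus_call and estimate_error_rates are hypothetical. So are several design choices: the Beta pseudo-count prior, the clamping of rates, and reporting ambiguous calls as IUPAC codes (in the spirit of the nomenclature cited as [9]). The consensus call takes the smallest set of bases whose posterior mass reaches the confidence level. That is one plausible per-base reading of a "minimally redundant" consensus. The EM loop alternates between column posteriors (E-step) and expected-mismatch rates per fragment (M-step).

```python
# Illustrative sketch only: a simplified symmetric error model, not the
# model or code from the paper.
BASES = "ACGT"

# IUPAC ambiguity codes keyed by the set of bases each code represents.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C",
    frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D", frozenset("ACT"): "H",
    frozenset("ACG"): "V", frozenset("ACGT"): "N",
}


def column_posterior(column, error_rates):
    """P(true base | observed bases) for one alignment column.

    column maps fragment index -> observed base. Assumed model: fragment i
    reads the true base with probability 1 - e_i and each other base with
    probability e_i / 3; the prior on the true base is uniform. Rates must
    lie strictly between 0 and 1.
    """
    weights = {}
    for b in BASES:
        w = 1.0
        for i, x in column.items():
            e = error_rates[i]
            w *= (1.0 - e) if x == b else e / 3.0
        weights[b] = w
    total = sum(weights.values())
    return {b: w / total for b, w in weights.items()}


def consensus_call(posterior, confidence=0.99):
    """Smallest set of bases whose posterior mass reaches `confidence`,
    returned as one IUPAC code (a hypothetical per-base reading of a
    'minimally redundant' call)."""
    chosen, mass = [], 0.0
    for b in sorted(BASES, key=lambda base: posterior[base], reverse=True):
        chosen.append(b)
        mass += posterior[b]
        if mass >= confidence:
            break
    return IUPAC[frozenset(chosen)]


def estimate_error_rates(columns, n_fragments, init=0.05, iters=50,
                         prior=(1.0, 1.0), floor=1e-6, ceiling=0.5):
    """EM estimate of per-fragment error rates.

    E-step: posterior over each column's true base under the current rates.
    M-step: e_i = expected mismatches of fragment i / positions it covers,
    with Beta(a, b) pseudo-counts standing in for prior knowledge
    (a = b = 1 is the plain maximum-likelihood update). Rates are clamped
    to [floor, ceiling] so the posteriors stay well defined.
    """
    a, b = prior
    rates = [init] * n_fragments
    for _ in range(iters):
        mismatch = [0.0] * n_fragments
        covered = [0] * n_fragments
        for column in columns:
            post = column_posterior(column, rates)
            for i, x in column.items():
                mismatch[i] += 1.0 - post[x]  # expected P(read i is wrong here)
                covered[i] += 1
        new_rates = []
        for i in range(n_fragments):
            if covered[i] == 0:
                new_rates.append(rates[i])  # no data: keep the current value
                continue
            r = (mismatch[i] + a - 1.0) / (covered[i] + a + b - 2.0)
            new_rates.append(min(max(r, floor), ceiling))
        rates = new_rates
    return rates


# Toy example: fragments 0 and 1 match `truth`; fragment 2 has 3 errors.
truth = "ACGTACGTAC"
frag2 = "ACTTAGGTCC"
columns = [{0: truth[j], 1: truth[j], 2: frag2[j]} for j in range(len(truth))]
rates = estimate_error_rates(columns, n_fragments=3)
consensus = "".join(consensus_call(column_posterior(c, rates)) for c in columns)
print([round(r, 3) for r in rates], consensus)
```

On this toy input, the estimate for fragment 2 settles near 3/10 = 0.3. The rates for fragments 0 and 1 fall to the lower clamp, and the per-base calls reproduce the true sequence. The sketch treats every column independently. It does not attempt the region-wide confidence statements the abstract also describes, which would require combining column posteriors.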
Pages: 89-98
Number of pages: 10
References (25 in total)
  • [1] [Anonymous]. INTRO PROBABILITY TH, 1968
  • [2] Arratia R. Bulletin of Mathematical Biology, 1989, 51: 125. DOI 10.1016/S0092-8240(89)80052-7
  • [3] Balding DJ, Torney DC. Statistical analysis of DNA fingerprint data for ordered clone physical mapping of human chromosomes. Bulletin of Mathematical Biology, 1991, 53(6): 853-879
  • [4] Branscomb E, Slezak T, Pae R, Galas D, Carrano AV, Waterman M. Optimizing restriction fragment fingerprinting methods for ordering large genomic libraries. Genomics, 1990, 8(2): 351-366
  • [5] Burks C. Methods in Enzymology, 1990, 183: 3
  • [6] Chen EY, Cheng A, Lee A, Kuang WJ, Hillier L, Green P, Schlessinger D, Ciccodicola A, D'Urso M. Sequence of human glucose-6-phosphate dehydrogenase cloned in plasmids and a yeast artificial chromosome. Genomics, 1991, 10(3): 792-800
  • [7] Churchill GA. Bulletin of Mathematical Biology, 1989, 51: 79
  • [8] Churchill GA. Unpublished (FRAGMENT ASSEM), 1990
  • [9] Cornish-Bowden A. Nomenclature for incompletely specified bases in nucleic acid sequences: recommendations 1984. Nucleic Acids Research, 1985, 13(9): 3021-3030
  • [10] Dempster AP, Laird NM, Rubin DB. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B (Methodological), 1977, 39(1): 1-38