Optimal reconstruction of a sequence from its probes

被引:25
作者
Frieze, AM [1 ]
Preparata, FP
Upfal, E
机构
[1] Carnegie Mellon Univ, Dept Math Sci, Pittsburgh, PA 15213 USA
[2] Brown Univ, Dept Comp Sci, Providence, RI 02912 USA
关键词
DNA sequencing; sequencing by hybridization; gapped probes; probabilistic analysis;
D O I
10.1089/106652799318328
中图分类号
Q5 [生物化学];
学科分类号
071010 ; 081704 ;
摘要
An important combinatorial problem, motivated by DNA sequencing in molecular biology, is the reconstruction of a sequence over a small finite alphabet from the collection of its probes (the sequence spectrum), obtained by sliding a fixed sampling pattern over the sequence. Such construction is required for Sequencing-by-Hybridization (SBH), a novel DNA sequencing technique based on an array (SBH chip) of short nucleotide sequences (probes), Once the sequence spectrum is biochemically obtained, a combinatorial method is used to reconstruct the DNA sequence from its spectrum, Since technology limits the number of probes on the SBH chip, a challenging combinatorial question is the design of a smallest set of probes that can sequence an arbitrary DNA string of a given length, We present in this work a novel probe design, crucially based on the use of universal bases [bases that bind to any nucleotide (Loakes and Brown, 1994)] that drastically improves the performance of the SBH process and asymptotically approaches the information-theoretic bound up to a constant factor. Furthermore, the sequencing algorithm we propose is substantially simpler than the Eulerian path method used in previous solutions of this problem.
引用
收藏
页码:361 / 368
页数:8
相关论文
共 11 条
  • [1] [Anonymous], [No title captured], DOI DOI 10.1007/3-540-58338-6_
  • [2] Poisson process approximation for sequence repeats, and sequencing by hybridization
    Arratia, R
    Martin, D
    Reinert, G
    Waterman, MS
    [J]. JOURNAL OF COMPUTATIONAL BIOLOGY, 1996, 3 (03) : 425 - 463
  • [3] 2 MOMENTS SUFFICE FOR POISSON APPROXIMATIONS - THE CHEN-STEIN METHOD
    ARRATIA, R
    GOLDSTEIN, L
    GORDON, L
    [J]. ANNALS OF PROBABILITY, 1989, 17 (01) : 9 - 25
  • [4] A NOVEL METHOD FOR NUCLEIC-ACID SEQUENCE DETERMINATION
    BAINS, W
    SMITH, GC
    [J]. JOURNAL OF THEORETICAL BIOLOGY, 1988, 135 (03) : 303 - 307
  • [5] DRMANAC R, 1989, GENOMICS, V4, P114
  • [6] Dyer M, 1994, J Comput Biol, V1, P105, DOI 10.1089/cmb.1994.1.105
  • [7] 5-NITROINDOLE AS AN UNIVERSAL BASE ANALOG
    LOAKES, D
    BROWN, DM
    [J]. NUCLEIC ACIDS RESEARCH, 1994, 22 (20) : 4039 - 4043
  • [8] LYSOV YP, 1988, DOKL AKAD NAUK SSSR, V303, P1508
  • [9] 1-TUPLE DNA SEQUENCING - COMPUTER-ANALYSIS
    PEVZNER, PA
    [J]. JOURNAL OF BIOMOLECULAR STRUCTURE & DYNAMICS, 1989, 7 (01) : 63 - 73
  • [10] IMPROVED CHIPS FOR SEQUENCING BY HYBRIDIZATION
    PEVZNER, PA
    LYSOV, YP
    KHRAPKO, KR
    BELYAVSKY, AV
    FLORENTIEV, VL
    MIRZABEKOV, AD
    [J]. JOURNAL OF BIOMOLECULAR STRUCTURE & DYNAMICS, 1991, 9 (02) : 399 - 410