Modeling sequencing errors by combining Hidden Markov models

Cited by: 104
Authors
Lottaz, C.
Iseli, C. [1]
Jongeneel, C. V. [1]
Bucher, P.
Affiliations
[1] Ludwig Inst Canc Res, Off Informat Technol, CH-1066 Epalinges, S Lausanne, Switzerland
Keywords
coding region prediction; sequencing errors; expressed sequence tags; hidden Markov models
DOI
10.1093/bioinformatics/btg1067
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification code
071010; 081704
Abstract
Expressed sequence tags (ESTs), available in large numbers in public and proprietary databases, are among the largest resources of biological sequence data. ESTs provide information on transcripts, but for technical reasons they often contain sequencing errors, so computational analyses of EST sequences must take such errors into account. Earlier attempts to model error-prone coding regions have performed well at detecting and predicting these regions while correcting sequencing errors using codon usage frequencies. In the research presented here, we improve the detection of translation start and stop sites by integrating a more complex mRNA model with error correction based on codon usage bias into a single hidden Markov model (HMM), thus generalizing this error correction approach to more complex HMMs. We show that our method maintains the performance in detecting coding sequences.
Pages: II103-II112
Page count: 10