Computational identification of protein coding potential of conserved sequence tags through cross-species evolutionary analysis

Cited by: 26
Authors
Mignone, F
Grillo, G
Liuni, S
Pesole, G
Affiliations
[1] Univ Milan, Dipartimento Fisiol & Biochim Gen, I-20133 Milan, Italy
[2] CNR, Sez Bioinformat & Genom, ITB, I-70125 Bari, Italy
DOI
10.1093/nar/gkg483
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology];
Discipline classification code
071010 ; 081704 ;
Abstract
The identification of conserved sequence tags (CSTs) through comparative genome analysis may reveal important regulatory elements involved in shaping the spatio-temporal expression of genetic information. It is well known that the most significant fraction of CSTs observed in human-mouse comparisons corresponds to protein coding exons, due to their strong evolutionary constraints. As we still do not know the complete gene inventory of the human and mouse genomes, it is of the utmost importance to establish whether detected conserved sequences are genes or not. We propose here a simple algorithm that, based on the observation of the specific evolutionary dynamics of coding sequences, efficiently discriminates between coding and non-coding CSTs. The application of this method may help the validation of predicted genes, the prediction of alternative splicing patterns in known and unknown genes and the definition of a dictionary of non-coding regulatory elements.
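The abstract does not spell out the algorithm, so the following is only an illustrative sketch of the general idea it names, not the authors' published method. It assumes two already aligned, gap-free sequences of equal length and uses the standard genetic code. For each of the three forward reading frames it counts codon differences that preserve the amino acid (synonymous) against those that change it (nonsynonymous). A frame in which synonymous changes dominate is the kind of signal expected from a coding region. The function names `frame_counts` and `best_frame`, the pseudocount of 1, and the ratio score are all hypothetical choices made for this example.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def frame_counts(seq_a, seq_b, frame):
    """Count (synonymous, nonsynonymous) codon differences in one frame.

    Assumes seq_a and seq_b are aligned, gap-free, upper-case, equal length.
    Codons containing characters outside A/C/G/T are skipped.
    """
    syn = nonsyn = 0
    for i in range(frame, len(seq_a) - 2, 3):
        codon_a, codon_b = seq_a[i:i + 3], seq_b[i:i + 3]
        if codon_a == codon_b:
            continue  # identical codons carry no substitution signal
        if codon_a not in CODON_TABLE or codon_b not in CODON_TABLE:
            continue  # ambiguous bases: ignore this codon pair
        if CODON_TABLE[codon_a] == CODON_TABLE[codon_b]:
            syn += 1
        else:
            nonsyn += 1
    return syn, nonsyn


def best_frame(seq_a, seq_b):
    """Return (frame, score) for the frame with the highest syn/nonsyn ratio.

    The score (syn + 1) / (nonsyn + 1) is a hypothetical illustration,
    not the scoring function of the paper.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to equal length")
    seq_a, seq_b = seq_a.upper(), seq_b.upper()
    scores = {}
    for frame in range(3):
        syn, nonsyn = frame_counts(seq_a, seq_b, frame)
        scores[frame] = (syn + 1) / (nonsyn + 1)
    frame = max(scores, key=scores.get)
    return frame, scores[frame]


if __name__ == "__main__":
    # Toy pair: every difference in frame 0 is a silent third-position change.
    print(best_frame("GCTGGAAAACTG", "GCCGGGAAGCTA"))
```

A real classifier would also need to handle the reverse strand, alignment gaps, and a calibrated threshold separating coding from non-coding scores, none of which this sketch attempts.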
Pages: 4639-4645
Number of pages: 7
Related papers
20 in total
  • [1] ALTSCHUL SF, 1990, J MOL BIOL, V215, P403, DOI 10.1006/jmbi.1990.9999
  • [2] CRITICA: Coding region identification tool invoking comparative analysis
    Badger, JH
    Olsen, GJ
    [J]. MOLECULAR BIOLOGY AND EVOLUTION, 1999, 16 (04) : 512 - 524
  • [3] GeneMarkS: a self-training method for prediction of gene starts in microbial genomes. Implications for finding sequence motifs in regulatory regions
    Besemer, J
    Lomsadze, A
    Borodovsky, M
    [J]. NUCLEIC ACIDS RESEARCH, 2001, 29 (12) : 2607 - 2618
  • [4] Prediction of complete gene structures in human genomic DNA
    Burge, C
    Karlin, S
    [J]. JOURNAL OF MOLECULAR BIOLOGY, 1997, 268 (01) : 78 - 94
  • [5] Reevaluating human gene annotation: A second-generation analysis of chromosome 22
    Collins, JE
    Goward, ME
    Cole, CG
    Smink, LJ
    Huckle, EJ
    Knowles, S
    Bye, JM
    Beare, DM
    Dunham, I
    [J]. GENOME RESEARCH, 2003, 13 (01) : 27 - 36
  • [6] Improved microbial gene identification with GLIMMER
    Delcher, AL
    Harmon, D
    Kasif, S
    White, O
    Salzberg, SL
    [J]. NUCLEIC ACIDS RESEARCH, 1999, 27 (23) : 4636 - 4641
  • [7] Numerous potentially functional but non-genic conserved sequences on human chromosome 21
    Dermitzakis, ET
    Reymond, A
    Lyle, R
    Scamuffa, N
    Ucla, C
    Deutsch, S
    Stevenson, BJ
    Flegel, V
    Bucher, P
    Jongeneel, CV
    Antonarakis, SE
    [J]. NATURE, 2002, 420 (6915) : 578 - 582
  • [8] Active conservation of noncoding sequences revealed by three-way species comparisons
    Dubchak, I
    Brudno, M
    Loots, GG
    Pachter, L
    Mayor, C
    Rubin, EM
    Frazer, KA
    [J]. GENOME RESEARCH, 2000, 10 (09) : 1304 - 1306
  • [9] The Ensembl genome database project
    Hubbard, T
    Barker, D
    Birney, E
    Cameron, G
    Chen, Y
    Clark, L
    Cox, T
    Cuff, J
    Curwen, V
    Down, T
    Durbin, R
    Eyras, E
    Gilbert, J
    Hammond, M
    Huminiecki, L
    Kasprzyk, A
    Lehvaslaiho, H
    Lijnzaad, P
    Melsopp, C
    Mongin, E
    Pettett, R
    Pocock, M
    Potter, S
    Rust, A
    Schmidt, E
    Searle, S
    Slater, G
    Smith, J
    Spooner, W
    Stabenau, A
    Stalker, J
    Stupka, E
    Ureta-Vidal, A
    Vastrik, I
    Clamp, M
    [J]. NUCLEIC ACIDS RESEARCH, 2002, 30 (01) : 38 - 41
  • [10] Comparative analysis of noncoding regions of 77 orthologous mouse and human gene pairs
    Jareborg, N
    Birney, E
    Durbin, R
    [J]. GENOME RESEARCH, 1999, 9 (09) : 815 - 824