Predicting coding potential from genome sequence: Application to betaherpesviruses infecting rats and mice

Cited by: 49
Authors
Brocchieri, L [1]
Kledal, TN
Karlin, S
Mocarski, ES
Affiliations
[1] Stanford Univ, Dept Math, Stanford, CA 94305 USA
[2] Stanford Univ, Dept Microbiol & Immunol, Stanford, CA 94305 USA
DOI
10.1128/JVI.79.12.7570-7596.2005
Chinese Library Classification number
Q93 [Microbiology];
Discipline classification code
071005 ; 100705 ;
Abstract
Prediction of protein-coding regions and other features of primary DNA sequence has greatly contributed to experimental biology. Significant challenges remain in genome annotation methods, including the identification of small or overlapping genes and the assessment of mRNA splicing or unconventional translation signals in expression. We have employed a combined analysis of compositional biases and conservation together with frame-specific G + C representation to reevaluate and annotate the genome sequences of mouse and rat cytomegaloviruses. Our analysis predicts that there are at least 34 protein-coding regions in these genomes that were not apparent in earlier annotation efforts. These include 17 single-exon genes, three new exons of previously identified genes, a newly identified four-exon gene for a lectin-like protein (in rat cytomegalovirus), and 10 probable frameshift extensions of previously annotated genes. This expanded set of candidate genes provides an additional basis for investigation in cytomegalovirus biology and pathogenesis.
Pages: 7570-7596
Page count: 27