Comparative analysis of 1196 orthologous mouse and human full-length mRNA and protein sequences

Cited by: 198
Authors
Makalowski, W
Zhang, JH
Boguski, MS
Affiliation
[1] National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda
Source
GENOME RESEARCH | 1996 / Vol. 6 / Issue 09
DOI
10.1101/gr.6.9.846
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology];
Discipline classification codes
071010 ; 081704 ;
Abstract
A large set of mRNA and encoded protein sequences from orthologous murine and human genes was compiled to analyze statistical, biological, and evolutionary properties of coding and noncoding transcribed sequences. Protein sequence conservation varied between 36% and 100% identity, with an average value of 85%. The average degree of nucleotide sequence identity for the corresponding coding sequences was also approximately 85%, whereas 5' and 3' untranslated regions (UTRs) were less conserved, with aligned identities of 67% and 69%, respectively. For some mouse and human genes, nucleotide sequences are more highly conserved than the encoded protein sequences. A subset of 32 sequences, consisting of only mouse/human protein pairs for which the human sequence represents a positionally cloned disease gene, had properties very similar to the larger data set, suggesting that our data are representative of the genome as a whole. With respect to sequence conservation, two interesting outliers are the breast cancer (BRCA1) gene product and the testis-determining factor (SRY), both of which display among the lowest degrees of sequence identity. The occurrence of both introns and repetitive elements (e.g., Alu, B1) in 5' and 3' UTRs was also studied. These results provide one benchmark for the "comparative genomics" of mice and humans, with practical implications for the cross-referencing of transcript maps. They should also prove useful in estimating the additional sampling diversity provided by mouse EST sequencing projects designed to complement the existing human cDNA collection.
Pages: 846-857
Page count: 12