RCPdb: An evolutionary classification and codon usage database for repeat-containing proteins

Cited by: 31
Authors
Faux, Noel G.
Huttley, Gavin A.
Mahmood, Khalid
Webb, Geoffrey I.
de la Banda, Maria Garcia
Whisstock, James C.
Affiliations
[1] Monash Univ, Dept Biochem & Mol Biol, Prot Crystallog Unit, Melbourne, Vic 3800, Australia
[2] Monash Univ, Victorian Bioinformat Consortium, Melbourne, Vic 3800, Australia
[3] Monash Univ, ARC Ctr Struct & Funct Microbial Genom, Melbourne, Vic 3800, Australia
[4] Australian Natl Univ, John Curtin Sch Med Res, Canberra, ACT 0200, Australia
[5] Monash Univ, Sch Comp Sci & Software Engn, Melbourne, Vic 3800, Australia
Keywords
DOI
10.1101/gr.6255407
Chinese Library Classification
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification codes
071010; 081704
Abstract
Over 3% of human proteins contain single amino acid repeats (repeat-containing proteins, RCPs). Many repeats (homopeptides) localize to important proteins involved in transcription, and the expansion of certain repeats, in particular poly-Q and poly-A tracts, can also lead to the development of neurological diseases. Previous studies have suggested that the homopeptide makeup is a result of the presence of G+C-rich tracts in the encoding genes and that expansion occurs via replication slippage. Here, we have performed a large-scale genomic analysis of the variation of the genes encoding RCPs in 13 species and present these data in an online database (http://repeats.med.monash.edu.au/genetic_analysis/). This resource allows rapid comparison and analysis of RCPs, homopeptides, and their underlying genetic tracts across the eukaryotic species considered. We report three major findings. First, there is a bias for a small subset of codons being reiterated within homopeptides, and there is no G+C or A+T bias relative to the organism's transcriptome. Second, single base pair transversions from the homocodon are unusually common and may represent a mechanism of reducing the rate of homopeptide mutations. Third, homopeptides that are conserved across different species lie within regions that are under stronger purifying selection in contrast to nonconserved homopeptides.
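The terms used in the abstract (homopeptide, homocodon, transversion) can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names, the toy sequence, and the minimum run length of 5 residues are all assumptions made for illustration, and the record does not state the threshold used in the study.

```python
import re
from collections import Counter

PURINES = {"A", "G"}  # the remaining DNA bases, C and T, are pyrimidines


def find_homopeptides(protein, min_len=5):
    """Return (residue, start, length) for each run of a single amino acid.

    min_len is a hypothetical threshold chosen for this sketch.
    """
    pattern = re.compile(r"([A-Z])\1{%d,}" % (min_len - 1))
    return [(m.group(1), m.start(), len(m.group(0)))
            for m in pattern.finditer(protein)]


def codon_counts(cds, start, length):
    """Count the codons that encode a protein run (start, length in residues)."""
    codons = [cds[3 * i:3 * i + 3] for i in range(start, start + length)]
    return Counter(codons)


def substitution_type(a, b):
    """Classify a single-base change as a transition or a transversion."""
    if a == b:
        return "none"
    # Purine<->purine or pyrimidine<->pyrimidine is a transition;
    # a change between the two classes is a transversion.
    return "transition" if (a in PURINES) == (b in PURINES) else "transversion"


# Toy example: a poly-Q run encoded mostly by CAG with one CAA codon.
cds = "".join(["ATG", "GCT", "CAG", "CAG", "CAA", "CAG", "CAG", "CAG", "CAT", "AAA"])
protein = "MAQQQQQQHK"

runs = find_homopeptides(protein)
usage = codon_counts(cds, runs[0][1], runs[0][2])
homocodon = usage.most_common(1)[0][0]  # the most frequently reiterated codon
```

In this toy run, CAG is the dominant codon of the poly-Q tract. A change of its third base from G to A (giving CAA) is a transition, whereas a change from G to C (giving CAC) is a transversion.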
Pages: 1118-1127
Number of pages: 10