Sequence-based estimation of minisatellite and microsatellite repeat variability

Times cited: 151
Authors
Legendre, Matthieu
Pochet, Nathalie
Pak, Theodore
Verstrepen, Kevin J. [1]
Affiliations
[1] Harvard Univ, FAS Ctr Syst Biol, Cambridge, MA 02138 USA
[2] MIT, Broad Inst Harvard, Cambridge, MA 02139 USA
[3] Katholieke Univ Leuven, Ctr Microbial & Plant Genet, Dept Mol & Microbial Syst, Fac Appl Biosci & Engn, B-3001 Heverlee, Belgium
DOI
10.1101/gr.6554007
Chinese Library Classification (CLC)
Q5 [Biochemistry]; Q7 [Molecular Biology]
Subject classification codes
071010; 081704
Abstract
Variable tandem repeats are frequently used for genetic mapping, genotyping, and forensics studies. Moreover, variation in some repeats underlies rapidly evolving traits or certain diseases. However, mutation rates vary greatly from repeat to repeat, and as a consequence, not all tandem repeats are suitable genetic markers or interesting unstable genetic modules. We developed a model, "SERV," that predicts the variability of a broad range of tandem repeats in a wide range of organisms. The nonlinear model uses three basic characteristics of the repeat (number of repeated units, unit length, and purity) to produce a numeric "VARscore" that correlates with repeat variability. SERV was experimentally validated using a large set of different artificial repeats located in the Saccharomyces cerevisiae URA3 gene. Further in silico analysis shows that SERV outperforms existing models and accurately predicts repeat variability in bacteria and eukaryotes, including plants and humans. Using SERV, we demonstrate significant enrichment of variable repeats within human genes involved in transcriptional regulation, chromatin remodeling, morphogenesis, and neurogenesis. Moreover, SERV allows identification of known and candidate genes involved in repeat-based diseases. In addition, we demonstrate the use of SERV for the selection and comparison of suitable variable repeats for genotyping and forensic purposes. Our analysis indicates that tandem repeats used for genotyping should have a VARscore between 1 and 3. SERV is publicly available from http://hulsweb1.cgr.harvard.edu/SERV/.
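The abstract names the three inputs to SERV (number of repeated units, unit length, and purity) but does not give the model itself. The Python below is a minimal sketch that assumes a repeat region and its unit length are already known, for example from a repeat-detection tool. It derives those three characteristics using a simplified definition of purity: the fraction of bases that match a majority-rule consensus unit. It then applies the suggested VARscore window of 1 to 3 to a score computed elsewhere; treating both ends of that window as inclusive is an assumption. The names `RepeatFeatures`, `repeat_features` and `suitable_for_genotyping` are hypothetical. This is not SERV's implementation, and it does not compute a VARscore.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class RepeatFeatures:
    """Hypothetical container for the three repeat characteristics named in the abstract."""
    unit_length: int    # length of one repeated unit, in bp
    copy_number: float  # number of repeated units (may be fractional)
    purity: float       # fraction of bases matching the consensus unit (0..1)
    consensus: str      # consensus unit inferred from the region


def repeat_features(region: str, unit_length: int) -> RepeatFeatures:
    """Estimate unit length, copy number and purity for one tandem-repeat region.

    Simplified stand-in (an assumption, not SERV's definition): the consensus
    is the majority base at each offset modulo unit_length, and purity is the
    fraction of bases in the region that agree with that consensus.
    """
    region = region.upper()
    if unit_length <= 0 or len(region) < unit_length:
        raise ValueError("region must contain at least one full unit")
    # Majority base at each offset within the unit.
    consensus = "".join(
        Counter(region[offset::unit_length]).most_common(1)[0][0]
        for offset in range(unit_length)
    )
    # Count bases that agree with the consensus at their offset.
    matches = sum(
        base == consensus[i % unit_length] for i, base in enumerate(region)
    )
    return RepeatFeatures(
        unit_length=unit_length,
        copy_number=len(region) / unit_length,
        purity=matches / len(region),
        consensus=consensus,
    )


def suitable_for_genotyping(varscore: float, low: float = 1.0, high: float = 3.0) -> bool:
    """Check a VARscore (computed elsewhere, e.g. by SERV) against the 1-3 window."""
    return low <= varscore <= high


if __name__ == "__main__":
    # A CAG repeat with one interruption (CAA) in the fourth unit.
    print(repeat_features("CAGCAGCAGCAACAG", 3))
```

For the example region, the sketch infers the consensus `CAG`, a copy number of 5.0, and a purity of 14/15, because one of the 15 bases breaks the pattern. A real pipeline would more likely take purity from an alignment-based repeat finder than from this majority-rule shortcut.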
Pages: 1787-1796
Page count: 10