Genotype harmonizer: Automatic strand alignment and format conversion for genotype data integration

Cited by: 92
Authors
Deelen P. [1,2]
Bonder M.J. [2]
Van Der Velde K.J. [1,2]
Westra H.-J. [2]
Winder E. [1,2]
Hendriksen D. [1,2]
Franke L. [2]
Swertz M.A. [1,2]
Affiliations
[1] University of Groningen, University Medical Center Groningen, Genomics Coordination Center, Groningen
[2] University of Groningen, University Medical Center Groningen, Department of Genetics, Groningen
Keywords
GWAS; Imputation; Linkage disequilibrium; Meta-analysis
DOI
10.1186/1756-0500-7-901
Abstract
Background: To gain statistical power or to allow fine mapping, researchers typically want to pool data before meta-analysis or genotype imputation. However, the necessary harmonization of genetic datasets is currently error-prone because of the many different file formats in use and a lack of clarity about which genomic strand serves as the reference.

Findings: Genotype Harmonizer (GH) is a command-line tool that harmonizes genetic datasets by automatically resolving genomic strand and file format issues. GH solves the unknown-strand problem by aligning ambiguous A/T and G/C SNPs to a specified reference, using linkage disequilibrium patterns and requiring no prior knowledge of the strands used. GH supports many common GWAS/NGS genotype formats, including PLINK, binary PLINK, VCF, SHAPEIT2 and Oxford GEN. GH is implemented in Java, and a large part of its functionality is also available through the Java 'Genotype-IO' API. All software is open source under the LGPLv3 license and available from www.molgenis.org/systemsgenetics.

Conclusions: GH can be used to harmonize genetic datasets across different file formats and can be easily integrated as a step in routine meta-analysis and imputation pipelines. © 2014 Deelen et al.; licensee BioMed Central.
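The abstract states only that GH aligns ambiguous A/T and G/C SNPs by comparing linkage disequilibrium patterns with a reference. The Python sketch below illustrates that general idea and is not GH's actual implementation. It assumes genotypes are encoded as allele dosages (0, 1 or 2) and that the neighbouring SNPs are unambiguous and already aligned to the reference. The function names, the voting rule, and the `min_abs_r` threshold are all hypothetical choices made for illustration.

```python
from math import sqrt


def is_ambiguous(allele1, allele2):
    """True for A/T and C/G SNPs, whose alleles look identical on both strands."""
    return {allele1, allele2} in ({"A", "T"}, {"C", "G"})


def pearson(x, y):
    """Pearson correlation of two equal-length dosage vectors (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def needs_strand_swap(study_snp, study_neighbours, ref_snp, ref_neighbours,
                      min_abs_r=0.3):
    """Vote on whether the study's ambiguous SNP is on the opposite strand.

    Each neighbour votes "agree" when the sign of its correlation with the
    ambiguous SNP matches between study and reference, and "disagree" otherwise.
    Neighbours in weak LD (below the hypothetical min_abs_r) do not vote.
    Returns True (swap), False (keep), or None when the vote is tied.
    """
    agree = disagree = 0
    for study_nb, ref_nb in zip(study_neighbours, ref_neighbours):
        r_study = pearson(study_snp, study_nb)
        r_ref = pearson(ref_snp, ref_nb)
        if abs(r_study) < min_abs_r or abs(r_ref) < min_abs_r:
            continue  # too little LD to be informative
        if (r_study > 0) == (r_ref > 0):
            agree += 1
        else:
            disagree += 1
    if agree == disagree:
        return None
    return disagree > agree


# Toy data: two neighbours, one in positive and one in negative LD with the SNP.
ref_snp = [0, 1, 2, 1, 0, 2]
ref_neighbours = [[0, 1, 2, 1, 0, 2], [2, 1, 0, 1, 2, 0]]
study_neighbours = [[2, 0, 1, 1, 0, 2], [0, 2, 1, 1, 2, 0]]
study_snp_flipped = [0, 2, 1, 1, 2, 0]  # dosages counted for the wrong allele

print(needs_strand_swap(study_snp_flipped, study_neighbours,
                        ref_snp, ref_neighbours))
```

The sign comparison works because counting the wrong allele of an ambiguous SNP turns each dosage d into 2 - d, which inverts the sign of every correlation with its neighbours while leaving the magnitude unchanged.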