ENGINES: exploring single nucleotide variation in entire human genomes

Cited by: 27
Authors
Amigo, Jorge [1, 2]
Salas, Antonio [2]
Phillips, Christopher [2]
Affiliations
[1] Univ Santiago de Compostela, Grp Med Xenom, CIBERER, Santiago De Compostela, Galicia, Spain
[2] Univ Santiago de Compostela, Unidade Xenet Forense, Inst Med Legal, Fac Med, Santiago De Compostela, Galicia, Spain
Source
BMC BIOINFORMATICS | 2011 / Volume 12
Keywords
MAP
DOI
10.1186/1471-2105-12-105
Chinese Library Classification number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Background: Next generation ultra-sequencing technologies are starting to produce extensive quantities of data from entire human genome or exome sequences, and new software is therefore needed to present and analyse this vast amount of information. The 1000 Genomes project has recently released raw data for 629 complete genomes representing several human populations through its Phase I interim analysis. Although certain public tools allow exploration of these genomes, to date no tool permits comprehensive population analysis of the variation catalogued by such data.
Description: We have developed a genetic variant site explorer able to retrieve data for Single Nucleotide Variations (SNVs), population by population, from entire genomes without compromising future scalability and agility. ENGINES (ENtire Genome INterface for Exploring SNVs) uses data from the 1000 Genomes Phase I to demonstrate its capacity to handle large amounts of genetic variation (>7.3 billion genotypes and 28 million SNVs) and to derive summary statistics of interest for medical and population genetics applications. The whole dataset is pre-processed and summarized into a data mart accessible through a web interface. The query system allows each available population sample to be combined and compared, with searches by rs-number list, chromosome region, or genes of interest. Frequency and FST filters are available to further refine queries, and results can be visually compared with other large-scale Single Nucleotide Polymorphism (SNP) repositories such as HapMap or Perlegen.
Conclusions: ENGINES is capable of accessing large-scale variation data repositories in a fast and comprehensive manner. It allows quick browsing of whole genome variation, while providing statistical information for each variant site such as allele frequency, heterozygosity or FST values for genetic differentiation.
The data mart generating scripts and the web interface are accessible at http://spsmart.cesga.es/engines.php
Pages: 6
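The per-site statistics named in the abstract (allele frequency, heterozygosity and FST) can be illustrated with a minimal sketch. The abstract does not state which estimators ENGINES uses, so the formulas below are textbook assumptions: expected heterozygosity 2p(1-p) for a biallelic site and Wright's FST = (HT - HS) / HT with sample-size-weighted means. The function names are hypothetical and are not taken from the ENGINES scripts.

```python
from typing import Sequence


def allele_frequency(n_ref: int, n_alt: int) -> float:
    """Alternate-allele frequency from allele counts in one population."""
    total = n_ref + n_alt
    if total == 0:
        raise ValueError("no alleles observed at this site")
    return n_alt / total


def expected_heterozygosity(p: float) -> float:
    """Expected heterozygosity 2p(1-p) for a biallelic site (assumed formula)."""
    return 2.0 * p * (1.0 - p)


def fst(freqs: Sequence[float], sizes: Sequence[int]) -> float:
    """Wright's FST = (HT - HS) / HT, weighting populations by sample size.

    This is an illustrative estimator, not necessarily the one ENGINES applies.
    """
    if len(freqs) != len(sizes) or not freqs:
        raise ValueError("freqs and sizes must be non-empty and equal in length")
    n = sum(sizes)
    # Pooled allele frequency across all populations.
    p_bar = sum(p * s for p, s in zip(freqs, sizes)) / n
    h_t = expected_heterozygosity(p_bar)
    if h_t == 0.0:
        # Site is monomorphic overall, so there is no differentiation to measure.
        return 0.0
    # Mean within-population expected heterozygosity.
    h_s = sum(expected_heterozygosity(p) * s for p, s in zip(freqs, sizes)) / n
    return (h_t - h_s) / h_t
```

For example, `fst([0.2, 0.8], [10, 10])` compares two equally sized samples with alternate-allele frequencies of 0.2 and 0.8; identical frequencies give 0 and fixed differences give 1 under this definition.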