PanCGH: a genotype-calling algorithm for pangenome CGH data

Cited by: 16
Authors
Bayjanov, Jumamurat R. [1]
Wels, Michiel [1, 2, 3]
Starrenburg, Marjo [2, 4]
Vlieg, Johan E. T. van Hylckama [2, 3, 4]
Siezen, Roland J. [1, 2, 3, 4]
Molenaar, Douwe [2, 3, 4]
Affiliations
[1] Radboud Univ Nijmegen, Med Ctr, Nijmegen Ctr Mol Life, Ctr Mol & Biomol Informat, NL-6500 HB Nijmegen, Netherlands
[2] NIZO Food Res, NL-6710 BA Ede, Netherlands
[3] TI Food & Nutr, NL-6700 AN Wageningen, Netherlands
[4] Kluyver Ctr Genom Ind Fermentat, Delft, Netherlands
Keywords
COMPARATIVE GENOMIC HYBRIDIZATION; LACTOBACILLUS-PLANTARUM; SPATIAL NORMALIZATION; DIVERSITY; IDENTIFICATION; MICROARRAYS; SEQUENCE; ADAPTATION; ORTHOLOGS; PARALOGS;
DOI
10.1093/bioinformatics/btn632
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Motivation: Pangenome arrays contain DNA oligomers targeting several sequenced reference genomes from the same species. In microbiology, these can be employed to investigate the often high genetic variability within a species by comparative genome hybridization (CGH). The biological interpretation of pangenome CGH data depends on the ability to compare strains at a functional level, particularly by comparing the presence or absence of orthologous genes. Because of this high genetic variability, available genotype-calling algorithms cannot be applied to pangenome CGH data. Results: We have developed the algorithm PanCGH, which incorporates orthology information about genes to predict the presence or absence of orthologous genes in a query organism using CGH arrays that target the genomes of sequenced representatives of a group of microorganisms. PanCGH was tested and applied in the analysis of genetic diversity among 39 Lactococcus lactis strains from three different subspecies (lactis, cremoris, hordniae), isolated from two different niches (dairy and plant). Clustering of these strains using the presence/absence data of gene orthologs revealed a clear separation between different subspecies and reflected the niche of the strains.
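The idea of comparing strains by ortholog presence or absence can be illustrated with a minimal Python sketch. It is not the PanCGH procedure, which the abstract does not specify. The rule used here (an ortholog group counts as present when at least one member gene is called present), the Jaccard distance, and all gene, group, and strain names are assumptions made for illustration only.

```python
# Illustrative sketch only; not the published PanCGH algorithm.
# Assumed rule: an ortholog group is present in a strain if at least
# one of its member genes (from any reference genome) is called present.

def ortholog_presence(gene_calls, ortholog_groups):
    """Collapse per-gene presence calls into per-ortholog-group calls."""
    return {
        group: any(gene_calls.get(gene, False) for gene in genes)
        for group, genes in ortholog_groups.items()
    }

def jaccard_distance(profile_a, profile_b):
    """Distance between two strains based on shared present groups."""
    present_a = {g for g, present in profile_a.items() if present}
    present_b = {g for g, present in profile_b.items() if present}
    union = present_a | present_b
    if not union:
        return 0.0  # neither strain has any group present
    return 1.0 - len(present_a & present_b) / len(union)

# Hypothetical ortholog groups spanning two reference genomes.
ortholog_groups = {
    "OG1": ["geneA_ref1", "geneA_ref2"],
    "OG2": ["geneB_ref1"],
    "OG3": ["geneC_ref2"],
}

# Hypothetical per-gene presence calls for two query strains.
gene_calls = {
    "strain_x": {"geneA_ref1": False, "geneA_ref2": True,
                 "geneB_ref1": True, "geneC_ref2": False},
    "strain_y": {"geneA_ref1": True, "geneA_ref2": False,
                 "geneB_ref1": False, "geneC_ref2": True},
}

profiles = {
    strain: ortholog_presence(calls, ortholog_groups)
    for strain, calls in gene_calls.items()
}
distance = jaccard_distance(profiles["strain_x"], profiles["strain_y"])
```

In this toy example both strains carry OG1 through different reference variants of the gene, so a gene-by-gene comparison would treat them as different while the ortholog-level comparison treats them as sharing that function. A pairwise distance matrix built this way could then be passed to any standard hierarchical clustering routine.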
Pages: 309-314
Page count: 6