Evaluating the impact of scoring parameters on the structure of intra-specific genetic variation using RawGeno, an R package for automating AFLP scoring

Cited by: 147
Authors
Arrigo, Nils [1 ]
Tuszynski, Jarek W. [2 ]
Ehrich, Dorothee [3 ]
Gerdes, Tommy [4 ]
Alvarez, Nadir [5 ]
Affiliations
[1] Univ Neuchatel, Inst Biol, Lab Evolutionary Bot, CH-2000 Neuchatel, Switzerland
[2] Sci Applicat Int Corp, Mclean, VA 22102 USA
[3] Univ Tromso, Dept Biol, N-9037 Tromso, Norway
[4] Rigshosp, Dept Clin Genet, Chromosome Lab, Copenhagen, Denmark
[5] Univ Neuchatel, Inst Biol, Lab Evolutionary Entomol, CH-2000 Neuchatel, Switzerland
Source
BMC BIOINFORMATICS | 2009 / Vol. 10
Funding
Swiss National Science Foundation;
Keywords
GENOTYPING ERRORS; DIVERSITY; MARKERS; POLYMORPHISMS; HOMOPLASY; SOFTWARE; SIZE;
DOI
10.1186/1471-2105-10-33
Chinese Library Classification number
Q5 [Biochemistry];
Subject classification code
071010 ; 081704 ;
Abstract
Background: Since the transfer and application of modern sequencing technologies to the analysis of amplified fragment-length polymorphisms (AFLP), evolutionary biologists have included an increasing number of samples and markers in their studies. Although justified in this context, the use of automated scoring procedures may result in technical biases that weaken the power and reliability of further analyses.

Results: Using a new scoring algorithm, RawGeno, we show that scoring errors, in particular "bin oversplitting" (i.e. when variant sizes of the same AFLP marker are not considered as homologous) and "technical homoplasy" (i.e. when two AFLP markers that differ slightly in size are mistakenly considered as being homologous), induce a loss of discriminatory power, decrease the robustness of results and, in extreme cases, introduce erroneous information in genetic structure analyses. In the present study, we evaluate several descriptive statistics that can be used to optimize the scoring of the AFLP analysis, and we describe a new statistic, the information content per bin (Ibin), that represents a valuable estimator during the optimization process. This statistic can be computed at any stage of the AFLP analysis without requiring the inclusion of replicated samples. Finally, we show that downstream analyses are not equally sensitive to scoring errors. Indeed, although a reasonable amount of flexibility is allowed during the optimization of the scoring procedure without causing considerable changes in the detection of genetic structure patterns, notable discrepancies are observed when estimating genetic diversities from differently scored datasets.

Conclusion: Our algorithm appears to perform as well as a commercial program in automating AFLP scoring, at least in the context of population genetics or phylogeographic studies. To our knowledge, RawGeno is the only freely available public-domain software for fully automated AFLP scoring, from electropherogram files to user-defined working binary matrices. RawGeno was implemented in an R CRAN package (with a user-friendly GUI) and can be found at http://sourceforge.net/projects/rawgeno.
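
The two scoring errors named in the abstract can be illustrated with a toy example. The sketch below is not RawGeno's algorithm: it assumes a simple gap-based binning rule (a new bin starts whenever two consecutive peak sizes differ by more than `max_gap`), and the sample names, peak sizes, and function names are all hypothetical. A tolerance that is too small splits size variants of one marker into several bins (bin oversplitting), while a tolerance that is too large merges distinct markers into one bin (technical homoplasy).

```python
from typing import Dict, List

def bin_peaks(samples: Dict[str, List[float]], max_gap: float) -> List[List[float]]:
    """Pool all peak sizes (in bp) and group them into bins.

    Assumed rule (illustrative only): a new bin starts when the gap
    to the previous peak size exceeds max_gap.
    """
    sizes = sorted(size for peaks in samples.values() for size in peaks)
    bins: List[List[float]] = []
    for size in sizes:
        if bins and size - bins[-1][-1] <= max_gap:
            bins[-1].append(size)
        else:
            bins.append([size])
    return bins

def binary_matrix(samples: Dict[str, List[float]],
                  bins: List[List[float]]) -> Dict[str, List[int]]:
    """Score presence (1) or absence (0) of a peak in each bin, per sample."""
    return {
        name: [int(any(b[0] <= p <= b[-1] for p in peaks)) for b in bins]
        for name, peaks in samples.items()
    }

# Hypothetical data: one marker near 100 bp shared by all individuals,
# one marker near 150 bp (ind1, ind3) and a distinct one at 151.6 bp (ind2).
samples = {
    "ind1": [100.0, 150.2],
    "ind2": [100.4, 151.6],
    "ind3": [100.2, 150.0],
}

for max_gap in (0.1, 0.5, 2.0):
    bins = bin_peaks(samples, max_gap)
    print(max_gap, len(bins), binary_matrix(samples, bins))
```

With these made-up sizes, the smallest tolerance yields six bins (every peak scored as its own marker), the intermediate one yields three, and the largest yields two, in which the 150 bp and 151.6 bp markers become indistinguishable.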
Pages: 14