Comparative Methods for Association Studies: A Case Study on Metabolite Variation in a Brassica rapa Core Collection

Cited by: 31
Authors
Del Carpio, Dunia Pino [1]
Basnet, Ram Kumar [1,3]
De Vos, Ric C. H. [2,3]
Maliepaard, Chris [1]
Paulo, Maria Joao [2]
Bonnema, Guusje [1,3]
Affiliations
[1] Wageningen Univ, Lab Plant Breeding, Wageningen, Netherlands
[2] Univ Wageningen & Res Ctr, Wageningen, Netherlands
[3] Ctr BioSyst Genom, Wageningen, Netherlands
Keywords
MULTILOCUS GENOTYPE DATA; POPULATION-STRUCTURE; RANDOM FORESTS; LINKAGE DISEQUILIBRIUM; INFERENCE; RESISTANCE; TRAITS; CLASSIFICATION; COMPONENTS; LOCI
DOI
10.1371/journal.pone.0019624
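Chinese Library Classification
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification codes
070301 [Inorganic chemistry]; 070403 [Astrophysics]; 070507 [Natural resources and territorial spatial planning]; 090105 [Crop production systems and ecological engineering]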
Abstract
Background: Association mapping is a statistical approach that combines phenotypic traits and genetic diversity in natural populations, with the goal of correlating the variation present at the phenotypic and allelic levels. It is essential to separate the true effect of genetic variation from confounding factors, such as adaptation to different uses and geographical locations. The rapid availability of large datasets makes it necessary to explore statistical methods that are computationally less intensive and more flexible for data exploration.

Methodology/Principal Findings: A core collection of 168 Brassica rapa accessions of different morphotypes and origins was explored to find genetic associations between markers and metabolites: tocopherols, carotenoids, chlorophylls and folate. A widely used linear model, modified to account for population structure and kinship, was used for association mapping. In addition, a machine learning algorithm called Random Forest (RF) was used for comparison. Comparing the results across methods led to the selection of a set of significant markers as promising candidates for further work. This set of markers associated with the metabolites can potentially be applied to the selection of genotypes with elevated levels of these metabolites.

Conclusions/Significance: Incorporating the kinship correction into the association model did not reduce the number of significantly associated markers. However, incorporating the STRUCTURE correction (Q matrix) into the linear regression model greatly reduced the number of significantly associated markers. Additionally, our results demonstrate that RF is an interesting complementary method with added value in association studies in plants, as illustrated by the overlap between the markers identified using RF and those identified using a linear mixed model with correction for kinship and population structure. Several markers that were selected by RF and by the models with correction for kinship, but not for population structure, were also identified as QTLs in two bi-parental DH populations.
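
The effect of a population-structure correction described in the abstract can be illustrated with a small sketch. It is not the authors' model: the data are invented toy values, the helper names (`pearson`, `center_within_groups`) are hypothetical, and hard group labels stand in for the fractional memberships of a STRUCTURE Q matrix. Kinship and Random Forest are not covered. The sketch only shows why a marker whose allele frequency differs between subpopulations can look associated with a trait until group membership is accounted for.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def center_within_groups(values, groups):
    """Subtract each group's mean from its members.

    This equals taking residuals from a regression on hard group-membership
    indicators, a simplified stand-in (assumption) for a Q-matrix covariate.
    """
    group_means = {
        g: mean(v for v, h in zip(values, groups) if h == g)
        for g in set(groups)
    }
    return [v - group_means[g] for v, g in zip(values, groups)]


# Invented toy data: 8 accessions in two subpopulations.
groups = ["A"] * 4 + ["B"] * 4
# Marker allele is rare in group A and common in group B.
marker = [0, 0, 0, 1, 1, 1, 1, 0]
# Trait level depends on the group only, not on the marker.
trait = [10, 12, 8, 10, 30, 32, 28, 30]

# Naive association: ignores population structure.
r_naive = pearson(marker, trait)

# Structure-adjusted association: both variables centered within groups.
r_adjusted = pearson(
    center_within_groups(marker, groups),
    center_within_groups(trait, groups),
)

print(f"naive r = {r_naive:.3f}, structure-adjusted r = {r_adjusted:.3f}")
```

In this toy case the naive correlation is clearly positive while the adjusted one vanishes, which mirrors, in a much simplified way, the reported drop in significantly associated markers once the Q matrix enters the regression.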
Pages: 10