CNstream: A method for the identification and genotyping of copy number polymorphisms using Illumina microarrays

Cited by: 14
Authors
Alonso, Arnald [1 ]
Julia, Antonio [1 ]
Tortosa, Rauel [1 ]
Canaleta, Cristina [1 ]
Canete, Juan D. [2 ]
Ballina, Javier [3 ]
Balsa, Alejandro [4 ]
Tornero, Jesus [5 ]
Marsal, Sara [1 ]
Affiliations
[1] Hosp Univ Vall Hebron UAB, Grp Recerca Reumatol, Inst Recerca, Barcelona, Spain
[2] Hosp Clin Barcelona, Barcelona, Spain
[3] Hosp Univ Cent Asturias, Oviedo, Asturias, Spain
[4] Hosp Univ La Paz, Madrid, Spain
[5] Hosp Univ Guadalajara, Castilla La Mancha, Spain
Source
BMC BIOINFORMATICS | 2010 / Volume 11
Keywords
STRUCTURAL VARIATION; FINE-SCALE; GENOME; ASSOCIATION; SEGMENTATION; ALGORITHM;
DOI
10.1186/1471-2105-11-264
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Background: Understanding the genetic basis of disease risk in depth requires an exhaustive knowledge of the types of genetic variation. Very recently, Copy Number Variants (CNVs) have received much attention because of their potential implication in common disease susceptibility. Copy Number Polymorphisms (CNPs) are of interest as they segregate at an appreciable frequency in the general population (i.e., > 1%) and are potentially implicated in the genetic basis of common diseases.
Results: This paper concerns CNstream, a method for whole-genome CNV discovery and genotyping, using Illumina Beadchip arrays. Compared with other methods, a high level of accuracy was achieved by analyzing the measures of each intensity channel separately and combining information from multiple samples. The CNstream method uses heuristics and parametrical statistics to assign a confidence score to each sample at each probe; the sensitivity of the analysis is increased by jointly calling the copy number state over a set of nearby and consecutive probes. The present method has been tested on a real dataset of 575 samples genotyped using Illumina HumanHap 300 Beadchip, and demonstrates a high correlation with the Database of Genomic Variants (DGV). The same set of samples was analyzed with PennCNV, one of the most frequently used copy number inference methods for Illumina platforms. CNstream was able to identify CNP loci that are not detected by PennCNV and it increased the sensitivity over multiple other loci in the genome.
Conclusions: CNstream is a useful method for the identification and characterization of CNPs using Illumina genotyping microarrays. Compared to the PennCNV method, it has greater sensitivity over multiple CNP loci and allows more powerful statistical analysis in these regions. Therefore, CNstream is a robust CNP analysis tool of use to researchers performing genome-wide association studies (GWAS) on Illumina platforms and aiming to identify CNVs associated with the variables of interest. CNstream has been implemented as an R statistical software package that can work directly from raw intensity files generated from Illumina GWAS projects. The method is available at http://www.urr.cat/cnv/cnstream.html.
Pages: 18