Building the sequence map of the human pan-genome

Cited by: 179
Authors
Li, Ruiqiang [1]
Li, Yingrui [1]
Zheng, Hancheng [1,3]
Luo, Ruibang [1,3]
Zhu, Hongmei [1]
Li, Qibin [1]
Qian, Wubin [1]
Ren, Yuanyuan [1]
Tian, Geng [1]
Li, Jinxiang [1]
Zhou, Guangyu [1]
Zhu, Xuan [1]
Wu, Honglong [1,6]
Qin, Junjie [1]
Jin, Xin [1,3]
Li, Dongfang [1,6]
Cao, Hongzhi [1,6]
Hu, Xueda [1]
Blanche, Helene [4]
Cann, Howard [4]
Zhang, Xiuqing [1]
Li, Songgang [1]
Bolund, Lars [1,5]
Kristiansen, Karsten [1,2]
Yang, Huanming [1]
Wang, Jun [1,2]
Wang, Jian [1]
Affiliations
[1] BGI Shenzhen, Shenzhen 518083, Peoples R China
[2] Univ Copenhagen, Dept Biol, Copenhagen, Denmark
[3] S China Univ Technol, Sch Biosci & Biotechnol, Guangzhou, Guangdong, Peoples R China
[4] CEPH, Fdn Jean Dausset, Paris, France
[5] Univ Aarhus, Inst Human Genet, Aarhus, Denmark
[6] Shenzhen Univ, Sch Med, Genome Res Inst, Shenzhen, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
MULTILOCUS GENOTYPE DATA; COPY-NUMBER VARIATION; POPULATION-STRUCTURE; GENETIC-STRUCTURE; DNA; CLASSIFICATION; INFERENCE; PATTERNS;
DOI
10.1038/nbt.1596
Chinese Library Classification number
Q81 [Bioengineering (biotechnology)]; Q93 [Microbiology]
Subject classification codes
071005; 0836; 090102; 100705
Abstract
Here we integrate the de novo assembly of an Asian and an African genome with the NCBI reference human genome, as a step toward constructing the human pan-genome. We identified ~5 Mb of novel sequences not present in the reference genome in each of these assemblies. Most novel sequences are individual or population specific, as revealed by their comparison to all available human DNA sequence and by PCR validation using the human genome diversity cell line panel. We found novel sequences present in patterns consistent with known human migration paths. Cross-species conservation analysis of predicted genes indicated that the novel sequences contain potentially functional coding regions. We estimate that a complete human pan-genome would contain ~19-40 Mb of novel sequence not present in the extant reference genome. The extensive amount of novel sequence contributing to the genetic variation of the pan-genome indicates the importance of using complete genome sequencing and de novo assembly.
Pages: 57 / U83
Number of pages: 7