Limitations of next-generation genome sequence assembly

Cited by: 498
Authors
Alkan, Can
Sajjadian, Saba
Eichler, Evan E. [1]
Affiliations
[1] Univ Washington, Sch Med, Dept Genome Sci, Seattle, WA 98195 USA
Funding
National Institutes of Health (USA);
Keywords
SEGMENTAL DUPLICATIONS; COPY-NUMBER; ELEMENTS;
DOI
10.1038/NMETH.1527
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
High-throughput sequencing technologies promise to transform the fields of genetics and comparative biology by delivering tens of thousands of genomes in the near future. Although it is feasible to construct de novo genome assemblies in a few months, there has been relatively little attention to what is lost by sole application of short sequence reads. We compared the recent de novo assemblies using the short oligonucleotide analysis package (SOAP), generated from the genomes of a Han Chinese individual and a Yoruban individual, to experimentally validated genomic features. We found that de novo assemblies were 16.2% shorter than the reference genome and that 420.2 megabase pairs of common repeats and 99.1% of validated duplicated sequences were missing from the genome. Consequently, over 2,377 coding exons were completely missing. We conclude that high-quality sequencing approaches must be considered in conjunction with high-throughput sequencing for comparative genomics analyses and studies of genome evolution.
Pages: 61 - 65
Page count: 5