Rapid detection of expanded short tandem repeats in personal genomics using hybrid sequencing

Cited by: 51
Authors
Doi, Koichiro [1]
Monjo, Taku [1,2]
Hoang, Pham H. [1,2]
Yoshimura, Jun [1]
Yurino, Hideaki [1]
Mitsui, Jun [3]
Ishiura, Hiroyuki [3]
Takahashi, Yuji [3]
Ichikawa, Yaeko [3]
Goto, Jun [3]
Tsuji, Shoji [3]
Morishita, Shinichi [1]
Affiliations
[1] Univ Tokyo, Grad Sch Frontier Sci, Dept Computat Biol, Chiba 2778562, Japan
[2] Univ Tokyo, Dept Informat & Commun Engn, Fac Engn, Tokyo 1138655, Japan
[3] Univ Tokyo, Grad Sch Med, Dept Neurol, Tokyo 1138655, Japan
Keywords
FRAGILE-X; HEXANUCLEOTIDE REPEAT; EXPANSION; REGION; IDENTIFICATION; MUTATIONS; EFFICIENT; C9ORF72; FTD;
DOI
10.1093/bioinformatics/btt647
Chinese Library Classification (CLC) number
Q5 [Biochemistry];
Subject classification code
071010; 081704;
Abstract
Motivation: Long expansions of short tandem repeats (STRs), i.e. DNA repeats of 2-6 nt, are associated with some genetic diseases. Cost-efficient high-throughput sequencing can quickly produce billions of short reads that would be useful for uncovering disease-associated STRs. However, enumerating STRs in short reads remains largely unexplored because of the difficulty in elucidating STRs much longer than 100 bp, the typical length of short reads.
Results: We propose ab initio procedures for sensing and locating long STRs promptly by using the frequency distribution of all STRs and paired-end read information. We validated the reproducibility of this method using biological replicates and used it to locate an STR associated with a brain disease (SCA31). Subsequently, we sequenced this STR site in 11 SCA31 samples using SMRT™ sequencing (Pacific Biosciences), determined 2.3-3.1 kb sequences at nucleotide resolution and revealed that (TGGAA)- and (TAAAATAGAA)-repeat expansions determined the instability of the repeat expansions associated with SCA31. Our method could also identify common STRs, (AAAG)- and (AAAAG)-repeat expansions, which are remarkably expanded at four positions in an SCA31 sample. This is the first proposed method for rapidly finding disease-associated long STRs in personal genomes using hybrid sequencing of short and long reads.
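The idea of a "frequency distribution of all STRs" over short reads can be illustrated with a small sketch. This is not the authors' implementation, which the abstract does not describe in detail. It is a minimal illustration under stated assumptions: a read counts as STR-dominated when at least 90% of its positions match the base one period later, the unit is taken from the start of the read, and units are grouped under rotation and reverse complement. All function names, the 0.9 threshold, and the example reads are hypothetical.

```python
from collections import Counter

# Hypothetical helper names; none of this comes from the paper itself.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def canonical_unit(unit):
    """Smallest string among all rotations of the unit and of its
    reverse complement, so TGGAA and TTCCA map to the same key."""
    candidates = []
    for s in (unit, reverse_complement(unit)):
        candidates.extend(s[i:] + s[:i] for i in range(len(s)))
    return min(candidates)


def is_primitive(unit):
    """True if the unit is not itself a repeat of a shorter unit."""
    k = len(unit)
    return not any(
        k % d == 0 and unit == unit[:d] * (k // d) for d in range(1, k)
    )


def dominant_repeat_unit(read, min_k=2, max_k=6, min_fraction=0.9):
    """Return the canonical 2-6 nt unit that the read repeats, or None.

    Assumption: the read is periodic with period k when at least
    min_fraction of positions i satisfy read[i] == read[i + k].
    """
    for k in range(min_k, max_k + 1):
        if len(read) < 2 * k:
            continue
        matches = sum(read[i] == read[i + k] for i in range(len(read) - k))
        if matches / (len(read) - k) >= min_fraction and is_primitive(read[:k]):
            return canonical_unit(read[:k])
    return None


def str_unit_frequencies(reads):
    """Count STR-dominated reads per canonical repeat unit."""
    units = (dominant_repeat_unit(r) for r in reads)
    return Counter(u for u in units if u is not None)


# Made-up example reads: two strands of a (TGGAA) repeat and one
# non-repetitive read.
example_reads = [
    "TGGAA" * 20,
    "TTCCA" * 20,
    "ACGTTGCAAGCTTAGGCTAACGTCGATCGGATCCATGCAAT",
]
example_counts = str_unit_frequencies(example_reads)
```

In such a scheme, a repeat unit whose read count in one sample is far above its count in other samples would be a candidate expansion. Locating it in the genome would still require the paired-end information mentioned in the abstract, which this sketch does not model.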
Pages: 815-822
Page count: 8