Global repeat discovery and estimation of genomic copy number in a large, complex genome using a high-throughput 454 sequence survey

Cited by: 78
Authors
Swaminathan, Kankshita [1]
Varala, Kranthi [1]
Hudson, Matthew E. [1]
Affiliations
[1] Univ Illinois, Dept Crop Sci, Urbana, IL 61801 USA
DOI
10.1186/1471-2164-8-132
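Chinese Library Classification number
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology]
Subject classification codes
071005 [Microbiology]; 0836 [Bioengineering]; 090102 [Crop Genetics and Breeding]; 100705 [Microbial and Biochemical Pharmacy]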
Abstract
Background: Extensive computational and database tools are available to mine genomic and genetic databases for model organisms, but little genomic data is available for many species of ecological or agricultural significance, especially those with large genomes. Genome surveys using conventional sequencing techniques are powerful, particularly for detecting sequences present in many copies per genome. However, these methods are time-consuming and have potential drawbacks. High-throughput 454 sequencing provides an alternative method by which much information can be gained quickly and cheaply from high-coverage surveys of genomic DNA.
Results: We sequenced 78 million base pairs of randomly sheared soybean DNA that passed our quality criteria. Computational analysis of the survey sequences provided global information on the abundant repetitive sequences in soybean. The sequence was used to determine the copy number across regions of large genomic clones or contigs and to discover higher-order structures within satellite repeats. We have created an annotated, online database of sequences present in multiple copies in the soybean genome. The low bias of pyrosequencing against repeat sequences is demonstrated by the overall composition of the survey data, which matches well with past estimates of repetitive DNA content obtained by DNA re-association kinetics (Cot analysis).
Conclusion: This approach provides a potential aid to conventional or shotgun genome assembly by allowing rapid assessment of copy number in any clone or clone-end sequence. In addition, we show that partial sequencing can provide access to partial protein-coding sequences.
Pages: 13