Low-pass shotgun sequencing of the barley genome facilitates rapid identification of genes, conserved non-coding sequences and novel repeats

Cited by: 61
Authors
Wicker, Thomas [2]
Narechania, Apurva [3]
Sabot, Francois [4]
Stein, Joshua [3]
Vu, Giang Th [1,6]
Graner, Andreas [1]
Ware, Doreen [3,5]
Stein, Nils [1]
Affiliations
[1] Leibniz Inst Plant Genet & Crop Plant Res IPK, D-06466 Gatersleben, Germany
[2] Univ Zurich, Inst Plant Biol, CH-8008 Zurich, Switzerland
[3] Cold Spring Harbor Lab, Cold Spring Harbor, NY 11724 USA
[4] Univ Perpignan, CNRS, UMR 5096, IRD,Lab Genome & Dev Plantes, F-66860 Perpignan, France
[5] USDA ARS, NAA, Plant Soil & Nutr Lab, Res Unit, Ithaca, NY 14853 USA
[6] Aberystwyth Univ, IBERS, Ceredigion SY23 3DA, Wales
Funding
National Science Foundation (USA)
DOI
10.1186/1471-2164-9-518
Chinese Library Classification (CLC) number
Q81 [Bioengineering (biotechnology)]; Q93 [Microbiology]
Discipline classification codes
071005; 0836; 090102; 100705
Abstract
Background: Barley has one of the largest and most complex genomes of all economically important food crops. The rise of new short read sequencing technologies such as Illumina/Solexa permits such large genomes to be effectively sampled at relatively low cost. Based on the corresponding sequence reads a Mathematically Defined Repeat (MDR) index can be generated to map repetitive regions in genomic sequences.
Results: We have generated 574 Mbp of Illumina/Solexa sequences from barley total genomic DNA, representing about 10% of a genome equivalent. From these sequences we generated an MDR index which was then used to identify and mark repetitive regions in the barley genome. Comparison of the MDR plots with expert repeat annotation drawing on the information already available for known repetitive elements revealed a significant correspondence between the two methods. MDR-based annotation allowed for the identification of dozens of novel repeat sequences, though, which were not recognised by hand-annotation. The MDR data was also used to identify gene-containing regions by masking of repetitive sequences in eight de-novo sequenced bacterial artificial chromosome (BAC) clones. For half of the identified candidate gene islands indeed gene sequences could be identified. MDR data were only of limited use, when mapped on genomic sequences from the closely related species Triticum monococcum as only a fraction of the repetitive sequences was recognised.
Conclusion: An MDR index for barley, which was obtained by whole-genome Illumina/Solexa sequencing, proved as efficient in repeat identification as manual expert annotation. Circumventing the labour-intensive step of producing a specific repeat library for expert annotation, an MDR index provides an elegant and efficient resource for the identification of repetitive and low-copy (i.e. potentially gene-containing sequences) regions in uncharacterised genomic sequences. The restriction that a particular MDR index can not be used across species is outweighed by the low costs of Illumina/Solexa sequencing which makes any chosen genome accessible for whole-genome sequence sampling.
Pages: 15