A strategy for assembling the maize (Zea mays L.) genome

Cited by: 30
Authors
Emrich, SJ
Aluru, S [1]
Fu, Y
Wen, TJ
Narayanan, M
Guo, L
Ashlock, DA
Schnable, PS
Affiliations
[1] Iowa State Univ, Bioinformat & Computat Biol Grad Program, Ames, IA 50011 USA
[2] Iowa State Univ, Dept Elect & Comp Engn, Ames, IA 50011 USA
[3] Iowa State Univ, Interdept Genet Grad Program, Ames, IA 50011 USA
[4] Iowa State Univ, Dept Agron, Ames, IA 50011 USA
[5] Iowa State Univ, Dept Math, Ames, IA 50011 USA
[6] Iowa State Univ, Dept Genet Dev & Cell Biol, Ames, IA 50011 USA
[7] Iowa State Univ, Dept Comp Sci, Ames, IA 50011 USA
[8] Iowa State Univ, Ctr Plant Genom, Ames, IA 50011 USA
[9] Iowa State Univ, LH Baker Ctr Bioinformat & Biol Stat, Ames, IA 50011 USA
Funding
National Science Foundation (USA)
DOI
10.1093/bioinformatics/bth017
Chinese Library Classification
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Summary: Because the bulk of the maize (Zea mays L.) genome consists of repetitive sequences, sequencing efforts are being targeted to its 'gene-rich' fraction. Traditional assembly programs are inadequate for this approach because they are optimized for a uniform sampling of the genome and inherently lack the ability to differentiate highly similar paralogs.
Results: We report the development of bioinformatics tools for the accurate assembly of the maize genome. This software, which is based on innovative parallel algorithms to ensure scalability, assembled 730 974 genomic survey sequence fragments in 4 h using 64 Pentium III 1.26 GHz processors of a commodity cluster. Algorithmic innovations are used to reduce the number of pairwise alignments significantly without sacrificing quality. Clone pair information was used to estimate the error rate for improved differentiation of polymorphisms versus sequencing errors. The assembly was also used to evaluate the effectiveness of various filtering strategies and thereby provide information that can be used to focus subsequent sequencing efforts.
Pages: 140-147
Page count: 8