Making sense of EST sequences by CLOBBing them

Cited by: 84
Authors
Parkinson, J [1]
Guiliano, DB [1]
Blaxter, M [1]
Affiliations
[1] Univ Edinburgh, Inst Cell Anim & Populat Biol, Edinburgh EH9 3JT, Midlothian, Scotland
DOI
10.1186/1471-2105-3-31
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Background: Expressed sequence tags (ESTs) are single-pass reads from randomly selected cDNA clones. They provide a highly cost-effective method to access and identify expressed genes. However, they are often prone to sequencing errors and typically define incomplete transcripts. To increase the amount of information obtainable from ESTs and reduce sequencing errors, it is necessary to cluster ESTs into groups sharing significant sequence similarity.
Results: As part of our ongoing EST programs investigating 'orphan' genomes, we have developed a clustering algorithm, CLOBB (Cluster on the basis of BLAST similarity), to identify and cluster ESTs. CLOBB may be used incrementally, preserving original cluster designations. It tracks cluster-specific events such as merging, identifies 'superclusters' of related clusters, and avoids the expansion of chimeric clusters. Based on the Perl scripting language, CLOBB is highly portable, relying only on a local installation of NCBI's freely available BLAST executable, and can be usefully applied to >95% of the current EST datasets. Analysis of the Danio rerio EST dataset demonstrates that CLOBB compares favourably with two less portable systems, UniGene and TIGR Gene Indices.
Conclusions: CLOBB provides a highly portable EST clustering solution and can be downloaded freely from http://www.nematodes.org/CLOBB.
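The incremental behaviour described in the abstract (new ESTs join existing clusters, original cluster designations are preserved, and merges are recorded) can be illustrated with a minimal Python sketch. This is not CLOBB's actual algorithm or code, which is written in Perl and is not reproduced in this record. The class name `IncrementalClusterer`, the thresholds `min_identity` and `min_overlap`, and the hit tuple layout are all assumptions made for illustration. Supercluster detection and chimera handling are deliberately left out.

```python
from dataclasses import dataclass, field


@dataclass
class IncrementalClusterer:
    """Toy incremental clusterer: cluster IDs are stable and merges are logged.

    All names and thresholds here are hypothetical, not taken from CLOBB.
    """
    min_identity: float = 95.0   # assumed percent-identity cutoff
    min_overlap: int = 30        # assumed minimum aligned length (bases)
    cluster_of: dict = field(default_factory=dict)  # EST id -> cluster id
    members: dict = field(default_factory=dict)     # cluster id -> set of EST ids
    merge_log: list = field(default_factory=list)   # (kept id, absorbed id)
    next_id: int = 1

    def add(self, est_id, hits):
        """Assign one EST to a cluster.

        hits: iterable of (subject_est_id, percent_identity, overlap_length),
        e.g. parsed from a BLAST search against already clustered ESTs.
        """
        # Clusters containing at least one sufficiently similar EST.
        matched = sorted({
            self.cluster_of[subject]
            for subject, identity, overlap in hits
            if subject in self.cluster_of
            and identity >= self.min_identity
            and overlap >= self.min_overlap
        })
        if not matched:
            # No acceptable hit: open a new cluster with a fresh ID.
            cid = self.next_id
            self.next_id += 1
            self.members[cid] = set()
        else:
            # Keep the lowest (oldest) ID so earlier designations survive.
            cid = matched[0]
            for other in matched[1:]:
                for member in self.members.pop(other):
                    self.cluster_of[member] = cid
                    self.members[cid].add(member)
                self.merge_log.append((cid, other))
        self.cluster_of[est_id] = cid
        self.members[cid].add(est_id)
        return cid


clusterer = IncrementalClusterer()
clusterer.add("est1", [])                       # opens cluster 1
clusterer.add("est2", [])                       # opens cluster 2
clusterer.add("est3", [("est1", 98.0, 100)])    # joins cluster 1
# est4 bridges clusters 1 and 2, so cluster 2 is absorbed into cluster 1.
clusterer.add("est4", [("est1", 99.0, 50), ("est2", 97.0, 60)])
print(clusterer.merge_log)  # [(1, 2)]
```

Keeping the oldest cluster ID on a merge is one simple way to preserve existing designations across incremental runs; the merge log is what allows a later user to trace where an absorbed cluster went.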
Pages: 8