fjoin: Simple and efficient computation of feature overlaps

被引:24
作者
Richardson, Joel E. [1 ]
机构
[1] Jackson Lab, Mouse Genome Informat, Bar Harbor, ME 04609 USA
关键词
features; intervals; overlap computation; algorithm; implementation;
D O I
10.1089/cmb.2006.13.1457
中图分类号
Q5 [生物化学];
学科分类号
071010 ; 081704 ;
摘要
Sets of biological features with genome coordinates ( e. g., genes and promoters) are a particularly common form of data in bioinformatics today. Accordingly, an increasingly important processing step involves comparing coordinates from large sets of features to find overlapping feature pairs. This paper presents fjoin, an efficient, robust, and simple algorithm for finding these pairs, and a downloadable implementation. For typical bioinformatics feature sets, fjoin requires O(n log( n)) time (O( n) if the inputs are sorted) and uses O(1) space. The reference implementation is a stand-alone Python program; it implements the basic algorithm and a number of useful extensions, which are also discussed in this paper.
引用
收藏
页码:1457 / 1464
页数:8
相关论文
共 20 条
  • [1] BENTLEY J, 1977, UNPUB ALGORITHMS KLE
  • [2] *DBSNP, 2006, ONL DAT RES
  • [3] EDELSBRUNNER H, 1980, 47 TU GRAZ GRAZ
  • [4] The Mouse Genome Database (MGD): from genes to mice - a community resource for mouse biology
    Eppig, JT
    Bult, CJ
    Kadin, JA
    Richardson, JE
    Blake, JA
    [J]. NUCLEIC ACIDS RESEARCH, 2005, 33 : D471 - D475
  • [5] A computer program for aligning a cDNA sequence with a genomic DNA sequence
    Florea, L
    Hartzell, G
    Zhang, Z
    Rubin, GM
    Miller, W
    [J]. GENOME RESEARCH, 1998, 8 (09) : 967 - 974
  • [6] *GFF3, 2004, GEN FEAT FORM VERS 3
  • [7] GUTTMAN A, 1984, P INT C MAN DAT ACM, P44
  • [8] The mouse Gene Expression Database (GXD): updates and enhancements
    Hill, DP
    Begley, DA
    Finger, JH
    Hayamizu, TF
    McCright, IJ
    Smith, CM
    Beal, JS
    Corbani, LE
    Blake, JA
    Eppig, JT
    Kadin, JA
    Richardson, JE
    Ringwald, M
    [J]. NUCLEIC ACIDS RESEARCH, 2004, 32 : D568 - D571
  • [9] Kent WJ, 2002, GENOME RES, V12, P656, DOI [10.1101/gr.229202, 10.1101/gr.229202. Article published online before March 2002]
  • [10] KORTH HF, 1986, DATABASE SYSTEM CONC