Verbumculus and the discovery of unusual words

Cited by: 14
Authors
Apostolico, A [1]
Gong, FC
Lonardi, S
Affiliations
[1] Univ Padua, Dipartimento Ingn Informaz, Padua, Italy
[2] Purdue Univ, Dept Comp Sci, W Lafayette, IN 47907 USA
[3] Celera Genom, Rockville, MD 20850 USA
[4] Univ Calif Riverside, Dept Comp Sci & Engn, Riverside, CA 92521 USA
Funding
US National Science Foundation
Keywords
verbumculus; unusual words; subword statistics; pattern discovery; regulatory elements; suffix trees
DOI
10.1007/BF02944783
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Measures relating word frequencies to their expectations have long been of interest in bioinformatics. With sequence data becoming massively available, exhaustive enumeration of such measures has become conceivable, yet it poses a significant computational burden even when limited to words of bounded maximum length. In addition, displaying the huge tables that may result from these counts poses practical problems of visualization and inference. VERBUMCULUS is a suite of software tools for the fast detection of over- or under-represented words in nucleotide sequences. Its core rests on subtly interwoven properties of statistics, pattern matching and combinatorics on words that make it possible to limit drastically, and a priori, the set of over- or under-represented candidate words of all lengths in a given sequence, which makes it more feasible to both detect and visualize such words in a fast and practically useful way. This paper describes the facility and reports experimental results, ranging from simulations on synthetic data to the discovery of regulatory elements in the upstream regions of a set of yeast genes.
Pages: 22-41
Number of pages: 20