Automatic extraction of motifs represented in the hidden Markov model from a number of DNA sequences

Cited by: 33
Authors
Yada, T
Totoki, Y
Ishikawa, M
Asai, K
Nakai, K
Affiliations
[1] Japan Sci & Technol Corp, Chiyoda Ku, Tokyo 102, Japan
[2] Informat & Math Sci Lab Inc, Toshima Ku, Tokyo 171, Japan
[3] Meiji Univ, Chiyoda Ku, Tokyo 101, Japan
[4] Electrotech Lab, Tsukuba, Ibaraki 305, Japan
[5] Osaka Univ, Inst Mol & Cellular Biol, Suita, Osaka 565, Japan
DOI
10.1093/bioinformatics/14.4.317
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Motivation: Automatic extraction of motifs that occur frequently in a set of unaligned DNA sequences is useful for predicting the binding sites of unknown transcription factors. Several programs for this purpose have been released. However, in our opinion, they are not practical enough to be applied to a large number of upstream sequences. Results: We propose a new program called YEBIS (Yet another Environment for the analysis of BIopolymer Sequences), which is capable of extracting a set of motifs, without any a priori knowledge, from a number of functionally related DNA sequences. Using the hidden Markov model, these motifs are represented in a more general form than in other conventional methods, such as the weight matrix method. When applied to several sets of benchmark data, YEBIS performed comparably to the existing methods but was much faster. Moreover, it could extract all known motifs from the LTR sequences (long terminal repeat sequences) in a single run. Finally, it was successfully applied to approximately 400 human promoter sequences, and some of the extracted motifs turned out to be known cis-elements. Therefore, YEBIS could be a practical tool for exploring the upstream sequences of genomic ORFs, some of which are regulated in a similar fashion.
Pages: 317-325
Number of pages: 9
Related papers
18 in total
[1]  
Bailey TL, 1994, P 2 INT C INT SYST M, V2, P28
[2]   A MAXIMIZATION TECHNIQUE OCCURRING IN STATISTICAL ANALYSIS OF PROBABILISTIC FUNCTIONS OF MARKOV CHAINS [J].
BAUM, LE ;
PETRIE, T ;
SOULES, G ;
WEISS, N .
ANNALS OF MATHEMATICAL STATISTICS, 1970, 41 (01) :164-&
[3]   GenBank [J].
Benson, DA ;
Boguski, M ;
Lipman, DJ ;
Ostell, J .
NUCLEIC ACIDS RESEARCH, 1996, 24 (01) :1-5
[4]  
Brazma A, 1996, Proc Int Conf Intell Syst Mol Biol, V4, P34
[5]   WEIGHT MATRIX DESCRIPTIONS OF 4 EUKARYOTIC RNA POLYMERASE-II PROMOTER ELEMENTS DERIVED FROM 502 UNRELATED PROMOTER SEQUENCES [J].
BUCHER, P .
JOURNAL OF MOLECULAR BIOLOGY, 1990, 212 (04) :563-578
[6]   Eukaryotic promoter recognition [J].
Fickett, JW ;
Hatzigeorgiou, AC .
GENOME RESEARCH, 1997, 7 (09) :861-878
[7]  
Frech K, 1997, COMPUT APPL BIOSCI, V13, P89
[8]  
HERTZ G, 1995, P 3 INT C BIOINF GEN, P201
[9]  
HIROSAWA M, 1995, COMPUT APPL BIOSCI, V11, P13
[10]   HIDDEN MARKOV-MODELS IN COMPUTATIONAL BIOLOGY - APPLICATIONS TO PROTEIN MODELING [J].
KROGH, A ;
BROWN, M ;
MIAN, IS ;
SJOLANDER, K ;
HAUSSLER, D .
JOURNAL OF MOLECULAR BIOLOGY, 1994, 235 (05) :1501-1531