Identification of genomic features using microsyntenies of domains: Domain teams

Cited by: 27
Authors
Pasek, S [1]
Bergeron, A
Risler, JL
Louis, A
Ollivier, E
Raffinot, M
Affiliations
[1] CNRS, UEVE, Lab Genome & Informat, F-91034 Evry, France
[2] Infobiogen, Evry, France
[3] Univ Quebec, LaCIM, Montreal, PQ H3C 3P8, Canada
[4] Soulsci, Biopole Clermont Limagne, F-63360 St Beauzire, France
DOI
10.1101/gr.3638405
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology];
Discipline classification code
071010 ; 081704 ;
Abstract
The detection, across several genomes, of local conservation of gene content and proximity considerably helps the prediction of features of interest, such as gene fusions or physical and functional interactions. Here, we want to process realistic models of chromosomes, in which genes (or genomic segments of several genes) can be duplicated within a chromosome, or be absent from some other chromosome(s). Our approach adopts the technique of temporarily forgetting genes and working directly with protein "domains" such as those found in Pfam. This allows the detection of strings of domains that are conserved in their content, but not necessarily in their order, which we refer to as domain teams. The prominent feature of the method is that it relaxes the rigidity of the orthology criterion and avoids many of the pitfalls of gene-families identification methods, often hampered by multidomain proteins or low levels of sequence similarity. This approach, which allows both inter- and intrachromosomal comparisons, proves to be more sensitive than the classical methods based on pairwise sequence comparisons, particularly in the simultaneous treatment of many species. The automated and fast detection of domain teams, together with its increased sensitivity at identifying segments of identical (protein-coding) gene contents as well as gene fusions, should prove a useful complement to other existing methods.
Pages: 867-874
Page count: 8
Related papers
43 in total
[1]   Gapped BLAST and PSI-BLAST: a new generation of protein database search programs [J].
Altschul, SF ;
Madden, TL ;
Schaffer, AA ;
Zhang, JH ;
Zhang, Z ;
Miller, W ;
Lipman, DJ .
NUCLEIC ACIDS RESEARCH, 1997, 25 (17) :3389-3402
[2]   SCOP database in 2004: refinements integrate structure and sequence family data [J].
Andreeva, A ;
Howorth, D ;
Brenner, SE ;
Hubbard, TJP ;
Chothia, C ;
Murzin, AG .
NUCLEIC ACIDS RESEARCH, 2004, 32 :D226-D229
[3]  
[Anonymous], GENOME BIOL
[4]  
Bateman A, 2004, NUCLEIC ACIDS RES, V32, pD138, DOI [10.1093/nar/gkp985, 10.1093/nar/gkr1065, 10.1093/nar/gkh121]
[5]  
Bergeron A, 2002, LECT NOTES COMPUT SC, V2452, P464
[6]   Fast identification and statistical evaluation of segmental homologies in comparative maps [J].
Calabrese, Peter P. ;
Chakravarty, Sugata ;
Vision, Todd J. .
BIOINFORMATICS, 2003, 19 :i74-i80
[7]   Tests for gene clustering [J].
Durand, D ;
Sankoff, D .
JOURNAL OF COMPUTATIONAL BIOLOGY, 2003, 10 (3-4) :453-482
[8]   Profile hidden Markov models [J].
Eddy, SR .
BIOINFORMATICS, 1998, 14 (9) :755-763
[9]   Protein interaction maps for complete genomes based on gene fusion events [J].
Enright, AJ ;
Iliopoulos, I ;
Kyrpides, NC ;
Ouzounis, CA .
NATURE, 1999, 402 (6757) :86-90
[10]  
Enright AJ, 2001, GENOME BIOL, V2