A systematic investigation identifies a significant number of probable pseudogenes in the Escherichia coli genome

Cited by: 27
Authors
Homma, K
Fukuchi, S
Kawabata, T
Ota, M
Nishikawa, K
Affiliations
[1] Natl Inst Genet, DNA Data Bank Japan, Ctr Informat Biol, Lab Gene Prod Informat, Mishima, Shizuoka 4118540, Japan
[2] Japan Sci & Technol Corp, Kawaguchi, Saitama 3320012, Japan
Keywords
three-dimensional structure; structure prediction; gram-negative bacteria; position-specific iterated basic local alignment search tool; horizontal transfer
DOI
10.1016/S0378-1119(02)00794-1
CLC number
Q3 [Genetics]
Discipline classification code
071007; 090102
Abstract
Pseudogenes are open reading frames (ORFs) encoding dysfunctional proteins with high homology to known protein-coding genes. Although pseudogenes have been reported to exist in the genomes of many eukaryotes and bacteria, no systematic search for pseudogenes in the Escherichia coli genome has been carried out. Genome comparisons of E. coli strains K-12 and O157 revealed that many protein-coding sequences have prematurely terminated orthologs encoding unstable proteins. To systematically screen for pseudogenes, we selected ORFs generated by premature termination of the orthologous protein-coding genes and subsequently excluded those possibly arising from sequence errors. Lastly, we eliminated those with close homologs in this and other species, as these shortened ORFs may actually have functions. The process produced 95 and 101 pseudogene candidates in K-12 and O157, respectively. The assigned three-dimensional structures suggest that most of the encoded proteins cannot fold properly and thus are dysfunctional, indicating that they are probably pseudogenes. Therefore, the existence of a significant number of probable pseudogenes in E. coli is predicted, awaiting experimental verification. Most of them were found to be genes with paralogs, horizontally transferred genes, or both. We suggest that pseudogenes constitute a small fraction of the genomes of free-living bacteria in general, reflecting the fact that pseudogenes are eliminated faster than they are produced. © 2002 Elsevier Science B.V. All rights reserved.
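The three screening steps named in the abstract (select prematurely terminated ORFs, exclude possible sequence errors, eliminate ORFs with close homologs) can be sketched as a simple filter. This is only an illustrative sketch, not the authors' implementation: the `Orf` record, its field names, and the `max_length_ratio` cutoff are hypothetical, and the abstract does not state the actual criteria used at each step.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Orf:
    """Hypothetical record for one ORF; all fields are assumptions for illustration."""
    name: str
    length: int                    # ORF length (e.g. in codons)
    ortholog_length: int           # length of the full-length ortholog in the other strain
    possible_sequence_error: bool  # truncation might be a sequencing artifact
    has_close_homolog: bool        # a similarly short homolog exists in this or another species


def screen_pseudogene_candidates(orfs: List[Orf], max_length_ratio: float = 0.8) -> List[Orf]:
    """Apply the three filters in order and return the surviving candidates.

    max_length_ratio is an assumed cutoff, not a value taken from the paper.
    """
    candidates = []
    for orf in orfs:
        # Step 1: keep only ORFs clearly shorter than their ortholog
        # (a stand-in for "generated by premature termination").
        if orf.length >= max_length_ratio * orf.ortholog_length:
            continue
        # Step 2: exclude ORFs whose truncation may come from a sequence error.
        if orf.possible_sequence_error:
            continue
        # Step 3: exclude ORFs with close homologs, since such shortened
        # ORFs may actually be functional.
        if orf.has_close_homolog:
            continue
        candidates.append(orf)
    return candidates
```

Ordering the filters this way mirrors the order given in the abstract; because each filter is independent, any order would yield the same final set in this sketch.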
Pages: 25-33
Page count: 9