HOPPSIGEN: a database of human and mouse processed pseudogenes

Cited by: 44
Authors
Khelifi, A [1]
Duret, L [1]
Mouchiroud, D [1]
Affiliations
[1] Univ Lyon 1, Lab Biometrie & Biol Evolut, CNRS, UMR 5558, F-69622 Villeurbanne, France
DOI
10.1093/nar/gki084
Chinese Library Classification
Q5 [Biochemistry]; Q7 [Molecular Biology];
Discipline classification codes
071010; 081704;
Abstract
Processed pseudogenes result from reverse-transcribed mRNAs. In general, because processed pseudogenes lack promoters, they are non-functional from the moment they are inserted into the genome. Subsequently, they freely accumulate substitutions, insertions and deletions. Moreover, the ancestral structure of a processed pseudogene can easily be inferred from the sequence of its functional homologous gene. Owing to these characteristics, processed pseudogenes are good neutral markers for studying genome evolution. Interest in these markers has recently increased, particularly to help gene prediction in genome annotation, functional genomics and genome evolution analysis (patterns of substitution). For these reasons, we have developed a method to annotate processed pseudogenes in complete genomes. To make them useful to different fields of research, we stored them in a nucleic acid database after annotation. In this work, we screened both the mouse and human complete genomes from ENSEMBL to find processed pseudogenes generated from functional genes with introns. We used a conservative method to detect processed pseudogenes in order to minimize the rate of false positives. Among the detected sequences, some still have a conserved open reading frame and some overlap gene locations. We designated as retroelements all reverse-transcribed sequences and, more strictly, as processed pseudogenes all retroelements not falling into either of these two categories (having a conserved open reading frame or overlapping gene locations). We annotated 5823 retroelements (5206 processed pseudogenes) in the human genome and 3934 (3428 processed pseudogenes) in the mouse genome. Compared with previous estimates, this total number of processed pseudogenes is an underestimate, but the aim of the procedure was to generate a high-quality dataset.
To facilitate the use of processed pseudogenes in studying genome structure and evolution, DNA sequences from processed pseudogenes, and their functional reverse transcribed homologs, are now stored in a nucleic acid database, HOPPSIGEN. HOPPSIGEN can be browsed on the PBIL (Pole Bioinformatique Lyonnais) World Wide Web server (http://pbil.univ-lyon1.fr/) or fully downloaded for local installation.
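The distinction the abstract draws between retroelements and the stricter set of processed pseudogenes can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the `Retroelement` class, its field names and the `summarize` helper are hypothetical, and the two boolean flags are assumed to have been computed by some upstream detection step that is not shown.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass
class Retroelement:
    """A reverse-transcribed sequence (hypothetical record layout)."""
    name: str
    has_conserved_orf: bool  # assumed flag: open reading frame still conserved
    overlaps_gene: bool      # assumed flag: location overlaps an annotated gene


def is_processed_pseudogene(element: Retroelement) -> bool:
    """Strict definition from the abstract: a retroelement that falls into
    neither category (conserved ORF, overlapping gene location)."""
    return not (element.has_conserved_orf or element.overlaps_gene)


def summarize(elements: Iterable[Retroelement]) -> Dict[str, int]:
    """Count all retroelements, the strict subset, and the excluded rest."""
    items = list(elements)
    strict = sum(1 for e in items if is_processed_pseudogene(e))
    return {
        "retroelements": len(items),
        "processed_pseudogenes": strict,
        "excluded": len(items) - strict,
    }
```

Under this reading, the reported counts imply that 5823 - 5206 = 617 human retroelements and 3934 - 3428 = 506 mouse retroelements either keep a conserved open reading frame or overlap a gene location.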
Pages: D59-D66
Number of pages: 8