Pseudofam: the pseudogene families database

Times cited: 25
Authors
Lam, Hugo Y. K. [1]
Khurana, Ekta [2]
Fang, Gang [2]
Cayting, Philip [2]
Carriero, Nicholas [3]
Cheung, Kei-Hoi [3,4,5]
Gerstein, Mark B. [1,2,3]
Affiliations
[1] Yale Univ, Program Computat Biol & Bioinformat, New Haven, CT 06520 USA
[2] Yale Univ, Dept Mol Biophys & Biochem, New Haven, CT 06520 USA
[3] Yale Univ, Dept Comp Sci, New Haven, CT 06520 USA
[4] Yale Univ, Ctr Med Informat, New Haven, CT 06520 USA
[5] Yale Univ, Dept Genet, New Haven, CT 06520 USA
Funding
National Institutes of Health (NIH)
Keywords
PROTEIN FAMILIES; IDENTIFICATION; GENES; REACTIVATION; EXPRESSION; EVOLUTION; PFAM
DOI
10.1093/nar/gkn758
Chinese Library Classification (CLC) numbers
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification codes
071010; 081704
Abstract
Pseudofam (http://pseudofam.pseudogene.org) is a database of pseudogene families based on the protein families in the Pfam database. It provides resources for analyzing the family structure of pseudogenes, including query tools, statistical summaries and sequence alignments. The current version of Pseudofam contains more than 125 000 pseudogenes identified in 10 eukaryotic genomes and aligned within nearly 3000 families (approximately one-third of all families in Pfam-A). Pseudofam identifies pseudogenes with a large-scale, parallelized homology search algorithm, implemented as an extension of the PseudoPipe pipeline. Each identified pseudogene is assigned to its parent protein family, and the pseudogenes in a family are then aligned to one another by transferring the parent domain alignments from the corresponding Pfam family. Pseudogenes are also given additional annotation based on an ontology that reflects their mode of creation and subsequent history. In particular, this annotation highlights the association of pseudogene families with genomic features such as segmental duplications. In addition, pseudogene families are annotated with key statistics that identify outlier families with an unusual degree of pseudogenization. The statistics also show how the numbers of genes and pseudogenes in families correlate across species. Overall, they highlight that housekeeping families tend to contain a large number of pseudogenes.
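The abstract says pseudogenes are aligned to one another by transferring the parent domain alignments from the Pfam family. A minimal sketch of that idea follows, assuming each pseudogene has already been aligned pairwise to its parent domain (for example, a translated pseudogene against the parent protein). The function names `transfer_alignment` and `ungap`, the treatment of `-` and `.` as gap symbols, and the choice to drop pseudogene insertions are all illustrative assumptions; this is not Pseudofam or PseudoPipe code.

```python
# Hypothetical sketch of alignment transfer; not Pseudofam/PseudoPipe code.
# Assumption: '-' and '.' both denote gaps in the input rows.
GAP_CHARS = frozenset("-.")


def ungap(seq: str) -> str:
    """Return seq with all gap characters removed."""
    return "".join(c for c in seq if c not in GAP_CHARS)


def transfer_alignment(parent_row: str, parent_aln: str, pseudo_aln: str) -> str:
    """Project a pseudogene onto the columns of its parent family alignment.

    parent_row: the parent domain's gapped row in the family multiple alignment.
    parent_aln, pseudo_aln: equal-length rows of a pairwise alignment between
        the parent domain and the (translated) pseudogene.
    Returns a row with the same length as parent_row. Pseudogene residues that
    fall opposite a gap in the parent (insertions) are dropped in this sketch.
    """
    if len(parent_aln) != len(pseudo_aln):
        raise ValueError("pairwise alignment rows must have equal length")
    if ungap(parent_row) != ungap(parent_aln):
        raise ValueError("parent sequence differs between the two alignments")

    # For each parent residue (in order), record the pseudogene symbol aligned
    # to it: a residue, a stop '*', or '-' if the pseudogene has a deletion.
    aligned_to_parent = [
        "-" if q in GAP_CHARS else q
        for p, q in zip(parent_aln, pseudo_aln)
        if p not in GAP_CHARS
    ]

    # Walk the family alignment row: gap columns stay gaps, and each parent
    # residue column receives the pseudogene symbol aligned to that residue.
    out = []
    residues = iter(aligned_to_parent)
    for c in parent_row:
        out.append("-" if c in GAP_CHARS else next(residues))
    return "".join(out)


if __name__ == "__main__":
    # Toy example with made-up sequences.
    print(transfer_alignment("MK-TA-YL", "MKTA-YL", "MKSAW-L"))
```

Running the toy example prints `MK-SA--L`: the T-to-S substitution and the deletion opposite Y are carried into the family columns, while the pseudogene's inserted W is dropped. Because every pseudogene in a family is projected onto the same parent columns, the projected rows are mutually aligned without a new multiple alignment being computed.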
Pages: D738-D743
Number of pages: 6
References (26 in total)
  • [1] Gapped BLAST and PSI-BLAST: a new generation of protein database search programs
    Altschul, SF
    Madden, TL
    Schaffer, AA
    Zhang, JH
    Zhang, Z
    Miller, W
    Lipman, DJ
    [J]. NUCLEIC ACIDS RESEARCH, 1997, 25 (17) : 3389 - 3402
  • [2] BASIC LOCAL ALIGNMENT SEARCH TOOL
    ALTSCHUL, SF
    GISH, W
    MILLER, W
    MYERS, EW
    LIPMAN, DJ
    [J]. JOURNAL OF MOLECULAR BIOLOGY, 1990, 215 (03) : 403 - 410
  • [3] Primate segmental duplications: crucibles of evolution, diversity and disease
    Bailey, Jeffrey A.
    Eichler, Evan E.
    [J]. NATURE REVIEWS GENETICS, 2006, 7 (07) : 552 - 564
  • [4] The Pfam protein families database
    Bateman, A
    [J]. NUCLEIC ACIDS RESEARCH, 2004, 32 : D138 - D141
    DOI: 10.1093/nar/gkh121
  • [5] Reactivation by exon shuffling of a conserved HLA-DR3-like pseudogene segment in a New World primate species
    Doxiadis, GGM
    van der Wiel, MKH
    Brok, HPM
    Groot, NG
    Otting, N
    't Hart, BA
    van Rood, JJ
    Bontrop, RE
    [J]. PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 2006, 103 (15) : 5864 - 5868
  • [6] BioMart and Bioconductor: a powerful link between biological databases and microarray data analysis
    Durinck, S
    Moreau, Y
    Kasprzyk, A
    Davis, S
    De Moor, B
    Brazma, A
    Huber, W
    [J]. BIOINFORMATICS, 2005, 21 (16) : 3439 - 3440
  • [7] Human housekeeping genes are compact
    Eisenberg, E
    Levanon, EY
    [J]. TRENDS IN GENETICS, 2003, 19 (07) : 362 - 365
  • [8] Pfam: clans, web tools and services
    Finn, Robert D.
    Mistry, Jaina
    Schuster-Bockler, Benjamin
    Griffiths-Jones, Sam
    Hollich, Volker
    Lassmann, Timo
    Moxon, Simon
    Marshall, Mhairi
    Khanna, Ajay
    Durbin, Richard
    Eddy, Sean R.
    Sonnhammer, Erik L. L.
    Bateman, Alex
    [J]. NUCLEIC ACIDS RESEARCH, 2006, 34 : D247 - D251
  • [9] Ensembl 2008
    Flicek, P.
    Aken, B. L.
    Beal, K.
    Ballester, B.
    Caccamo, M.
    Chen, Y.
    Clarke, L.
    Coates, G.
    Cunningham, F.
    Cutts, T.
    Down, T.
    Dyer, S. C.
    Eyre, T.
    Fitzgerald, S.
    Fernandez-Banet, J.
    Graf, S.
    Haider, S.
    Hammond, M.
    Holland, R.
    Howe, K. L.
    Howe, K.
    Johnson, N.
    Jenkinson, A.
    Kahari, A.
    Keefe, D.
    Kokocinski, F.
    Kulesha, E.
    Lawson, D.
    Longden, I.
    Megy, K.
    Meidl, P.
    Overduin, B.
    Parker, A.
    Pritchard, B.
    Prlic, A.
    Rice, S.
    Rios, D.
    Schuster, M.
    Sealy, I.
    Slater, G.
    Smedley, D.
    Spudich, G.
    Trevanion, S.
    Vilella, A. J.
    Vogel, J.
    White, S.
    Wood, M.
    Birney, E.
    Cox, T.
    Curwen, V.
    [J]. NUCLEIC ACIDS RESEARCH, 2008, 36 : D707 - D714
  • [10] The real life of pseudogenes
    Gerstein, Mark
    Zheng, Deyou
    [J]. SCIENTIFIC AMERICAN, 2006, 295 (02) : 48 - 55