NCBI Reference Sequences: current status, policy and new initiatives

Cited by: 568
Authors
Pruitt, Kim D. [1]
Tatusova, Tatiana [1]
Klimke, William [1]
Maglott, Donna R. [1]
Affiliations
[1] NIH, Natl Ctr Biotechnol Informat, Natl Lib Med, Bethesda, MD 20892 USA
Keywords
ALIGNMENT; DATABASE; ENTREZ
DOI
10.1093/nar/gkn721
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Subject classification code
071010; 081704
Abstract
NCBI's Reference Sequence (RefSeq) database (http://www.ncbi.nlm.nih.gov/RefSeq/) is a curated non-redundant collection of sequences representing genomes, transcripts and proteins. RefSeq records integrate information from multiple sources and represent a current description of the sequence, the gene and sequence features. The database includes over 5300 organisms spanning prokaryotes, eukaryotes and viruses, with records for more than 5.5 × 10^6 proteins (RefSeq release 30). Feature annotation is applied by a combination of curation, collaboration, propagation from other sources and computation. We report here on the recent growth of the database, recent changes to feature annotations and record types for eukaryotic (primarily vertebrate) species and policies regarding species inclusion and genome annotation. In addition, we introduce RefSeqGene, a new initiative to support reporting variation data on a stable genomic coordinate system.
Pages: D32-D36
Page count: 5