NCBI Reference Sequences (RefSeq): current status, new features and genome annotation policy

Cited by: 872
Authors
Pruitt, Kim D. [1 ]
Tatusova, Tatiana [1 ]
Brown, Garth R. [1 ]
Maglott, Donna R. [1 ]
Affiliations
[1] NIH, Natl Ctr Biotechnol Informat, Natl Lib Med, Bethesda, MD 20894 USA
Funding
US National Institutes of Health;
Keywords
RESOURCES; DATABASE;
DOI
10.1093/nar/gkr1079
Chinese Library Classification number
Q5 [Biochemistry]; Q7 [Molecular Biology];
Discipline classification code
071010 ; 081704 ;
Abstract
The National Center for Biotechnology Information (NCBI) Reference Sequence (RefSeq) database is a collection of genomic, transcript and protein sequence records. These records are selected and curated from public sequence archives and represent a significant reduction in redundancy compared to the volume of data archived by the International Nucleotide Sequence Database Collaboration. The database includes over 16 000 organisms, 2.4 × 10^6 genomic records, 13 × 10^6 proteins and 2 × 10^6 RNA records spanning prokaryotes, eukaryotes and viruses (RefSeq release 49, September 2011). The RefSeq database is maintained by a combined approach of automated analyses, collaboration and manual curation to generate an up-to-date representation of the sequence, its features, names and cross-links to related sources of information. We report here on recent growth, the status of curating the human RefSeq data set, more extensive feature annotation and current policy for eukaryotic genome annotation via the NCBI annotation pipeline. More information about the resource is available online (see http://www.ncbi.nlm.nih.gov/RefSeq/).
Pages: D130-D135
Page count: 6
Related papers
14 in total
[1]   The Universal Protein Resource (UniProt) in 2010 [J].
Apweiler, Rolf ;
Martin, Maria Jesus ;
O'Donovan, Claire ;
Magrane, Michele ;
Alam-Faruque, Yasmin ;
Antunes, Ricardo ;
Barrell, Daniel ;
Bely, Benoit ;
Bingley, Mark ;
Binns, David ;
Bower, Lawrence ;
Browne, Paul ;
Chan, Wei Mun ;
Dimmer, Emily ;
Eberhardt, Ruth ;
Fedotov, Alexander ;
Foulger, Rebecca ;
Garavelli, John ;
Huntley, Rachael ;
Jacobsen, Julius ;
Kleen, Michael ;
Laiho, Kati ;
Leinonen, Rasko ;
Legge, Duncan ;
Lin, Quan ;
Liu, Wudong ;
Luo, Jie ;
Orchard, Sandra ;
Patient, Samuel ;
Poggioli, Diego ;
Pruess, Manuela ;
Corbett, Matt ;
di Martino, Giuseppe ;
Donnelly, Mike ;
van Rensburg, Pieter ;
Bairoch, Amos ;
Bougueleret, Lydie ;
Xenarios, Ioannis ;
Altairac, Severine ;
Auchincloss, Andrea ;
Argoud-Puy, Ghislaine ;
Axelsen, Kristian ;
Baratin, Delphine ;
Blatter, Marie-Claude ;
Boeckmann, Brigitte ;
Bolleman, Jerven ;
Bollondi, Laurent ;
Boutet, Emmanuel ;
Quintaje, Silvia Braconi ;
Breuza, Lionel .
NUCLEIC ACIDS RESEARCH, 2010, 38 :D142-D148
[2]   Modernizing Reference Genome Assemblies [J].
Church, Deanna M. ;
Schneider, Valerie A. ;
Graves, Tina ;
Auger, Katherine ;
Cunningham, Fiona ;
Bouk, Nathan ;
Chen, Hsiu-Chuan ;
Agarwala, Richa ;
McLaren, William M. ;
Ritchie, Graham R. S. ;
Albracht, Derek ;
Kremitzki, Milinn ;
Rock, Susan ;
Kotkiewicz, Holland ;
Kremitzki, Colin ;
Wollam, Aye ;
Trani, Lee ;
Fulton, Lucinda ;
Fulton, Robert ;
Matthews, Lucy ;
Whitehead, Siobhan ;
Chow, Will ;
Torrance, James ;
Dunn, Matthew ;
Harden, Glenn ;
Threadgold, Glen ;
Wood, Jonathan ;
Collins, Joanna ;
Heath, Paul ;
Griffiths, Guy ;
Pelan, Sarah ;
Grafham, Darren ;
Eichler, Evan E. ;
Weinstock, George ;
Mardis, Elaine R. ;
Wilson, Richard K. ;
Howe, Kerstin ;
Flicek, Paul ;
Hubbard, Tim .
PLOS BIOLOGY, 2011, 9 (07)
[3]   Locus Reference Genomic sequences: an improved basis for describing human DNA variants [J].
Dalgleish, Raymond ;
Flicek, Paul ;
Cunningham, Fiona ;
Astashyn, Alex ;
Tully, Raymond E. ;
Proctor, Glenn ;
Chen, Yuan ;
McLaren, William M. ;
Larsson, Pontus ;
Vaughan, Brendan W. ;
Beroud, Christophe ;
Dobson, Glen ;
Lehvaeslaiho, Heikki ;
Taschner, Peter E. M. ;
den Dunnen, Johan T. ;
Devereau, Andrew ;
Birney, Ewan ;
Brookes, Anthony J. ;
Maglott, Donna R. .
GENOME MEDICINE, 2010, 2
[4]   Pseudogene.org: a comprehensive database and comparison platform for pseudogene annotation [J].
Karro, John E. ;
Yan, Yangpan ;
Zheng, Deyou ;
Zhang, Zhaolei ;
Carriero, Nicholas ;
Cayting, Philip ;
Harrison, Paul ;
Gerstein, Mark .
NUCLEIC ACIDS RESEARCH, 2007, 35 :D55-D60
[5]   miRBase: integrating microRNA annotation and deep-sequencing data [J].
Kozomara, Ana ;
Griffiths-Jones, Sam .
NUCLEIC ACIDS RESEARCH, 2011, 39 :D152-D157
[6]   Entrez Gene: gene-centered information at NCBI [J].
Maglott, Donna ;
Ostell, Jim ;
Pruitt, Kim D. ;
Tatusova, Tatiana .
NUCLEIC ACIDS RESEARCH, 2011, 39 :D52-D57
[7]   CDD: a Conserved Domain Database for the functional annotation of proteins [J].
Marchler-Bauer, Aron ;
Lu, Shennan ;
Anderson, John B. ;
Chitsaz, Farideh ;
Derbyshire, Myra K. ;
DeWeese-Scott, Carol ;
Fong, Jessica H. ;
Geer, Lewis Y. ;
Geer, Renata C. ;
Gonzales, Noreen R. ;
Gwadz, Marc ;
Hurwitz, David I. ;
Jackson, John D. ;
Ke, Zhaoxi ;
Lanczycki, Christopher J. ;
Lu, Fu ;
Marchler, Gabriele H. ;
Mullokandov, Mikhail ;
Omelchenko, Marina V. ;
Robertson, Cynthia L. ;
Song, James S. ;
Thanki, Narmada ;
Yamashita, Roxanne A. ;
Zhang, Dachuan ;
Zhang, Naigong ;
Zheng, Chanjuan ;
Bryant, Stephen H. .
NUCLEIC ACIDS RESEARCH, 2011, 39 :D225-D229
[8]   SignalP 4.0: discriminating signal peptides from transmembrane regions [J].
Petersen, Thomas Nordahl ;
Brunak, Soren ;
von Heijne, Gunnar ;
Nielsen, Henrik .
NATURE METHODS, 2011, 8 (10) :785-786
[9]   Expression of Conjoined Genes: Another Mechanism for Gene Regulation in Eukaryotes [J].
Prakash, Tulika ;
Sharma, Vineet K. ;
Adati, Naoki ;
Ozawa, Ritsuko ;
Kumar, Naveen ;
Nishida, Yuichiro ;
Fujikake, Takayoshi ;
Takeda, Tadayuki ;
Taylor, Todd D. .
PLOS ONE, 2010, 5 (10)
[10]   Introducing RefSeq and LocusLink: curated human genome resources at the NCBI [J].
Pruitt, KD ;
Katz, KS ;
Sicotte, H ;
Maglott, DR .
TRENDS IN GENETICS, 2000, 16 (01) :44-47