Genome-wide computational identification and manual annotation of human long noncoding RNA genes

Cited by: 304
Authors
Jia, Hui [1]
Osak, Maureen [2]
Bogu, Gireesh K. [3]
Stanton, Lawrence W. [3]
Johnson, Rory [3]
Lipovich, Leonard [1]
Affiliations
[1] Wayne State Univ, Ctr Mol Med & Genet, Detroit, MI 48202 USA
[2] Hillsdale Coll, Lee & Roland Witte Nat Sci Div, Hillsdale, MI 49242 USA
[3] Genome Inst Singapore, Stem Cell & Dev Biol Grp, Singapore 138672, Singapore
Keywords
lncRNA; noncoding RNA; transcriptome; hypothetical protein; CPC; ORF-Predictor; DATABASE; EXPRESSION; REVEALS; LOCI;
DOI
10.1261/rna.1951310
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010; 081704
Abstract
Experimental evidence suggests that half or more of the mammalian transcriptome consists of noncoding RNA. Noncoding RNAs are divided into short noncoding RNAs (including microRNAs) and long noncoding RNAs (lncRNAs). We defined complementary DNAs (cDNAs) lacking any positive-strand open reading frames (ORFs) longer than 30 amino acids, as well as cDNAs lacking any evidence of interspecies conservation of their longer-than-30-amino acid ORFs, as noncoding. We have identified 5446 lncRNA genes in the human genome from approximately 24,000 full-length cDNAs, using our new ORF-prediction pipeline. We combined them nonredundantly with lncRNAs from four published sources to derive 6736 lncRNA genes. In an effort to distinguish standalone and antisense lncRNA genes from database artifacts, we stratified our catalog of lncRNAs according to the distance between each lncRNA gene candidate and its nearest known protein-coding gene. We concurrently examined the protein-coding capacity of known genes overlapping with lncRNAs. Remarkably, 62% of known genes with "hypothetical protein" names actually lacked protein-coding capacity. This study has greatly expanded the known human lncRNA catalog, increased its accuracy through manual annotation of cDNA-to-genome alignments, and revealed that a large set of hypothetical protein genes in GenBank lacks protein-coding capacity. In addition, we have developed, independently of existing NCBI tools, command-line programs with high-throughput ORF-finding and BLASTP-parsing functionality, suitable for future automated assessments of protein-coding capacity of novel transcripts.
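The first criterion in the abstract (no positive-strand ORF longer than 30 amino acids) can be illustrated with a minimal Python sketch. It is not the authors' pipeline. The function names are hypothetical, and the ORF definition used here (first in-frame ATG through the next in-frame stop codon, with ORFs that run off the end of the cDNA ignored) is an assumption. The second criterion, interspecies conservation of longer ORFs assessed from BLASTP results, is not covered.

```python
# Illustrative sketch only; names and the ATG-to-stop ORF definition are
# assumptions, not the published ORF-prediction pipeline.
STOP_CODONS = {"TAA", "TAG", "TGA"}
MIN_ORF_AA = 30  # length threshold stated in the abstract


def longest_forward_orf_aa(cdna: str) -> int:
    """Length in amino acids of the longest ATG-initiated, stop-terminated
    ORF in the three positive-strand frames (stop codon not counted)."""
    seq = cdna.upper().replace("U", "T")
    best = 0
    for frame in range(3):
        start = None  # position of the ATG that opened the current ORF
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                best = max(best, (i - start) // 3)
                start = None  # look for the next ORF in this frame
    return best


def lacks_long_orf(cdna: str, min_aa: int = MIN_ORF_AA) -> bool:
    """True when no positive-strand ORF is longer than min_aa amino acids."""
    return longest_forward_orf_aa(cdna) <= min_aa


if __name__ == "__main__":
    short_cdna = "ATG" + "GCT" * 10 + "TAA"  # Met + 10 Ala
    long_cdna = "ATG" + "GCT" * 40 + "TAA"   # Met + 40 Ala
    print(lacks_long_orf(short_cdna), lacks_long_orf(long_cdna))
```

Only the forward strand is scanned, matching the abstract's "positive-strand" wording. A cDNA that fails this test would still need the conservation check before being called coding.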
Pages: 1478-1487
Page count: 10