The GENCODE v7 catalog of human long noncoding RNAs: Analysis of their gene structure, evolution, and expression

Times cited: 3978
Authors
Derrien, Thomas [1, 2]
Johnson, Rory [1, 2]
Bussotti, Giovanni [1, 2]
Tanzer, Andrea [1, 2]
Djebali, Sarah [1, 2]
Tilgner, Hagen [1, 2]
Guernec, Gregory [3]
Martin, David [1, 2]
Merkel, Angelika [1, 2]
Knowles, David G. [1, 2]
Lagarde, Julien [1, 2]
Veeravalli, Lavanya [4]
Ruan, Xiaoan [4]
Ruan, Yijun [4]
Lassmann, Timo [5]
Carninci, Piero [5]
Brown, James B. [6]
Lipovich, Leonard [7]
Gonzalez, Jose M. [8]
Thomas, Mark [8]
Davis, Carrie A. [9]
Shiekhattar, Ramin [10]
Gingeras, Thomas R. [9]
Hubbard, Tim J. [8]
Notredame, Cedric [1, 2]
Harrow, Jennifer [8]
Guigo, Roderic [1, 2, 11]
Affiliations
[1] Ctr Genom Regulat CRG, Barcelona 08003, Catalonia, Spain
[2] UPF, Barcelona 08003, Catalonia, Spain
[3] GenOuest, IFR140, SCRIBE UR1012, INRA, F-35000 Rennes, France
[4] Agcy Sci Technol & Res, Genome Inst Singapore, Genome 138672, Singapore
[5] Riken Yokohama Inst, Riken Omics Sci Ctr, Yokohama, Kanagawa 3510198, Japan
[6] Univ Calif Berkeley, Dept Stat, Berkeley, CA 94720 USA
[7] Wayne State Univ, Ctr Mol Med & Genet, Detroit, MI 48201 USA
[8] Wellcome Trust Sanger Inst, Cambridge CB10 1HH, England
[9] Cold Spring Harbor Lab, Cold Spring Harbor, NY 11724 USA
[10] Wistar Inst Anat & Biol, Philadelphia, PA 19104 USA
[11] Univ Pompeu Fabra, Dept Ciencies Expt & Salut, Barcelona 08002, Catalonia, Spain
Funding
National Institutes of Health (NIH)
Keywords
MESSENGER-RNA; IDENTIFICATION; TRANSCRIPTION; ANNOTATION; DATABASE; REVEALS; PRODUCT
DOI
10.1101/gr.132159.111
Chinese Library Classification (CLC) numbers
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification codes
071010; 081704
Abstract
The human genome contains many thousands of long noncoding RNAs (lncRNAs). While several studies have demonstrated compelling biological and disease roles for individual examples, analytical and experimental approaches to investigate these genes have been hampered by the lack of comprehensive lncRNA annotation. Here, we present and analyze the most complete human lncRNA annotation to date, produced by the GENCODE consortium within the framework of the ENCODE project and comprising 9277 manually annotated genes producing 14,880 transcripts. Our analyses indicate that lncRNAs are generated through pathways similar to those of protein-coding genes, with similar histone-modification profiles, splicing signals, and exon/intron lengths. In contrast to protein-coding genes, however, lncRNAs display a striking bias toward two-exon transcripts, are predominantly localized in the chromatin and nucleus, and a fraction of them appear to be preferentially processed into small RNAs. They are under stronger selective pressure than neutrally evolving sequences, particularly in their promoter regions, which display levels of selection comparable to those of protein-coding genes. Importantly, about one-third seem to have arisen within the primate lineage. Comprehensive analysis of their expression in multiple human organs and brain regions shows that lncRNAs are generally expressed at lower levels than protein-coding genes and display more tissue-specific expression patterns, with a large fraction of tissue-specific lncRNAs expressed in the brain. Expression correlation analysis indicates that lncRNAs show a particularly striking positive correlation with the expression of antisense coding genes. This GENCODE annotation represents a valuable resource for future studies of lncRNAs.
Pages: 1775-1789
Number of pages: 15