CDS annotation in full-length cDNA sequence

Cited by: 59
Authors
Furuno, M
Kasukawa, T
Saito, R
Adachi, J
Suzuki, H
Baldarelli, R
Hayashizaki, Y
Okazaki, Y
Affiliations
[1] RIKEN, Yokohama Inst, GSC, Lab Genome Explorat Res Grp, Tsurumi Ku, Kanagawa 2300045, Japan
[2] NTT Software Corp, Multimedia Dev Ctr, Adv Technol Dev Dept, Naka Ku, Kanagawa 2318554, Japan
[3] Keio Univ, Inst Adv Biosci, Tsuruoka, Yamagata 9970017, Japan
[4] Jackson Lab, Mouse Genome Informat Grp, Bar Harbor, ME 04609 USA
[5] RIKEN, Genome Sci Lab, Wako, Saitama 3510198, Japan
DOI
10.1101/gr.1060303
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology];
Discipline classification code
071010 ; 081704 ;
Abstract
The identification of coding sequences (CDS) is an important step in the functional annotation of genes. CDS prediction for mammalian genes from genomic sequence is complicated by the vast abundance of intergenic sequence in the genome, and provides little information about how different parts of potential CDS regions are expressed. In contrast, mammalian gene CDS prediction from cDNA sequence offers obvious advantages, yet encounters a different set of complexities when performed on high-throughput cDNA (HTC) sequences, such as the set of 60,770 cDNAs isolated from full-length enriched libraries of the FANTOM2 project. We developed a CDS annotation strategy that uses a variety of different CDS prediction programs to annotate the CDS regions of FANTOM2 cDNAs. These include rsCDS, which uses sequence similarity to known proteins; ProCrest; Longest-ORF and Truncated-ORF, which are ab initio-based predictors; and finally, DECODER and NCBI CDS predictor, which use a combination of both principles. Aided by graphical displays of these CDS prediction results in the context of other sequence similarity results for each cDNA, FANTOM2 CDS inspection by curators and follow-up quality control procedures resulted in high-quality CDS predictions for a total of 14,345 FANTOM2 clones.
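To illustrate what an ab initio predictor of the Longest-ORF kind does in principle, the following is a minimal sketch. It is not the FANTOM2 implementation: the function name `longest_orf`, the restriction to the forward strand, and the requirement of an ATG start followed by an in-frame standard stop codon are all assumptions made for illustration.

```python
# Illustrative sketch only; not the Longest-ORF program used in FANTOM2.
# Assumptions: forward strand only, ATG start, standard stop codons.
STOP_CODONS = {"TAA", "TAG", "TGA"}
START_CODON = "ATG"


def longest_orf(seq):
    """Return (start, end, frame) of the longest ATG-to-stop ORF, or None.

    Coordinates are 0-based, end-exclusive, and include the stop codon.
    """
    seq = seq.upper()
    best = None
    for frame in range(3):
        start = None  # position of the first ATG of the currently open ORF
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == START_CODON:
                start = i
            elif start is not None and codon in STOP_CODONS:
                end = i + 3
                if best is None or end - start > best[1] - best[0]:
                    best = (start, end, frame)
                start = None  # close this ORF and look for the next ATG
    return best


if __name__ == "__main__":
    cdna = "CCATGAAATAGGG"  # made-up toy sequence
    hit = longest_orf(cdna)
    if hit is not None:
        start, end, frame = hit
        print(start, end, frame, cdna[start:end])
```

A sketch like this ignores the complications the abstract alludes to, such as truncated clones that lack a start or stop codon, which is presumably why several complementary predictors and manual curation were combined.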
Pages: 1478-1487
Number of pages: 10