Complete sequencing and characterization of 21,243 full-length human cDNAs

Times cited: 740
Authors
Ota, T
Suzuki, Y
Nishikawa, T
Otsuki, T
Sugiyama, T
Irie, R
Wakamatsu, A
Hayashi, K
Sato, H
Nagai, K
Kimura, K
Makita, H
Sekine, M
Obayashi, M
Nishi, T
Shibahara, T
Tanaka, T
Ishii, S
Yamamoto, J
Saito, K
Kawai, Y
Isono, Y
Nakamura, Y
Nagahari, K
Murakami, K
Yasuda, T
Iwayanagi, T
Wagatsuma, M
Shiratori, A
Sudo, H
Hosoiri, T
Kaku, Y
Kodaira, H
Kondo, H
Sugawara, M
Takahashi, M
Kanda, K
Yokoi, T
Furuya, T
Kikkawa, E
Omura, Y
Abe, K
Kamihara, K
Katsuta, N
Sato, K
Tanikawa, M
Yamazaki, M
Ninomiya, K
Ishibashi, T
Yamashita, H
Affiliations
[1] Univ Tokyo, Inst Med Sci, Minato Ku, Tokyo 1088639, Japan
[2] Helix Res Inst, Chiba 2920812, Japan
[3] Kyowa Hakko Kogyo Co Ltd, Tokyo Res Lab, Tokyo 1948533, Japan
[4] Hitachi Ltd, Cent Res Lab, Tokyo 1858601, Japan
[5] Natl Inst Technol & Evaluat, Shibuya Ku, Tokyo 1510066, Japan
[6] Otsuka Pharmaceut Co Ltd, Tokushima 7710192, Japan
[7] Hitachi, Life Sci Grp, Kawagoe, Saitama 3501165, Japan
[8] Hitachi Sci Syst, Tokyo 1858601, Japan
[9] Takara Shuzo Co Ltd, Shiga 5250055, Japan
[10] Nisshinbo Ind, Midori Ku, Chiba 2670056, Japan
[11] Toyobo, Fukui 9140047, Japan
[12] Fujiya, Kanagawa 2570031, Japan
[13] Aisin Cosmos Res & Dev Co Ltd, Chiba 2920812, Japan
[14] Kazusa DNA Res Inst, Chiba 2920812, Japan
[15] AIST, BIRC, Koto Ku, Tokyo 1350064, Japan
DOI
10.1038/ng1285
Chinese Library Classification (CLC) number
Q3 [Genetics]
Subject classification codes
071007; 090102
Abstract
As a base for human transcriptome and functional genomics, we created the "full-length long Japan" (FLJ) collection of sequenced human cDNAs. We determined the entire sequence of 21,243 selected clones and found that 14,490 cDNAs (10,897 clusters) were unique to the FLJ collection. About half of them (5,416) seemed to be protein-coding. Of those, 1,999 clusters had not been predicted by computational methods. The distribution of GC content of nonpredicted cDNAs had a peak at ~58% compared with a peak at ~42% for predicted cDNAs. Thus, there seems to be a slight bias against GC-rich transcripts in current gene prediction procedures. The rest of the cDNAs unique to the FLJ collection (5,481) contained no obvious open reading frames (ORFs) and thus are candidate noncoding RNAs. About one-fourth of them (1,378) showed a clear pattern of splicing. The distribution of GC content of noncoding cDNAs was narrow and had a peak at ~42%, relatively low compared with that of protein-coding cDNAs.
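The abstract relies on two per-sequence measures: GC content (the fraction of G and C bases) and whether a cDNA contains an obvious ORF. The sketch below illustrates both under stated assumptions. It is not the FLJ project's actual pipeline: the function names, the forward-strand-only ORF scan, and the 100-codon cutoff in `looks_noncoding` are all hypothetical choices made for illustration.

```python
"""Illustrative sketch only: GC content and longest-ORF length for a cDNA.

Not the FLJ project's method. The 100-codon minimum ORF length used in
looks_noncoding() is a hypothetical parameter chosen for this example.
"""

STOP_CODONS = {"TAA", "TAG", "TGA"}


def gc_content(seq: str) -> float:
    """Return the fraction of G and C bases in seq (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def longest_orf_codons(seq: str) -> int:
    """Length in codons (stop excluded) of the longest ATG...stop ORF.

    Only the forward strand is scanned, assuming a directionally cloned cDNA.
    ORFs that run off the end without a stop codon are not counted.
    """
    seq = seq.upper()
    best = 0
    for frame in range(3):
        start = None
        # Step through complete codons in this reading frame.
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                best = max(best, (i - start) // 3)
                start = None
    return best


def looks_noncoding(seq: str, min_orf_codons: int = 100) -> bool:
    """Hypothetical rule: flag a cDNA as a noncoding candidate if its longest ORF is short."""
    return longest_orf_codons(seq) < min_orf_codons
```

For example, `gc_content("ATGC")` returns 0.5, and `longest_orf_codons("ATGAAATAG")` returns 2 (ATG, AAA before the TAG stop), so that toy sequence would be flagged by `looks_noncoding`. Binning `gc_content` values for a set of sequences into a histogram is one way to locate peaks like the ~42% and ~58% reported above.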
Pages: 40-45
Number of pages: 6