The contribution of 700,000 ORF sequence tags to the definition of the human transcriptome

Cited by: 92
Authors
Camargo, AA
Samaia, HPB
Dias-Neto, E
Simao, DF
Migotto, IA
Briones, MRS
Costa, FF
Nagai, MA
Verjovski-Almeida, S
Zago, MA
Andrade, LEC
Carrer, H
El-Dorry, HFA
Espreafico, EM
Habr-Gama, A
Giannella-Neto, D
Goldman, GH
Gruber, A
Hackel, C
Kimura, ET
Maciel, RMB
Marie, SKN
Martins, EAL
Nóbrega, MP
Paçó-Larson, ML
Pardini, MIMC
Pereira, GG
Pesquero, JB
Rodrigues, V
Rogatto, SR
da Silva, IDCG
Sogayar, MC
Sonati, MDF
Tajara, EH
Valentini, SR
Alberto, FL
Amaral, MEJ
Aneas, I
Arnaldi, LAT
de Assis, AM
Bengtson, MH
Bergamo, NA
Bombonato, V
de Camargo, MER
Canevari, RA
Carraro, DM
Cerutti, JM
Corrêa, MLC
Corrêa, RFR
Costa, MCR
Affiliations
[1] Ludwig Inst Canc Res, BR-01509010 Sao Paulo, Brazil
[2] Univ Fed Sao Paulo, UNIFESP, Dept Reumatol, BR-04023062 Sao Paulo, Brazil
[3] Univ Fed Sao Paulo, UNIFESP, Dept Biofis, BR-04023062 Sao Paulo, Brazil
[4] Univ Fed Sao Paulo, UNIFESP, Escola Paulista Med, BR-04023062 Sao Paulo, Brazil
[5] Univ Estadual Campinas, Hemoctr, BR-13089970 Sao Paulo, Brazil
[6] Univ Sao Paulo, Fac Med, Dept Radiol, BR-01296903 Sao Paulo, Brazil
[7] Univ Sao Paulo, Inst Quim, Dept Bioquim, BR-05513970 Sao Paulo, Brazil
[8] Fac Med Ribeirao Preto, Dept Clin Med, BR-14049900 Sao Paulo, Brazil
[9] Fac Med Ribeirao Preto, Dept Biol Celular & Mol & Bioagentes Pathogenicos, BR-14049900 Sao Paulo, Brazil
[10] Fac Med Ribeirao Preto, Dept Bioquim & Imunol, BR-14049900 Sao Paulo, Brazil
[11] Univ Sao Paulo, Escola Super Agr Luiz de Queiroz, Dept Ciencias Biol, BR-13418900 Sao Paulo, Brazil
[12] Univ Sao Paulo, Fac Med, Inst Coracao, INCOR, BR-05403000 Sao Paulo, Brazil
[13] Univ Sao Paulo, Fac Med, Lab Nutr & Doencas Metab, BR-01246 Sao Paulo, Brazil
[14] Univ Sao Paulo, Fac Med, Dept Neurol, BR-01246 Sao Paulo, Brazil
[15] Univ Sao Paulo, Fac Ciencias Farmaceut Ribeirao Preto, Dept Ciencias Farmaceut, BR-14040903 Sao Paulo, Brazil
[16] Univ Sao Paulo, Fac Med Vet & Zootecn, Dept Patol, BR-05508000 Sao Paulo, Brazil
[17] Univ Estadual Campinas, Fac Ciencias Med, Dept Med Genet, BR-13081970 Sao Paulo, Brazil
[18] Univ Sao Paulo, Inst Ciencias Biomed, Dept Histol Embriol, BR-05508000 Sao Paulo, Brazil
[19] Univ Fed Sao Paulo, Dept Med, BR-04029032 Sao Paulo, Brazil
[20] Inst Butantan, Ctr Biotecnol, BR-05503900 Sao Paulo, Brazil
[21] Univ Vale Paraiba, Inst Pesquisa & Desenvolvimento, BR-12244 Sao Paulo, Brazil
[22] Univ Estadual Paulista, Fac Med Botucatu, Hemoctr, BR-18618000 Sao Paulo, Brazil
[23] Univ Estadual Campinas, Fac Ciencias Med, Inst Biol, Dept Genet & Evolucao, BR-13083970 Campinas, SP, Brazil
[24] Univ Estadual Campinas, Fac Ciencias Med, Dept Patol Clin, BR-13083970 Campinas, SP, Brazil
[25] Univ Estadual Paulista, Inst Biociencias, Dept Genet, BR-18618000 Sao Paulo, Brazil
[26] Escola Paulista Med, Dept Ginecol & Obstet, BR-04301900 Sao Paulo, Brazil
[27] Univ Estadual Paulista, Inst Biociencias Letras & Ciencias Exatas, Dept Biol, BR-15054 Sao Paulo, Brazil
[28] Univ Estadual Paulista, Fac Ciencias Farmaceut & Araraquara, Dept Ciencias Biol, BR-14801902 Sao Paulo, Brazil
[29] Hosp AC Camargo Fund Antonio Prudente, Dept Anat Patol, BR-01509010 Sao Paulo, Brazil
DOI
10.1073/pnas.201182798
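Chinese Library Classification
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification codes
07; 0710; 09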
Abstract
Open reading frame expressed sequence tags (ORESTES) differ from conventional ESTs by providing sequence data from the central protein coding portion of transcripts. We generated a total of 696,745 ORESTES sequences from 24 human tissues and used a subset of the data that correspond to a set of 15,095 full-length mRNAs as a means of assessing the efficiency of the strategy and its potential contribution to the definition of the human transcriptome. We estimate that ORESTES sampled over 80% of all highly and moderately expressed, and between 40% and 50% of rarely expressed, human genes. In our most thoroughly sequenced tissue, the breast, the 130,000 ORESTES generated are derived from transcripts from an estimated 70% of all genes expressed in that tissue, with an equally efficient representation of both highly and poorly expressed genes. In this respect, we find that the capacity of the ORESTES strategy both for gene discovery and shotgun transcript sequence generation significantly exceeds that of conventional ESTs. The distribution of ORESTES is such that many human transcripts are now represented by a scaffold of partial sequences distributed along the length of each gene product. The experimental joining of the scaffold components, by reverse transcription-PCR, represents a direct route to transcript finishing that may represent a useful alternative to full-length cDNA cloning.
Pages: 12103-12108
Number of pages: 6