Computational analysis of gene identification with SAGE

Cited by: 11
Authors
Clark, T
Lee, S
Scott, LR
Wang, SM
Affiliations
[1] Univ Chicago, Dept Comp Sci, Chicago, IL 60637 USA
[2] Univ Chicago, Dept Med, Chicago, IL 60637 USA
Keywords
SAGE; GLGI; gene identification; gene expression; sequence distribution
DOI
10.1089/106652702760138600
Chinese Library Classification number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
SAGE is one of the few techniques capable of uniformly probing gene expression at a genome level irrespective of mRNA abundance and without a priori knowledge of the transcripts present. However, individual SAGE tags can match many sequences in the reference database, complicating gene identification. We perform a baseline evaluation of gene identification with SAGE using UniGene Human as the reference database by analyzing 1) the distributions of tags for various length tag sets formed for UniGene Human and 2) the tag-to-sequence mapping using a SAGE tag set consisting of 37,522 tags derived from human myeloid cells. The extensive multiplicity of the dbEST component of UniGene significantly detracts from gains that might be expected by extending tags within the scope of the SAGE protocol. In order to achieve reasonable sequence specificity for gene identification with the content of the commonly used UniGene sequence collection, tags on the order of hundreds of bases in length are required. One way to produce tags of such lengths is with GLGI, which extends SAGE tags to the 3' end of cDNA. We show that the longer sequences produced by GLGI relieve significantly the multiple match condition. In the myeloid sample, we also found a correlation between multiple match severity and high copy number. We extrapolate these findings, providing insights into the use of UniGene Human as a reference for gene identification.
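The multiple-match problem described in the abstract can be illustrated with a small sketch. It is not the authors' pipeline: it assumes the tag is read immediately downstream of the 3'-most CATG site (the NlaIII anchoring site commonly used in SAGE), and the function names and the three toy sequences are hypothetical, chosen only to show how a longer tag can separate sequences that share a short tag.

```python
from collections import defaultdict

# Assumption: NlaIII (recognition site CATG) is the anchoring enzyme.
ANCHOR = "CATG"


def extract_tag(seq, tag_len):
    """Return the tag_len bases after the 3'-most anchor site, or None."""
    pos = seq.rfind(ANCHOR)
    if pos == -1:
        return None  # no anchor site, so no tag can be formed
    start = pos + len(ANCHOR)
    tag = seq[start:start + tag_len]
    # Discard tags truncated by the end of the sequence.
    return tag if len(tag) == tag_len else None


def tag_multiplicity(sequences, tag_len):
    """Map each tag to the set of sequence ids that produce it."""
    hits = defaultdict(set)
    for seq_id, seq in sequences.items():
        tag = extract_tag(seq.upper(), tag_len)
        if tag is not None:
            hits[tag].add(seq_id)
    return hits


def fraction_unique(sequences, tag_len):
    """Fraction of distinct tags that match exactly one sequence."""
    hits = tag_multiplicity(sequences, tag_len)
    if not hits:
        return 0.0
    unique = sum(1 for ids in hits.values() if len(ids) == 1)
    return unique / len(hits)


# Hypothetical toy reference set, not real UniGene entries.
toy = {
    "seqA": "GGACATGAAACCCGGGTTTACGTAA",
    "seqB": "TTTCATGAAACCCGGGTTTAGGGCC",
    "seqC": "CCCCATGTTTGGGAAACCCTTTAAA",
}

for length in (10, 14):
    print(length, fraction_unique(toy, length))
```

In this toy set, seqA and seqB share their first 10 bases after the anchor site and diverge only later, so extending the tag separates them. Against a real reference collection with many redundant ESTs, the abstract reports that much longer tags are needed.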
Pages: 513-526
Page count: 14