Quantitative comparison of EST libraries requires compensation for systematic biases in cDNA generation

Cited by: 24
Authors
Liu, DL [1]
Graber, JH [1]
Affiliation
[1] Jackson Lab, Bar Harbor, ME 04609 USA
DOI
10.1186/1471-2105-7-77
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Background: Publicly accessible EST libraries contain valuable information that can be utilized for studies of tissue-specific gene expression and processing of individual genes. This information is, however, confounded by multiple systematic effects arising from the procedures used to generate these libraries.
Results: We used alignment of ESTs against a reference set of transcripts to estimate the size distributions of the cDNA inserts and sampled mRNA transcripts in individual EST libraries and show how these measurements can be used to inform quantitative comparisons of libraries. While significant attention has been paid to the effects of normalization and subtraction, we also find significant biases in transcript sampling introduced by the combined procedures of reverse transcription and selection of cDNA clones for sequencing. Using examples drawn from studies of mRNA 3'-processing (cleavage and polyadenylation), we demonstrate effects of the transcript sampling bias, and provide a method for identifying libraries that can be safely compared without bias. All data sets, supplemental data, and software are available at our supplemental web site [1].
Conclusion: The biases we characterize in the transcript sampling of EST libraries represent a significant and heretofore under-appreciated source of false positive candidates for tissue-, cell type-, or developmental stage-specific activity or processing of genes. Uncorrected, quantitative comparison of dissimilar EST libraries will likely result in the identification of statistically significant, but biologically meaningless changes.
Pages: 10
Related papers
43 in total
[1]   SEQUENCE IDENTIFICATION OF 2,375 HUMAN BRAIN GENES [J].
ADAMS, MD ;
DUBNICK, M ;
KERLAVAGE, AR ;
MORENO, R ;
KELLEY, JM ;
UTTERBACK, TR ;
NAGLE, JW ;
FIELDS, C ;
VENTER, JC .
NATURE, 1992, 355 (6361) :632-634
[2]   COMPLEMENTARY-DNA SEQUENCING - EXPRESSED SEQUENCE TAGS AND HUMAN GENOME PROJECT [J].
ADAMS, MD ;
KELLEY, JM ;
GOCAYNE, JD ;
DUBNICK, M ;
POLYMEROPOULOS, MH ;
XIAO, H ;
MERRIL, CR ;
WU, A ;
OLDE, B ;
MORENO, RF ;
KERLAVAGE, AR ;
MCCOMBIE, WR ;
VENTER, JC .
SCIENCE, 1991, 252 (5013) :1651-1656
[3]   The significance of digital gene expression profiles [J].
Audic, S ;
Claverie, JM .
GENOME RESEARCH, 1997, 7 (10) :986-995
[4]   Identification of alternate polyadenylation sites and analysis of their tissue distribution using EST data [J].
Beaudoing, E ;
Gautheret, D .
GENOME RESEARCH, 2001, 11 (09) :1520-1526
[5]   DBEST - DATABASE FOR EXPRESSED SEQUENCE TAGS [J].
BOGUSKI, MS ;
LOWE, TMJ ;
TOLSTOSHEV, CM .
NATURE GENETICS, 1993, 4 (04) :332-333
[6]   Normalization and subtraction: Two approaches to facilitate gene discovery [J].
Bonaldo, MDF ;
Lennon, G ;
Soares, MB .
GENOME RESEARCH, 1996, 6 (09) :791-806
[7]   1274 full-open reading frames of transcripts expressed in the developing mouse nervous system [J].
Bonaldo, MF ;
Bair, TB ;
Scheetz, TE ;
Snir, E ;
Akabogu, I ;
Bair, JL ;
Berger, B ;
Crouch, K ;
Davis, A ;
Eyestone, ME ;
Keppel, C ;
Kucaba, TA ;
Lebeck, M ;
Lin, JL ;
de Melo, AIR ;
Rehmann, J ;
Reiter, RS ;
Schaefer, K ;
Smith, C ;
Tack, D ;
Trout, K ;
Sheffield, VC ;
Lin, JJC ;
Casavant, TL ;
Soares, MB .
GENOME RESEARCH, 2004, 14 (10B) :2053-2063
[8]   PACdb: PolyA cleavage site and 3′-UTR database [J].
Brockman, JM ;
Singh, P ;
Liu, DL ;
Quinlan, S ;
Salisbury, J ;
Graber, JH .
BIOINFORMATICS, 2005, 21 (18) :3691-3693
[9]   ExQuest, a novel method for displaying quantitative gene expression from ESTs [J].
Brown, AC ;
Kai, K ;
May, ME ;
Brown, DC ;
Roopenian, DC .
GENOMICS, 2004, 83 (03) :528-539
[10]   Alternative gene form discovery and candidate gene selection from gene indexing projects [J].
Burke, J ;
Wang, H ;
Hide, W ;
Davison, DB .
GENOME RESEARCH, 1998, 8 (03) :276-290