A multivariate prediction model for microarray cross-hybridization

Cited by: 55
Authors
Chen, YA [1]
Chou, CC
Lu, XH
Slate, EH
Peck, K
Wu, WY
Voit, EO
Almeida, JS
Affiliations
[1] Med Univ S Carolina, Dept Biostat Bioinformat & Epidemiol, Charleston, SC 29425 USA
[2] Natl Taiwan Univ, Ctr Genom Med, Taipei 10764, Taiwan
[3] Acad Sinica, Inst Biomed Sci, Taipei, Taiwan
[4] Chinese Acad Sci, Inst Genet & Dev Biol, Key Lab Mol & Dev Biol, Beijing, Peoples R China
[5] Georgia Tech, Dept Biomed Engn, Atlanta, GA USA
[6] Univ Texas, MD Anderson Canc Ctr, Dept Biostat & Appl Math, Houston, TX 77030 USA
Keywords
DOI
10.1186/1471-2105-7-101
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010; 081704;
Abstract
Background: Expression microarray analysis is one of the most popular molecular diagnostic techniques in the post-genomic era. However, this technique faces the fundamental problem of potential cross-hybridization. This is a pervasive problem for both oligonucleotide and cDNA microarrays; it is considered particularly problematic for the latter. No comprehensive multivariate predictive modeling has been performed to understand how multiple variables contribute to (cross-)hybridization. Results: We propose a systematic search strategy using multiple multivariate models [multiple linear regressions, regression trees, and artificial neural network analyses (ANNs)] to select an effective set of predictors for hybridization. We validate this approach on a set of DNA microarrays with cytochrome p450 family genes. The performance of our multiple multivariate models is compared with that of a recently proposed third-order polynomial regression method that uses percent identity as the sole predictor. All multivariate models agree that the 'most contiguous base pairs between probe and target sequences,' rather than percent identity, is the best univariate predictor. The predictive power is improved by inclusion of additional nonlinear effects, in particular target GC content, when regression trees or ANNs are used. Conclusion: A systematic multivariate approach is provided to assess the importance of multiple sequence features for hybridization and of relationships among these features. This approach can easily be applied to larger datasets. This will allow future developments of generalized hybridization models that will be able to correct for false-positive cross-hybridization signals in expression experiments.
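The three sequence features named in the abstract (percent identity, the longest contiguous run of matching base pairs, and target GC content) can be illustrated with a minimal Python sketch. This is not the authors' implementation: it assumes the probe and target are already aligned, ungapped, and of equal length, and the function names are hypothetical.

```python
def percent_identity(probe: str, target: str) -> float:
    """Percentage of aligned positions where probe and target agree.

    Assumption: both sequences are pre-aligned, ungapped, and equal in length.
    """
    if not probe or len(probe) != len(target):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(probe, target))
    return 100.0 * matches / len(probe)


def longest_contiguous_match(probe: str, target: str) -> int:
    """Length of the longest uninterrupted run of matching positions."""
    best = run = 0
    for a, b in zip(probe, target):
        run = run + 1 if a == b else 0  # a mismatch resets the current run
        best = max(best, run)
    return best


def gc_content(seq: str) -> float:
    """Fraction of bases in seq that are G or C."""
    if not seq:
        raise ValueError("sequence must be non-empty")
    return sum(base in "GC" for base in seq.upper()) / len(seq)


probe = "ACGTACGTAC"
target = "ACGTTCGTAC"  # one mismatch, at the fifth position
features = (
    percent_identity(probe, target),
    longest_contiguous_match(probe, target),
    gc_content(target),
)
```

Two probe-target pairs can share the same percent identity yet differ in their longest contiguous match, which is one way to see why the two features carry different information as predictors.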
Pages: 12
Related papers
59 in total
[1]   Predictive non-linear modeling of complex data by artificial neural networks [J].
Almeida, JS .
CURRENT OPINION IN BIOTECHNOLOGY, 2002, 13 (01) :72-76
[2]   Standardizing global gene expression analysis between laboratories and across platforms [J].
Bammler, T ;
Beyer, RP ;
Bhattacharya, S ;
Boorman, GA ;
Boyles, A ;
Bradford, BU ;
Bumgarner, RE ;
Bushel, PR ;
Chaturvedi, K ;
Choi, D ;
Cunningham, ML ;
Deng, S ;
Dressman, HK ;
Fannin, RD ;
Farin, FM ;
Freedman, JH ;
Fry, RC ;
Harper, A ;
Humble, MC ;
Hurban, P ;
Kavanagh, TJ ;
Kaufmann, WK ;
Kerr, KF ;
Jing, L ;
Lapidus, JA ;
Lasarev, MR ;
Li, J ;
Li, YJ ;
Lobenhofer, EK ;
Lu, X ;
Malek, RL ;
Milton, S ;
Nagalla, SR ;
O'Malley, JP ;
Palmer, VS ;
Pattee, P ;
Paules, RS ;
Perou, CM ;
Phillips, K ;
Qin, LX ;
Qiu, Y ;
Quigley, SD ;
Rodland, M ;
Rusyn, I ;
Samson, LD ;
Schwartz, DA ;
Shi, Y ;
Shin, JL ;
Sieber, SO ;
Slifer, S .
NATURE METHODS, 2005, 2 (05) :351-356
[3]  
Breiman L., 1998, CLASSIFICATION REGRE
[4]   Neural networks with a continuous squashing function in the output are universal approximators [J].
Castro, JL ;
Mantas, CJ ;
Benítez, JM .
NEURAL NETWORKS, 2000, 13 (06) :561-563
[5]   Probe rank approaches for gene selection in oligonucleotide arrays with a small number of replicates [J].
Chen, DT ;
Chen, JJ ;
Soong, SJ .
BIOINFORMATICS, 2005, 21 (12) :2861-2866
[6]   Optimal cDNA microarray design using expressed sequence tags for organisms with limited genomic information [J].
Chen, YA ;
Mckillen, DJ ;
Wu, SY ;
Jenny, MJ ;
Chapman, R ;
Gross, PS ;
Warr, GW ;
Almeida, JS .
BMC BIOINFORMATICS, 2004, 5 (1)
[7]   Ratio statistics of gene expression levels and applications to microarray data analysis [J].
Chen, YD ;
Kamat, V ;
Dougherty, ER ;
Bittner, ML ;
Meltzer, PS ;
Trent, JM .
BIOINFORMATICS, 2002, 18 (09) :1207-1215
[8]   Optimization of probe length and the number of probes per gene for optimal microarray analysis of gene expression [J].
Chou, CC ;
Chen, CH ;
Lee, TT ;
Peck, K .
NUCLEIC ACIDS RESEARCH, 2004, 32 (12) :e99
[9]   DNA-BAR: distinguisher selection for DNA barcoding [J].
DasGupta, B ;
Konwar, KM ;
Mandoiu, II ;
Shvartsman, AA .
BIOINFORMATICS, 2005, 21 (16) :3424-3426
[10]   Estimation of transformation parameters for microarray data [J].
Durbin, B ;
Rocke, DM .
BIOINFORMATICS, 2003, 19 (11) :1360-1367