Highly accurate classification of Watson-Crick basepairs on termini of single DNA molecules

Times cited: 74
Authors
Winters-Hilt, S
Vercoutere, W
DeGuzman, VS
Deamer, D
Akeson, M
Haussler, D
Affiliations
[1] Univ Calif Santa Cruz, Howard Hughes Med Inst, Santa Cruz, CA 95064 USA
[2] Univ Calif Santa Cruz, Dept Chem & Biochem, Santa Cruz, CA 95064 USA
[3] Univ Calif Santa Cruz, Dept Comp Sci, Santa Cruz, CA 95064 USA
[4] Univ Calif Santa Cruz, Ctr Biomol Sci & Engn, Santa Cruz, CA 95064 USA
DOI
10.1016/S0006-3495(03)74913-3
Chinese Library Classification (CLC) number
Q6 [Biophysics]
Discipline classification code
071011
Abstract
We introduce a computational method for classifying individual DNA molecules measured by an α-hemolysin channel detector. We show classification with better than 99% accuracy for DNA hairpin molecules that differ only in their terminal Watson-Crick basepairs. Signal classification was first done in silico to establish performance metrics (i.e., with training and test data of known type, drawn from single-species data files). It was then performed in solution to assay real mixtures of DNA hairpins. Hidden Markov Models (HMMs) were used with Expectation/Maximization for denoising and for associating a feature vector with the ionic current blockade of the DNA molecule. Support Vector Machines (SVMs) were used as discriminators and were the focus of off-line training. A multiclass SVM architecture was designed to place less discriminatory load on weaker discriminators, and novel SVM kernels were used to boost discrimination strength. Tuning of the HMMs and SVMs enabled biophysical analysis of the captured molecule states and state transitions; structure revealed in this biophysical analysis was used for better feature selection.
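The abstract describes HMM-based denoising of the ionic current blockade only at a high level. As a minimal illustrative sketch (not the authors' model), assume a Gaussian-emission HMM whose states are discrete current levels, with hand-set parameters rather than parameters fitted by Expectation/Maximization. Viterbi decoding then maps a noisy current trace onto the most likely level sequence, and the fraction of time spent in each level serves as a toy feature vector of the kind an SVM could consume. The function names, the "sticky" transition model, and the example current values (in pA) are hypothetical.

```python
import math


def gaussian_logpdf(x, mean, sd):
    """Log-density of the normal distribution N(mean, sd^2) at x."""
    z = (x - mean) / sd
    return -0.5 * z * z - math.log(sd * math.sqrt(2.0 * math.pi))


def viterbi_levels(trace, means, sd, stay_prob):
    """Most likely hidden level sequence for a current trace (Viterbi, log space).

    Hypothetical simplification: state k emits N(means[k], sd^2); the chain
    stays in its state with probability stay_prob, otherwise jumps uniformly
    to one of the other states. Parameters are fixed, not learned by EM.
    """
    n = len(means)
    log_stay = math.log(stay_prob)
    log_jump = math.log((1.0 - stay_prob) / (n - 1))

    def log_trans(i, j):
        return log_stay if i == j else log_jump

    # Uniform initial distribution over states.
    score = [math.log(1.0 / n) + gaussian_logpdf(trace[0], m, sd) for m in means]
    backptr = []
    for x in trace[1:]:
        new_score, ptr = [], []
        for j in range(n):
            # Best predecessor state for landing in state j at this sample.
            best_i = max(range(n), key=lambda i: score[i] + log_trans(i, j))
            new_score.append(score[best_i] + log_trans(best_i, j)
                             + gaussian_logpdf(x, means[j], sd))
            ptr.append(best_i)
        score = new_score
        backptr.append(ptr)

    # Trace back from the highest-scoring final state.
    state = max(range(n), key=lambda k: score[k])
    path = [state]
    for ptr in reversed(backptr):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path


def occupancy_features(path, n_states):
    """Fraction of samples assigned to each level: a toy feature vector."""
    counts = [0] * n_states
    for s in path:
        counts[s] += 1
    return [c / len(path) for c in counts]


if __name__ == "__main__":
    trace = [40.1, 39.8, 40.3, 20.2, 19.7, 40.0, 20.1, 20.3, 19.9, 40.2]
    path = viterbi_levels(trace, means=[20.0, 40.0], sd=1.0, stay_prob=0.9)
    print(path)                         # [1, 1, 1, 0, 0, 1, 0, 0, 0, 1]
    print(occupancy_features(path, 2))  # [0.5, 0.5]
```

The transition penalty is what makes this a denoiser rather than a per-sample threshold: with `sd=5.0`, an isolated sample at 29 pA inside a 40 pA segment sits closer to the 20 pA level, but paying for two level switches costs more than the emission gain, so the decoded path stays at the upper level.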
Pages: 967–976
Number of pages: 10