A data-mining approach to spacer oligonucleotide typing of Mycobacterium tuberculosis

Cited by: 67
Authors
Sebban, M
Mokrousov, I
Rastogi, N
Sola, C [1]
Affiliations
[1] Inst Pasteur Guadeloupe, Unite TB & Mycobacteries, BP 484, F-97165 Pointe A Pitre, Guadeloupe, France
[2] French W Indies & Guiana Univ, TRIVIA, Dept Math & Comp Sci, F-97159 Pointe A Pitre, Guadeloupe, France
Keywords
DOI
10.1093/bioinformatics/18.2.235
Chinese Library Classification (CLC) number
Q5 [Biochemistry];
Discipline classification code
071010; 081704;
Abstract
Motivation: The Direct Repeat (DR) locus of Mycobacterium tuberculosis is a suitable model for studying (i) the molecular epidemiology and (ii) the evolutionary genetics of tuberculosis. It is analyzed with a DNA genotyping technique called spacer oligonucleotide typing (spoligotyping). In this paper, we investigated data analysis methods for discovering intelligible knowledge rules from spoligotyping data, an approach that had not yet been applied to this representation. The C4.5 induction algorithm was applied to build decision trees, from which knowledge rules were produced. A Prototype Selection (PS) procedure was then applied to eliminate noisy data. This simplified the decision rules and reduced the number of spacers that must be tested to solve classification tasks. In the second part of the paper, the contribution of 25 additional spacers and of the inferred knowledge rules was studied from a machine learning point of view. From a statistical point of view, the correlations between spacers were analyzed; the results suggested that both negative and positive correlations may be related to potential structural constraints within the DR locus that may shape its evolution directly or indirectly.
Results: By generating knowledge rules induced from decision trees, it was shown that expert knowledge can be not only modeled but also improved and simplified to solve automatic classification tasks on unknown patterns. A practical consequence of this study may be a simplification of the spoligotyping technique, resulting in fewer experimental constraints and an increase in the number of samples processed.
Pages: 235-243
Number of pages: 9
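Illustrative sketch (not from the paper): the abstract names C4.5 as the induction algorithm. C4.5 ranks candidate splits by gain ratio, so the short Python example below computes that criterion for binary spacer attributes (1 = spacer present, 0 = absent). The function names, the toy patterns, and the family labels "A" and "B" are hypothetical and chosen only for illustration; this is not a full C4.5 implementation (no recursive tree building, no pruning) and it does not reproduce the authors' data or results.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a list of discrete values."""
    total = len(values)
    return -sum((n / total) * log2(n / total) for n in Counter(values).values())


def gain_ratio(patterns, labels, spacer):
    """Gain ratio of splitting on one spacer (1 = present, 0 = absent)."""
    total = len(labels)
    groups = {}
    for pattern, label in zip(patterns, labels):
        groups.setdefault(pattern[spacer], []).append(label)
    # Expected class entropy that remains after the split.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    # Split information: entropy of the spacer values themselves.
    split_info = entropy([pattern[spacer] for pattern in patterns])
    if split_info == 0:
        return 0.0  # the spacer has the same value in every pattern
    return (entropy(labels) - remainder) / split_info


def best_spacer(patterns, labels):
    """Index of the spacer with the highest gain ratio."""
    return max(range(len(patterns[0])),
               key=lambda s: gain_ratio(patterns, labels, s))


# Hypothetical toy data: 6 isolates, 4 spacers, two made-up family labels.
patterns = [
    (1, 1, 0, 1),
    (1, 0, 0, 1),
    (1, 1, 1, 0),
    (0, 1, 1, 1),
    (0, 0, 1, 1),
    (0, 1, 1, 0),
]
labels = ["A", "A", "A", "B", "B", "B"]

print("most informative spacer index:", best_spacer(patterns, labels))
```

In this toy set, the spacer at index 0 separates the two labels on its own, which mirrors the abstract's point that a reduced set of spacers may be enough for some classification tasks.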