Ontology-guided data preparation for discovering genotype-phenotype relationships

Cited by: 16
Authors
Coulet, Adrien [1, 2]
Smail-Tabbone, Malika [2 ]
Benlian, Pascale [3 ]
Napoli, Amedeo [2 ]
Devignes, Marie-Dominique [2 ]
Affiliations
[1] KIKA Med, F-75012 Paris, France
[2] LORIA, CNRS, INPL INRIA, Nancy UHP 2, UMR 7503, F-54506 Vandoeuvre Les Nancy, France
[3] Univ Paris 06, INSERM, UMRS Biochim Biol Mol 538, F-75571 Paris, France
DOI
10.1186/1471-2105-9-S4-S3
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Background: Complexity and amount of post-genomic data constitute two major factors limiting the application of Knowledge Discovery in Databases (KDD) methods in life sciences. Bio-ontologies may nowadays play key roles in knowledge discovery in life science providing semantics to data and to extracted units, by taking advantage of the progress of Semantic Web technologies concerning the understanding and availability of tools for knowledge representation, extraction, and reasoning.
Results: This paper presents a method that exploits bio-ontologies for guiding data selection within the preparation step of the KDD process. We propose three scenarios in which domain knowledge and ontology elements such as subsumption, properties, class descriptions, are taken into account for data selection, before the data mining step. Each of these scenarios is illustrated within a case-study relative to the search of genotype-phenotype relationships in a familial hypercholesterolemia dataset. The guiding of data selection based on domain knowledge is analysed and shows a direct influence on the volume and significance of the data mining results.
Conclusions: The method proposed in this paper is an efficient alternative to numerical methods for data selection based on domain knowledge. In turn, the results of this study may be reused in ontology modelling and data integration.
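The idea of using subsumption to guide data selection before mining can be illustrated with a minimal sketch. This is not the authors' implementation: the class hierarchy, the record layout, and the function names (`is_subsumed_by`, `select_records`) are all hypothetical and invented purely for illustration. The sketch only shows how a subclass hierarchy might be used to keep the records annotated with a class that falls under a chosen ontology class.

```python
from collections import deque

# Hypothetical toy ontology: each class maps to its direct superclasses.
# These class names are assumptions, not taken from the paper.
SUBCLASS_OF = {
    "CodingVariant": ["GenomicVariant"],
    "NonCodingVariant": ["GenomicVariant"],
    "MissenseVariant": ["CodingVariant"],
    "NonsenseVariant": ["CodingVariant"],
    "IntronicVariant": ["NonCodingVariant"],
}


def is_subsumed_by(cls, ancestor, subclass_of=SUBCLASS_OF):
    """Return True if `ancestor` subsumes `cls` (reflexive and transitive)."""
    seen, queue = set(), deque([cls])
    while queue:
        current = queue.popleft()
        if current == ancestor:
            return True
        if current in seen:
            continue
        seen.add(current)
        # Walk upwards through the direct superclasses.
        queue.extend(subclass_of.get(current, []))
    return False


def select_records(records, ancestor):
    """Keep only records whose annotated class falls under `ancestor`."""
    return [r for r in records if is_subsumed_by(r["variant_class"], ancestor)]


# Hypothetical records, invented for illustration only.
RECORDS = [
    {"id": "v1", "variant_class": "MissenseVariant"},
    {"id": "v2", "variant_class": "IntronicVariant"},
    {"id": "v3", "variant_class": "NonsenseVariant"},
]

selected = select_records(RECORDS, "CodingVariant")
print([r["id"] for r in selected])  # prints ['v1', 'v3']
```

Selecting under a more general class (here `GenomicVariant`) would keep all three records, which mirrors the abstract's point that the choice of ontology class directly affects the volume of data passed to the mining step.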
Pages: 9