Development and applications of new semantic data mining methods in life sciences

Project coordinator: dr. Nada Lavrač, JSI

Coordinator for NIB: dr. Kristina Gruden

Code: J2-5478

Duration: 1.8.2013 - 31.7.2016


Knowledge discovery in databases is the area of computer science concerned with the automatic search and exploration of large volumes of data, with the goal of finding new hypotheses in the form of models and patterns automatically induced from the data. The discovered models and patterns are especially interesting if they are unexpected or if they help confirm as yet unproven hypotheses. A limitation of current publicly available data mining and knowledge discovery platforms is that they can handle only simple tabular data. Motivated by the increasing volume of semi-structured, heterogeneous and distributed data, the SemDM project addresses this limitation by enhancing currently available data mining platforms with the ability to use distributed, heterogeneous information and knowledge sources, as required for data analysis in knowledge-intensive domains.

The project has the following objectives:

- To develop new algorithms for Semantic Data Mining (SemDM) which will enable knowledge discovery from data stored in heterogeneous (structured, semi-structured and unstructured) and distributed data and knowledge sources, including semantically annotated data stored in publicly available ontologies (Gene Ontology and other knowledge sources available in the Linked Open Data cloud).
- To develop a novel, science-oriented data mining platform, ClowdFlows, which will upgrade our recently developed Orange4WS platform to enable browser-based construction of innovative data mining workflows from local and distributed data processing and mining services.
- To apply and validate the proposed service-oriented Semantic Data Mining approach in two case studies: one in breast cancer data analysis and another in the discovery of glioma patient subgroups, with the aim of validating novel molecular markers.

In the glioma case study, JSI and NIB researchers will jointly seek new findings concerning glioblastoma (GBM), the most common and most aggressive form of glioma. Recently, several biomarkers have been proposed as prognostic and predictive factors with respect to the patient's response to therapy, but so far none of them has been applied in therapeutics. There is a need to decipher the interactions among contributing genes in the clinical setting, in order to diagnose tumor grade quickly and accurately and to predict the prognosis of a particular patient. We argue that this can be achieved by a systems biology approach based on discovering subgroups of GBM patients, most likely defined by their cell of origin (stem cells) and infiltrating stromal (stem) cells, which result in distinct patterns of tumor progression. The project aims to take advantage of studying GBM cancer stem cells and stromal supporting cells to identify genes (biomarkers) that are relevant for GBM prognosis and targeting.

The project will contribute to the development of new Semantic Data Mining algorithms, to the improvement of their public accessibility through the web-based ClowdFlows platform, and to the generation of new knowledge in the medical and bioinformatics domains. The work on this project will be performed in close collaboration between data mining experts from the Jožef Stefan Institute (JSI) and domain experts from the National Institute of Biology (NIB).
