Analysis of heterogeneous information networks for knowledge discovery in life-sciences

Project coordinator: PhD Nada Lavrač, JSI

Coordinator for NIB: PhD Kristina Gruden

Code: J7-7303

Duration: 1. 1. 2016 - 31. 12. 2018

The authors acknowledge that the project J7-7303 was financially supported by the Slovenian Research Agency.


The proposal addresses knowledge discovery in complex data mining scenarios in life-sciences. With the development of high-throughput molecular biology techniques, the volume of data generated is reaching the range of so-called Big Data. Information relevant to a given biological question is scattered across different public resources, in heterogeneous formats and in forms inaccessible to typical biologists. To overcome this, the information needs to be fused into a single data source that can be mined. The aim of the proposed project is to develop, implement, evaluate and apply a new methodology for analyzing large heterogeneous data in the area of life-sciences. The methodology is motivated by the tremendous increase in data generation within life-sciences research, while the means for explanatory knowledge discovery from these large heterogeneous data sources are still lagging behind. We aim to improve existing data analysis approaches by extending and combining text mining, relational data mining and information fusion methods. To evaluate the proposed methodology, we will use several benchmark and real-world problems in the area of life-sciences, aiming to advance translational research in agriculture by extracting novel knowledge on plant immune signaling.

The project has the following objectives:

1. Development of a new methodology that will enable fusing texts and complex relational background knowledge into a large heterogeneous information network. This will be achieved by extending our own methodology for mining heterogeneous information networks, contextualizing the information on data instances in terms of the available semantic background knowledge (domain taxonomies and ontologies), and by adapting the methodology to big data and complex life-science scenarios.

2. Implementation of the methodology in the ClowdFlows or TextFlows platform, and experimental evaluation of the methodology on publicly available benchmark data sets, including selected medical problems for which large public heterogeneous data sets exist.

3. Application of the methodology to three life-science application scenarios: (i) cross-domain knowledge discovery from documents on two unrelated life-science problems, aiming to uncover as yet unknown relations between "redox status" and "plant immune signaling", (ii) mining a time-stamped stream of heterogeneous experimental data in the domain of plant immune signaling, and (iii) identification of key components in plant immune signaling that determine the outcome of a disease.

The project will contribute to the development of new algorithms for mining large heterogeneous data. The developed methodology will be made accessible by implementing it in one of our web-based data mining platforms, ClowdFlows or TextFlows, which will make the technology available to a broader research audience and increase its relevance for life-science experts. The research will be performed in close collaboration between data mining experts from JSI and domain experts from NIB.

Researchers - link to database SICRIS

Information about the project - link to database SICRIS