Optimization and interaction of Data Mining Models

Abstract

Data Mining is becoming a useful and popular tool for decision making. In some cases, however, obtaining or applying the models is particularly complicated. In applications such as biomedicine or web mining, an adequate representation of the data (using more expressive languages that capture its richness and complexity) or of the model (expressed in a way the expert can understand) can be crucial to obtaining better results. In other cases, the problem involves several decisions that are not independent of each other. The Data Mining techniques available on the market give only approximate solutions in these situations. In the first case, they require the data to be transformed beforehand, so the resulting models are sometimes not expressed in terms of the original attributes and are harder to understand. In the second case, they provide models that may be optimal for each individual problem but are not optimal for the overall system. New Data Mining techniques and algorithms are therefore needed to solve these problems satisfactorily. The algorithms we have developed support expressive data representation languages and complex models, and work with all types of structured data (sets, lists, graphs, web documents, text, and so on) rather than only with flat tables of categorical and numerical data. In addition, a new technique based on simulation and agent theory allows us to connect different Data Mining models and impose static and dynamic constraints on the whole system in order to determine its optimal solution.

Scientific officer

Ramírez Quintana, María José

Stakeholders

Applications

  • Data Mining projects from complex data.
  • Incorporation of constraints in Data Mining models.
  • Simulation of complex systems by interconnecting dependent Data Mining models.

Technical advantages

Data mining can be applied even when the data is complex (web documents, molecules, web graphs, sequences, sets, etc.). Models can be combined and optimized globally, which makes it possible not only to obtain local forecasts but also to simulate the future global behavior of a business area. The models obtained remain comprehensible while maintaining or improving the accuracy of standard techniques.
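
The difference between local forecasts and global optimization can be illustrated with a small sketch. It is not the group's technique (which is based on simulation and agent theory); it swaps in a plain exhaustive search, and every name and number in it is a hypothetical assumption: two toy functions stand in for learned demand models of two related products, `PRICES` is an invented set of candidate decisions, and `CAPACITY` is an invented static constraint on the whole system.

```python
from itertools import product

# Hypothetical stand-ins for two learned models: each predicts the units sold
# of one product given BOTH prices, so the two decisions are not independent.
def predicted_units_a(price_a, price_b):
    return max(0.0, 100 - 8 * price_a + 3 * price_b)

def predicted_units_b(price_a, price_b):
    return max(0.0, 80 - 6 * price_b + 3 * price_a)

PRICES = [4, 6, 8, 10]   # assumed candidate prices for each product
CAPACITY = 120           # assumed static constraint: total units that can be served

def revenue(price_a, price_b):
    """Revenue of the whole system for a joint price decision."""
    return (price_a * predicted_units_a(price_a, price_b)
            + price_b * predicted_units_b(price_a, price_b))

def feasible(price_a, price_b):
    """Check the system-wide constraint on predicted total demand."""
    total = predicted_units_a(price_a, price_b) + predicted_units_b(price_a, price_b)
    return total <= CAPACITY

def local_choice(reference_price=6):
    """Optimize each model alone, assuming the other price stays at a reference value."""
    best_a = max(PRICES, key=lambda p: p * predicted_units_a(p, reference_price))
    best_b = max(PRICES, key=lambda p: p * predicted_units_b(reference_price, p))
    return best_a, best_b

def global_choice():
    """Search all joint decisions that satisfy the constraint and keep the best one."""
    candidates = [c for c in product(PRICES, PRICES) if feasible(*c)]
    return max(candidates, key=lambda c: revenue(*c))

if __name__ == "__main__":
    for name, choice in (("local", local_choice()), ("global", global_choice())):
        print(name, choice, revenue(*choice))
```

With these invented numbers, optimizing each model separately selects prices (8, 8) with a system revenue of 928, whereas the joint search selects (10, 10) with a revenue of 1000. Exhaustive search is only practical for a handful of decisions, which is one reason a simulation-based approach may be preferred for larger systems.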

Benefits it provides

  • New areas of application of Data Mining techniques.
  • Possibility of tackling more complex problems than those usually handled by the classic Data Mining tools available on the market.

Relevant experience

The ELP group, created in 1989, has been registered as a consolidated research group of the Generalitat Valenciana (the Valencian Government) since October 2000 (reference GR-00143). The group’s activity has focused mainly on multi-paradigm programming languages and rigorous methods for software development, with particular emphasis on rule-based programming and on the use of abstract interpretation and program transformation techniques for the optimization of program execution. Rule-based languages have also been the basis for inductive programming and for representing complex yet comprehensible models obtained through knowledge discovery from data (data mining). The ELP group has participated in more than 30 competitive research projects financed with European, Spanish national and regional funds. Its research has often been carried out in collaboration with related groups abroad, in Germany (RWTH Aachen, U. of Kiel), Australia (Monash U.), Austria (Technische Universität Wien), the United States (U. of Illinois at Urbana-Champaign, National Research Laboratory, Portland State U., Washington, Stanford), France (École Polytechnique, U. Grenoble, U. Nice, U. Paris Sud), Italy (U. di Pisa, U. di Siena, U. di Udine) and the United Kingdom (U. Bristol). The group has also taken part in several projects with companies, in which its knowledge has been transferred or specific technology has been developed. The sectors involved include IT and consulting companies, as well as companies ranging from distribution to hospital management.