Application of techniques for extracting knowledge from data to support decision making

Abstract

In recent decades, the volume of computerized information stored in the databases of most organizations and companies has grown dramatically. Much of this information is historical and can be used to explain the past, understand the present and predict the future. In general, decisions in these institutions are also based on past experience, hence the interest in analyzing this information. However, its characteristics (dispersed, in different formats, large in volume) make manual analysis methods very difficult or unfeasible. Data Mining techniques and tools address this problem by supporting the automatic extraction of useful knowledge. More and more companies use these techniques, mainly to extract knowledge that supports their decisions or to apply the resulting models to other data sets. For this knowledge to be useful, it is often necessary to evaluate not only the data mining model but also the context in which it will be applied: the costs associated with model errors, changes in the environment, and so on. The Data Mining tools currently on the market include only very basic model evaluation methods, which do not take the application context into account. More recent techniques that are not yet well known in business, such as ROC analysis, model combination and calibration, make it possible to evaluate and/or adapt models so as to improve the results of their application. The group has worked with IT and consulting companies, as well as with companies in sectors ranging from distribution to hospital management.
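The idea that a model must be evaluated in its application context can be illustrated with ROC analysis. The following is a minimal pure-Python sketch, not the group's own tooling: it assumes a binary classifier that outputs scores, hypothetical misclassification costs, and class priors estimated from the same sample; the names `roc_points` and `best_threshold` are made up for this example. It chooses the ROC operating point with the lowest expected cost, so the same model ends up with a different decision threshold when the costs change.

```python
from typing import List, Tuple

Point = Tuple[float, float, float]  # (threshold, false positive rate, true positive rate)


def roc_points(scores: List[float], labels: List[int]) -> List[Point]:
    """ROC points obtained by using each distinct score as a threshold:
    an example is predicted positive when its score is >= the threshold."""
    pos = sum(labels)
    neg = len(labels) - pos
    points: List[Point] = [(float("inf"), 0.0, 0.0)]  # nothing predicted positive
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((t, fp / neg, tp / pos))
    return points


def best_threshold(scores: List[float], labels: List[int],
                   cost_fp: float, cost_fn: float) -> Point:
    """Return the ROC point with the lowest expected cost per example,
    using the class proportions of the sample as priors."""
    p_pos = sum(labels) / len(labels)
    p_neg = 1.0 - p_pos

    def expected_cost(point: Point) -> float:
        _, fpr, tpr = point
        return p_neg * fpr * cost_fp + p_pos * (1.0 - tpr) * cost_fn

    return min(roc_points(scores, labels), key=expected_cost)


if __name__ == "__main__":
    # Toy scores and true labels (1 = positive), invented for illustration.
    scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]
    labels = [1, 1, 0, 1, 0, 0, 1, 0]
    print(best_threshold(scores, labels, cost_fp=1, cost_fn=5)[0])  # 0.3
    print(best_threshold(scores, labels, cost_fp=5, cost_fn=1)[0])  # 0.8
```

On this toy data, making false negatives five times as costly as false positives lowers the chosen threshold to 0.3, while the reverse cost ratio raises it to 0.8, without retraining the model.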

Scientific officer

Hernández Orallo, José

Stakeholders

Applications

  • Generation of predictive and descriptive data mining models.
  • Cost-sensitive evaluation of data mining models.
  • Adaptation/contextualization of models for application.
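Calibration is one way of adapting a model to its context of application. As a minimal sketch, assuming a binary classifier whose scores lie in [0, 1] and a held-out labelled sample, the code below implements histogram binning, which is only one of several calibration methods (the source does not say which ones the group uses). The function name `fit_histogram_binning` and the default of five bins are illustrative assumptions.

```python
from typing import Callable, List


def fit_histogram_binning(scores: List[float], labels: List[int],
                          n_bins: int = 5) -> Callable[[float], float]:
    """Histogram-binning calibration: split [0, 1] into n_bins equal-width bins
    and map each score to the fraction of positives observed in its bin."""

    def bin_of(s: float) -> int:
        # Clamp so that scores of exactly 1.0 (or slightly out of range) stay valid.
        return min(max(int(s * n_bins), 0), n_bins - 1)

    counts = [0] * n_bins
    positives = [0] * n_bins
    for s, y in zip(scores, labels):
        b = bin_of(s)
        counts[b] += 1
        positives[b] += y

    # Bins with no held-out examples fall back to the overall positive rate.
    overall = sum(labels) / len(labels)
    rates = [positives[b] / counts[b] if counts[b] else overall
             for b in range(n_bins)]

    def calibrate(s: float) -> float:
        return rates[bin_of(s)]

    return calibrate


if __name__ == "__main__":
    # An overconfident model: its high scores are right only half of the time.
    scores = [0.95, 0.92, 0.91, 0.90, 0.15, 0.12, 0.10, 0.05]
    labels = [1, 0, 1, 0, 0, 0, 1, 0]
    calibrate = fit_histogram_binning(scores, labels)
    print(calibrate(0.93))  # 0.5
    print(calibrate(0.08))  # 0.25
```

The calibrated values can then be used as probability estimates in cost-based decisions, which is where poorly calibrated raw scores tend to cause the most damage.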

Technical advantages

  • Better results in the application of models, reducing costs and improving accuracy.
  • Experience with both commercial data mining tools (e.g. SPSS Clementine) and freely available ones, which reduces costs.
  • Use of sophisticated and adaptable tools, such as Weka or R.

Benefits it provides

  • Optimization in the use of Data Mining models.
  • Inclusion of costs in the evaluation of models.
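To illustrate why including costs in model evaluation can change which model is preferred, the sketch below compares two hypothetical classifiers by accuracy and by total misclassification cost. The confusion counts and the cost matrix (a missed positive costing ten times as much as a false alarm) are invented for illustration, and the helper names are hypothetical.

```python
from typing import Dict, Tuple

# Counts or costs keyed by (actual class, predicted class); 1 = positive, 0 = negative.
Matrix = Dict[Tuple[int, int], float]


def accuracy(confusion: Matrix) -> float:
    """Fraction of examples on the diagonal of the confusion matrix."""
    correct = confusion.get((1, 1), 0) + confusion.get((0, 0), 0)
    return correct / sum(confusion.values())


def total_cost(confusion: Matrix, cost: Matrix) -> float:
    """Sum of count * cost over all (actual, predicted) cells;
    cells missing from the cost matrix cost nothing."""
    return sum(n * cost.get(cell, 0) for cell, n in confusion.items())


# Hypothetical test-set results of two classifiers on 1000 examples.
model_a = {(1, 1): 40, (1, 0): 60, (0, 1): 10, (0, 0): 890}
model_b = {(1, 1): 90, (1, 0): 10, (0, 1): 80, (0, 0): 820}
# Hypothetical costs: a false negative costs 10, a false positive costs 1.
costs = {(1, 0): 10, (0, 1): 1}

if __name__ == "__main__":
    for name, cm in (("A", model_a), ("B", model_b)):
        print(name, accuracy(cm), total_cost(cm, costs))
    # A 0.93 610
    # B 0.91 180
```

Model A is more accurate, but under these assumed costs model B is more than three times cheaper to apply, which an accuracy-only evaluation would not reveal.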

Relevant experience

The ELP group, created in 1989, has been listed in the Generalitat Valenciana's registry of consolidated research groups since October 2000 (reference GR-00143). The group's activity has mainly concerned multi-paradigm programming languages and rigorous methods for software development, with particular focus on rule-based programming and on the use of abstract interpretation and program transformation techniques for optimizing program execution. Rule-based languages have also been the basis for inductive programming and for representing complex but comprehensible models obtained by extracting knowledge from data (data mining). The ELP group has participated in more than 30 competitive projects financed with European, national and community funds. Its research has often been carried out in collaboration with groups at foreign universities and institutions in Germany (RWTH Aachen, U. of Kiel), Australia (Monash U.), Austria (Technische Universität Wien), the United States (U. of Illinois at Urbana-Champaign, National Research Laboratory, Portland State U., Washington, Stanford), France (École Polytechnique, U. Grenoble, U. Nice, U. Paris-Sud), Italy (U. di Pisa, U. di Siena, U. di Udine) and the United Kingdom (U. Bristol). The group has also taken part in several projects with companies, in which it has transferred its knowledge or developed specific technology; these include IT companies as well as hospital management and distribution companies.