XML-based Knowledge Discovery for Linguistic Atlas of Sicily (ALS) Project
- Authors: Pirrone, R.; Gentile, A.; Cannella, V.; Russo, G.
- Year of publication: 2009
- Type: Proceedings (inactive type)
The identification of new, useful patterns in data is a core process for intelligent systems. Information overload is directly related to this problem. In this work we propose a knowledge discovery methodology to retrieve useful and novel information from raw data stored in a DBMS. We used ALSDB, a database built to give access to structured information obtained from the questionnaires produced in the Linguistic Atlas of Sicily (ALS) project. The ALS project is a decade-long joint effort led by researchers at the Dipartimento di Scienze Filologiche e Linguistiche of the University of Palermo, whose purpose is to track and study the geo-linguistic and lexicographic processes concerning the function and usage of the Sicilian dialect. The main goal of the work described in this paper is to develop an information retrieval methodology that incorporates the directions of linguistic investigation embedded in the ALS questionnaire into a querying tool that abstracts away from the intricacies of SQL or XML query constructs. We do this by setting up a methodology and a data retrieval tool that are scalable and general enough, firstly, to allow the evaluation of linguists' hypotheses about regional language and dialect evolution in space and time and, secondly, to help discover new directions of investigation. This work presents the process of knowledge discovery. Starting from the conceptualization of a few basic ideas, concepts have been extracted from the DBMS through an XML-based mapping and used as building blocks for further investigations. The interaction with users is very intuitive, and the results are proposed incrementally and automatically to the researchers, who may decide either to retain them as new knowledge for further use or to discard them.