Publication | SALVATORE VITABILE | Università degli Studi di Palermo

A text based indexing system for mammographic image retrieval and classification

Authors: Farruggia, A.; Magro, R.; Vitabile, S.
Publication year: 2014
Type: Journal article
Key words: Information retrieval; Medical documents indexing and classification; Medical images indexing and classification; Hardware and Architecture; Software; Computer Networks and Communications
OA Link: http://hdl.handle.net/10447/127244

Abstract

Modern medical systems produce huge amounts of text, images, and videos and store them in ad hoc databases. The medical community needs to extract precise information from this large volume of data. Current ICT approaches do not provide a methodology for content-based medical image retrieval and classification. From the Internet of Things (IoT) perspective, moreover, medical ICT data can be produced by several devices, and the data produced exhibit all the features and constraints of Big Data. The IoT guidelines place at the center of the system new smart software that manages Big Data and transforms it into a more understandable form. This paper describes a text-based indexing system for mammographic image retrieval and classification. The system deals with mining and classifying text (structured reports) and images (mammograms) in a typical Department of Radiology. DICOM structured reports, which contain free text for medical diagnosis, were analyzed and labeled in order to classify the corresponding mammographic images. The Information Retrieval process is based on text manipulation techniques such as light semantic analysis, stop-word removal, and light medical natural language processing. The system also includes a Search Engine module based on a Naive Bayes classifier. The experimental results show interesting performance in terms of Specificity and Sensitivity. Two further indexes were computed to assess the robustness of the system: Az (the area under the ROC curve) and σAz (the standard error of Az). The dataset is composed of healthy and pathological DICOM structured reports. Two use case scenarios are presented and described to demonstrate the effectiveness of the proposed approach. © 2014 Elsevier B.V. All rights reserved.
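
The pipeline outlined in the abstract (stop-word removal, a Naive Bayes classifier over report text, and evaluation by Sensitivity and Specificity) can be illustrated with a minimal sketch. This is not the authors' implementation: the stop-word list, the toy report sentences, the class labels, and every function and class name below are assumptions made up for illustration, and a multinomial model with add-one smoothing is assumed because the abstract does not specify the variant used.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list; the list used in the paper is not given.
STOP_WORDS = {"a", "an", "and", "for", "in", "is", "of", "or", "the", "with"}


def tokenize(text):
    """Lowercase the text, keep alphabetic tokens, and drop stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(tokenize(doc))
        self.vocab = set().union(*self.word_counts.values())
        return self

    def predict(self, doc):
        tokens = tokenize(doc)
        total_docs = sum(self.doc_counts.values())
        best, best_score = None, -math.inf
        for c in self.classes:
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(self.doc_counts[c] / total_docs)
            denom = sum(self.word_counts[c].values()) + len(self.vocab)
            for t in tokens:
                score += math.log((self.word_counts[c][t] + 1) / denom)
            if score > best_score:
                best, best_score = c, score
        return best


def sensitivity_specificity(y_true, y_pred, positive="pathological"):
    """Return (sensitivity, specificity) treating `positive` as the target class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return tp / (tp + fn), tn / (tn + fp)


# Invented toy report sentences, standing in for DICOM structured report text.
TRAIN_DOCS = [
    "Irregular mass with spiculated margins in the left breast",
    "Clustered microcalcifications, suspicious for malignancy",
    "No mass or suspicious calcification is seen",
    "Normal mammogram with no evidence of malignancy",
]
TRAIN_LABELS = ["pathological", "pathological", "healthy", "healthy"]

model = NaiveBayes().fit(TRAIN_DOCS, TRAIN_LABELS)

TEST_DOCS = ["Spiculated mass in the right breast", "Normal mammogram, no mass seen"]
TEST_LABELS = ["pathological", "healthy"]
predictions = [model.predict(d) for d in TEST_DOCS]
sens, spec = sensitivity_specificity(TEST_LABELS, predictions)
```

In a setting like the one described, the predicted label of each report would be attached to its corresponding mammogram, so that the images can be indexed and retrieved through the text. The negation word "no" is deliberately kept out of the stop-word list here, since dropping it would erase the difference between "mass" and "no mass".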