SALVATORE VITABILE

Explainable Machine-Learning Models for COVID-19 Prognosis Prediction Using Clinical, Laboratory and Radiomic Features

Authors: Prinzi, Francesco; Militello, Carmelo; Scichilone, Nicola; Gaglio, Salvatore; Vitabile, Salvatore
Publication year: 2023
Type: Journal article
OA Link: http://hdl.handle.net/10447/617935

Abstract

The SARS-CoV-2 pandemic had devastating effects on various aspects of life: clinical cases, ranging from mild to severe, can lead to lung failure and death. Given the high incidence, data-driven models can support physicians in patient management. The explainability and interpretability of machine-learning models are mandatory in clinical scenarios. In this work, clinical, laboratory and radiomic features were used to train machine-learning models for COVID-19 prognosis prediction. Using Explainable AI algorithms, a multi-level explainable method was proposed that takes into account the perspectives of the developer and of the involved stakeholders (physician and patient). A total of 1023 radiomic features were extracted from 1589 chest X-ray (CXR) images and combined with 38 clinical/laboratory features. After the pre-processing and selection phases, 40 CXR radiomic features and 23 clinical/laboratory features were used to train Support Vector Machine and Random Forest classifiers, exploring three feature selection strategies. Combining radiomic and clinical/laboratory features yielded higher performance in the resulting models. The intelligibility of the features used allowed us to validate the models' clinical findings. In agreement with the medical literature, LDH, PaO2 and CRP were the most predictive laboratory features, while ZoneEntropy and HighGrayLevelZoneEmphasis, indicative of the heterogeneity/uniformity of lung texture, were the most discriminating radiomic features. Our best predictive model, which uses the Random Forest classifier and a signature composed of clinical, laboratory and radiomic features, achieved AUC = 0.819, accuracy = 0.733, specificity = 0.705, and sensitivity = 0.761 on the test set. The model, together with its multi-level explainability, supports strong clinical assumptions that are confirmed by insights from the literature.
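
For readers less familiar with the four metrics reported in the abstract, the following is a minimal sketch of how accuracy, sensitivity, specificity and AUC can be computed for a binary prognosis classifier. It is not the authors' evaluation code: the function names, the 0.5 decision threshold and the toy labels and scores are all assumptions made for illustration, and the numbers it produces are unrelated to the study's results.

```python
# Illustrative only: toy data and helper names are hypothetical,
# not taken from the paper.

def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels coded as 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def binary_metrics(y_true, y_pred):
    """Accuracy, sensitivity (true positive rate), specificity (true negative rate)."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


def roc_auc(y_true, scores):
    """AUC as the probability that a random positive case is scored
    higher than a random negative case (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy example: 1 = poor prognosis, 0 = good prognosis (made-up values).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1]

# Assumed decision threshold of 0.5 on the predicted score.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

metrics = binary_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(metrics, auc)
```

Sensitivity and specificity depend on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds, which is why the abstract reports both kinds of measure.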