DOMENICO TEGOLO

Transfer Learning Approach with Features Block Selection via Genetic Algorithm for High-Imbalance and Multi-Label Classification of HPA Confocal Microscopy Images

Abstract

Advances in deep learning have been impressive across many fields, surpassing human performance in tasks such as image classification, as demonstrated in competitions such as the ImageNet Large Scale Visual Recognition Challenge. Nonetheless, complex applications such as medical imaging continue to present significant challenges. A prime example is the Human Protein Atlas (HPA) dataset, which is computationally demanding because of its high class imbalance, the presence of rare patterns, and the need for multi-label classification. It includes 28 distinct patterns and more than 500 unique label combinations, with protein localization that can appear in different cellular regions such as the nucleus, the cytoplasm, and the nuclear membrane. Moreover, each sample is provided as four distinct channels, which adds to the complexity: green represents the target protein, red the microtubules, blue the nucleus, and yellow the endoplasmic reticulum. We propose a two-phase transfer learning approach based on feature-block extraction from twelve ImageNet-pretrained CNNs. In the first phase, we address single-label multiclass classification on a subset of the HPA dataset, using the CNNs as feature extractors combined with SVM classifiers. We demonstrate that simply concatenating feature blocks extracted from different CNNs improves performance. Furthermore, we apply a genetic algorithm to select a near-optimal (though not guaranteed optimal) combination of feature blocks. In the second phase, building on the results of the first, we apply two simple multi-label classification strategies and compare their performance with four classifiers. Our method integrates image-level and cell-level analysis. At the image level, we assess the discriminative contribution of individual and combined channels, showing that the green channel is the strongest individually but benefits from combinations with red and yellow.
At the cellular level, we extract features from the nucleus and nuclear-membrane ring, an analysis not previously explored in the HPA literature, which proves effective for recognizing rare patterns. Combining these perspectives enhances the detection of rare classes, achieving an F1 score of 0.8 for “Rods & Rings”, outperforming existing approaches. Accurate identification of rare patterns is essential for biological and clinical applications, underscoring the significance of our contribution.
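
The feature-block selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the encoding, operators, or fitness function, so everything below is an assumption made for illustration. Each candidate is a binary mask over twelve blocks (one per CNN), and the hypothetical `toy_fitness` stands in for what would presumably be a validation score of an SVM trained on the concatenation of the selected blocks. The per-block gains and the size penalty are invented numbers, not results from the paper.

```python
import random

N_BLOCKS = 12  # one feature block per pretrained CNN, as in the abstract

# Hypothetical per-block contributions (invented for this sketch).
TOY_GAINS = [0.30, 0.05, 0.25, 0.02, 0.20, 0.01,
             0.15, 0.03, 0.10, 0.04, 0.08, 0.02]
SIZE_PENALTY = 0.06  # assumed cost per selected block


def toy_fitness(mask):
    """Stand-in for a validation score of a classifier trained on the
    concatenation of the blocks whose mask bit is 1."""
    gain = sum(g for g, bit in zip(TOY_GAINS, mask) if bit)
    return gain - SIZE_PENALTY * sum(mask)


def run_ga(n_blocks, fitness, pop_size=20, generations=30,
           p_mut=0.1, seed=0):
    """Simple generational GA: tournament selection, one-point
    crossover, bit-flip mutation, and elitism."""
    rng = random.Random(seed)
    # Start from random masks plus the "concatenate everything" baseline,
    # so the result is never worse than plain concatenation.
    pop = [[rng.randint(0, 1) for _ in range(n_blocks)]
           for _ in range(pop_size - 1)]
    pop.append([1] * n_blocks)
    best = max(pop, key=fitness)

    def tournament():
        a, b = rng.sample(pop, 2)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):
        children = [best[:]]  # elitism: carry the best mask forward
        while len(children) < pop_size:
            p1, p2 = tournament(), tournament()
            cut = rng.randint(1, n_blocks - 1)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            # Flip each bit independently with probability p_mut.
            child = [1 - g if rng.random() < p_mut else g for g in child]
            children.append(child)
        pop = children
        best = max(pop, key=fitness)
    return best


best = run_ga(N_BLOCKS, toy_fitness)
print("selected blocks:", [i for i, bit in enumerate(best) if bit])
print("fitness:", round(toy_fitness(best), 3))
```

Because the search is heuristic, the returned mask is a good combination rather than a provably optimal one. In a real pipeline the fitness evaluation would dominate the cost, since each call would involve training and validating a classifier, so caching the scores of already-evaluated masks would be a natural refinement.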