
ALESSANDRO ALBANO

Cross Lingual Embeddings for Clinical Text: A Statistical Framework for Validating Real and Synthetic Electronic Health Records

  • Authors: Speciale Marco; Albano Alessandro; Sciandra Mariangela; Plaia Antonella
  • Publication year: 2025
  • Type: Conference proceedings contribution published in a volume
  • OA Link: http://hdl.handle.net/10447/684744

Abstract

The effective integration of real and synthetic clinical data in multiple languages is essential to advance healthcare research. In this study, we propose a statistical framework that leverages cross-lingual embeddings to validate semantic alignment between authentic Italian EHRs and synthetic English clinical notes. Using two state-of-the-art models, E5 and BGE, we encode the texts and employ Fuzzy C-Means clustering along with multidimensional scaling to assess their semantic coherence. Our analysis reveals distinct language-specific patterns alongside robust cross-lingual alignment, highlighting the promise of synthetic data augmentation in mitigating resource scarcity.
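The pipeline named in the abstract (embed the texts, cluster with Fuzzy C-Means, project with multidimensional scaling) can be illustrated with a minimal NumPy sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the random vectors stand in for E5 or BGE embeddings, the number of clusters, the fuzzifier `m = 2`, and the Euclidean distance on unit-normalized vectors are arbitrary choices, and classical (Torgerson) MDS is used because the abstract does not say which MDS variant the authors applied.

```python
import numpy as np


def fuzzy_c_means(X, n_clusters=2, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Plain Fuzzy C-Means. Returns (centers, U); U[i, k] is the membership of row i in cluster k."""
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], n_clusters))
    U /= U.sum(axis=1, keepdims=True)  # each row of memberships sums to 1
    for _ in range(n_iter):
        W = U ** m
        # Cluster centers are membership-weighted means of the points.
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        dist = np.fmax(dist, 1e-12)  # guard against division by zero
        inv = dist ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return centers, U


def classical_mds(D, n_components=2):
    """Classical (Torgerson) MDS from a matrix of pairwise distances."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    B = -0.5 * J @ (D ** 2) @ J          # double-centered squared distances
    eigvals, eigvecs = np.linalg.eigh(B)
    order = np.argsort(eigvals)[::-1][:n_components]
    return eigvecs[:, order] * np.sqrt(np.maximum(eigvals[order], 0.0))


# Toy stand-in for sentence embeddings: two synthetic groups of 20 vectors each.
# In a real run these rows would come from an embedding model such as E5 or BGE.
rng = np.random.default_rng(42)
dim = 16
group_a = rng.normal(size=dim) + 0.1 * rng.normal(size=(20, dim))
group_b = rng.normal(size=dim) + 0.1 * rng.normal(size=(20, dim))
X = np.vstack([group_a, group_b])
X /= np.linalg.norm(X, axis=1, keepdims=True)  # unit-normalize each embedding

centers, U = fuzzy_c_means(X, n_clusters=2)
D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
coords = classical_mds(D, n_components=2)

print("hard labels:", U.argmax(axis=1))
print("2-D coordinates shape:", coords.shape)
```

The soft memberships in `U` are what distinguish Fuzzy C-Means from hard clustering: a note whose embedding sits between two groups receives intermediate membership values rather than a forced assignment, which is one plausible reason to prefer it when assessing how closely two corpora overlap.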