ALESSANDRO ALBANO

Cross Lingual Embeddings for Clinical Text: A Statistical Framework for Validating Real and Synthetic Electronic Health Records

  • Authors: Speciale Marco; Albano Alessandro; Sciandra Mariangela; Plaia Antonella
  • Publication year: 2025
  • Type: Conference proceedings paper published in a volume
  • OA Link: http://hdl.handle.net/10447/684744

Abstract

The effective integration of real and synthetic clinical data in multiple languages is essential to advance healthcare research. In this study, we propose a statistical framework that leverages cross-lingual embeddings to validate semantic alignment between authentic Italian EHRs and synthetic English clinical notes. Using two state-of-the-art models, E5 and BGE, we encode the texts and employ Fuzzy C-Means clustering along with multidimensional scaling to assess their semantic coherence. Our analysis reveals distinct language-specific patterns alongside robust cross-lingual alignment, highlighting the promise of synthetic data augmentation in mitigating resource scarcity.
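
The clustering step mentioned in the abstract can be illustrated with a minimal sketch of Fuzzy C-Means in plain Python. This is not the authors' implementation: the `notes` list holds made-up 3-dimensional vectors standing in for E5 or BGE embeddings, and the function name `fuzzy_c_means`, the fuzzifier `m=2.0`, the cluster count and the stopping tolerance are all assumptions chosen for illustration. The encoding step and the multidimensional scaling step are left out.

```python
import math
import random

# Hypothetical stand-in "embeddings": (language, vector) pairs.
# Real vectors would come from an encoder such as E5 or BGE.
notes = [
    ("it", [0.90, 0.10, 0.00]),
    ("it", [0.80, 0.20, 0.10]),
    ("en", [0.85, 0.15, 0.05]),
    ("en", [0.95, 0.05, 0.10]),
    ("it", [0.10, 0.90, 0.00]),
    ("it", [0.20, 0.80, 0.10]),
    ("en", [0.15, 0.85, 0.05]),
    ("en", [0.05, 0.95, 0.10]),
]


def fuzzy_c_means(points, n_clusters, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Return (centers, memberships) using the standard FCM updates."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])

    # Random initial memberships; each row sums to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(n_clusters)]
        total = sum(row)
        u.append([r / total for r in row])

    centers = []
    for _ in range(n_iter):
        # Center j is the mean of all points weighted by u_ij ** m.
        centers = []
        for j in range(n_clusters):
            w = [u[i][j] ** m for i in range(n)]
            total = sum(w)
            centers.append(
                [sum(w[i] * points[i][k] for i in range(n)) / total
                 for k in range(dim)]
            )

        # Membership u_ij is proportional to d_ij ** (-2 / (m - 1)).
        new_u = []
        for p in points:
            dists = [max(math.dist(p, c), 1e-12) for c in centers]
            inv = [d ** (-2.0 / (m - 1.0)) for d in dists]
            total = sum(inv)
            new_u.append([x / total for x in inv])

        # Stop once no membership changes by more than tol.
        shift = max(abs(a - b)
                    for old, new in zip(u, new_u)
                    for a, b in zip(old, new))
        u = new_u
        if shift < tol:
            break
    return centers, u


vectors = [vec for _, vec in notes]
centers, memberships = fuzzy_c_means(vectors, n_clusters=2)

# Hard label = cluster with the highest membership degree.
hard = [row.index(max(row)) for row in memberships]

# Count how many notes of each language fall in each cluster.
mix = {}
for (lang, _), label in zip(notes, hard):
    mix.setdefault(label, {"it": 0, "en": 0})[lang] += 1
print(mix)
```

In this toy setting, a cluster that contains both Italian and English notes suggests that the two sources share a region of the embedding space, while a cluster dominated by one language would point to a language-specific pattern. The soft membership degrees in `memberships` carry more information than the hard labels and could be inspected directly for borderline notes.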