IRENE SIRAGUSA

UniQA: an Italian and English Question-Answering Data Set Based on Educational Documents

  • Authors: Irene Siragusa; Roberto Pirrone
  • Year of publication: 2024
  • Type: Conference proceedings contribution published in a volume
  • OA Link: http://hdl.handle.net/10447/678364

Abstract

In this paper we introduce UniQA, a high-quality Question-Answering data set comprising more than 1k documents and nearly 14k QA pairs. UniQA was generated in a semi-automated manner from data retrieved from the website of the University of Palermo, covering information about the bachelor's and master's degree courses for the academic year 2024/2025. The data are available in both Italian and English, making the data set suitable for both QA and translation models. To assess the data, we propose a Retrieval Augmented Generation model based on Llama-3.1-instruct. UniQA can be found at https://github.com/CHILab1/UniQA.
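
As a rough illustration of the Retrieval Augmented Generation idea mentioned in the abstract, the sketch below ranks a few passages against a question and assembles a prompt that could then be passed to an instruction-tuned model. It is not the authors' pipeline: the bag-of-words cosine scorer, the function names (`retrieve`, `build_prompt`), the prompt wording, and the two sample passages are all assumptions made for illustration, and the passages are toy placeholders rather than records from UniQA.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, documents, k=1):
    """Return the k documents most similar to the question."""
    q_vec = Counter(tokenize(question))
    ranked = sorted(
        documents,
        key=lambda d: cosine(q_vec, Counter(tokenize(d))),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(question, passages):
    """Assemble a generation prompt from the retrieved passages (hypothetical wording)."""
    context = "\n".join(f"- {p}" for p in passages)
    return (
        "Answer using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {question}\n"
        "Answer:"
    )


# Toy placeholder passages, NOT taken from the UniQA data set.
docs = [
    "The bachelor degree course in Computer Engineering lasts three years.",
    "The master degree course in Data Science is taught in English.",
]
question = "Which language is the Data Science master taught in?"

top = retrieve(question, docs, k=1)
prompt = build_prompt(question, top)
print(prompt)
```

In a fuller system the word-count scorer would typically be replaced by a dense embedding model, and the assembled prompt would be sent to the generator; both steps are left out here to keep the sketch self-contained.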