
ROBERTO PIRRONE

PARSAL: Pipeline for Automatic Retrieval and Structuring of Academic Literature

  • Authors: Contino, S.; Siragusa, I.; Sciortino, G.; Pirrone, R.
  • Publication year: 2026
  • Type: Conference proceedings contribution published in a volume
  • OA Link: http://hdl.handle.net/10447/703118

Abstract

In this work, we present PARSAL, a retrieval pipeline that collects relevant scientific articles in a standardized format, given one or more relevant keywords. The pipeline uses the APIs of scientific publishers to retrieve full-text articles in PDF, JSON, or XML format. A parser then converts the retrieved articles into a single, standardized format so that they can be stored in a MongoDB database and accessed through a custom GUI. In addition, the papers are organized in a Knowledge Graph, built with the LlamaIndex framework, which lets users query the collected articles and obtain a detailed natural-language answer. The code for the pipeline, the GUI, and the Knowledge Graph construction and inference is available on GitHub.
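To illustrate the standardization step described above, here is a minimal sketch, assuming a hypothetical unified schema with `title`, `abstract`, and `sections` fields and hypothetical publisher payloads. It is not PARSAL's actual parser or schema. It shows only how JSON and XML inputs from different sources can be mapped to one common record. PDF handling is omitted because text extraction from PDF requires a third-party library.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical unified record: the field names below are illustrative
# assumptions, not the schema used by PARSAL.

def normalize_json(raw: str) -> dict:
    """Map a (hypothetical) JSON publisher payload to the unified record."""
    data = json.loads(raw)
    return {
        "title": data.get("title", ""),
        "abstract": data.get("abstract", ""),
        "sections": [
            {"heading": s.get("heading", ""), "text": s.get("text", "")}
            for s in data.get("sections", [])
        ],
    }

def normalize_xml(raw: str) -> dict:
    """Map a (hypothetical) XML publisher payload to the unified record."""
    root = ET.fromstring(raw)
    return {
        "title": (root.findtext("title") or "").strip(),
        "abstract": (root.findtext("abstract") or "").strip(),
        "sections": [
            {
                "heading": (sec.findtext("heading") or "").strip(),
                "text": (sec.findtext("text") or "").strip(),
            }
            for sec in root.iter("section")
        ],
    }

def normalize(raw: str, fmt: str) -> dict:
    """Dispatch on the source format; unsupported formats raise ValueError."""
    parsers = {"json": normalize_json, "xml": normalize_xml}
    if fmt not in parsers:
        raise ValueError(f"unsupported format: {fmt}")
    return parsers[fmt](raw)

if __name__ == "__main__":
    json_doc = '{"title": "A", "abstract": "B", "sections": [{"heading": "Intro", "text": "C"}]}'
    xml_doc = ("<article><title>A</title><abstract>B</abstract>"
               "<section><heading>Intro</heading><text>C</text></section></article>")
    # Both sources yield the same unified record.
    print(normalize(json_doc, "json") == normalize(xml_doc, "xml"))
```

Because every source ends up as the same plain dictionary, a record like this could then be stored as a MongoDB document (for example with pymongo's `Collection.insert_one`) without format-specific handling downstream.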