A Novel Spanish Dataset for Financial Education Text Simplification Targeting Visually Impaired Individuals

  • Nelson Perez-Rojas
  • Saul Calderon-Ramirez
  • Martin Solis-Salazar
  • Mario Romero-Sandoval
  • Monica Arias-Monge
  • Horacio Saggion

Research output: Contribution to journal › Article › peer-review

Abstract

Automatic Text Simplification (ATS) is a crucial task in natural language processing that aims to make texts more comprehensible, particularly for specific groups such as individuals with visual impairments. One of the primary challenges in developing ATS models is the scarcity of data, especially in Spanish. This manuscript introduces a novel dataset tailored for Spanish speakers with visual impairments, consisting of 5,314 pairs of original and simplified sentences created using established simplification rules. Additionally, we evaluate the feasibility of augmenting this dataset using large language models such as Generative Pre-trained Transformer 3 (GPT-3), TUNER, and Multilingual T5 (mT5). We compare the simplifications generated by these models with those in our dataset to assess their effectiveness for data augmentation. The characteristics of our dataset and the findings from these comparisons are discussed in detail.

Original language: English
Journal: IEEE Access
State: Published - 2025

Keywords

  • Automatic Text Simplification
  • Lexical Complexity Prediction
  • Lexical Simplification
  • Word Complexity
