TY - JOUR
T1 - A Novel Spanish Dataset for Financial Education Text Simplification Targeting Visually Impaired Individuals
AU - Perez-Rojas, Nelson
AU - Calderon-Ramirez, Saul
AU - Solis-Salazar, Martin
AU - Romero-Sandoval, Mario
AU - Arias-Monge, Monica
AU - Saggion, Horacio
N1 - Publisher Copyright: © 2013 IEEE.
PY - 2025
Y1 - 2025
N2 - Automatic Text Simplification (ATS) is a crucial task in natural language processing, aimed at making texts more comprehensible, particularly for specific groups such as individuals with visual impairments. One of the primary challenges in developing ATS models is the scarcity of data, especially in Spanish. This manuscript introduces a novel dataset tailored for Spanish speakers with visual impairments, consisting of 5,314 pairs of original and simplified sentences created using established simplification rules. Additionally, we evaluate the feasibility of augmenting this dataset using large language models such as Generative Pre-trained Transformer 3 (GPT-3), TUNER, and Multilingual T5 (mT5). We compare the simplifications generated by these models with our dataset to assess their effectiveness for data augmentation. The characteristics of our dataset and the findings from these comparisons are discussed in detail.
AB - Automatic Text Simplification (ATS) is a crucial task in natural language processing, aimed at making texts more comprehensible, particularly for specific groups such as individuals with visual impairments. One of the primary challenges in developing ATS models is the scarcity of data, especially in Spanish. This manuscript introduces a novel dataset tailored for Spanish speakers with visual impairments, consisting of 5,314 pairs of original and simplified sentences created using established simplification rules. Additionally, we evaluate the feasibility of augmenting this dataset using large language models such as Generative Pre-trained Transformer 3 (GPT-3), TUNER, and Multilingual T5 (mT5). We compare the simplifications generated by these models with our dataset to assess their effectiveness for data augmentation. The characteristics of our dataset and the findings from these comparisons are discussed in detail.
KW - Automatic Text Simplification
KW - Lexical Complexity Prediction
KW - Lexical Simplification
KW - Word Complexity
UR - http://www.scopus.com/inward/record.url?scp=105005265296&partnerID=8YFLogxK
U2 - 10.1109/ACCESS.2025.3568693
DO - 10.1109/ACCESS.2025.3568693
M3 - Article
SN - 2169-3536
JO - IEEE Access
JF - IEEE Access
ER -