
Reference-Based Metric Analysis for Evaluating Spanish Text Simplification

  • National University of Costa Rica
  • Costa Rica Institute of Technology

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

Abstract

Reliable evaluation remains a bottleneck for Spanish text simplification. We examine how five reference-based automatic metrics (Bilingual Evaluation Understudy (BLEU), Recall-Oriented Understudy for Gisting Evaluation (ROUGE), Metric for Evaluation of Translation with Explicit Ordering (METEOR), System output Against References and the Input sentence (SARI), and Bidirectional Encoder Representations from Transformers Score (BERTScore)) respond to diverse, linguistically valid edits in human simplifications. Using the FEINA-test corpus (financial education texts simplified for visually impaired readers, with four human simplifications per segment and attribute annotations), we conduct a two-stage analysis. First, we compute each metric per segment: for BLEU, ROUGE, METEOR, and BERTScore, each simplification is compared to the complex source and the scores are averaged across annotators; for SARI, we rotate the four simplifications as hypotheses and use the remaining ones as references. Second, we introduce the Attribute Diversity Index (ADI), defined as the number of distinct linguistic attributes modified in the references for each segment, and assess metric sensitivity via Pearson correlation with ADI. All metrics show a negative association with edit diversity; BERTScore and ROUGE are the most sensitive, while SARI is comparatively more tolerant. METEOR and BERTScore yield higher mean scores overall, yet they also decline as diversity increases. These findings provide empirical evidence that commonly used reference-based metrics can penalize valid transformations in Spanish, particularly when multiple edit types are present, and suggest the potential for combining overlap-based metrics with SARI in accessibility-focused evaluations.
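The two-stage analysis described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' code: the function names, the attribute labels, and the toy numbers are all hypothetical, and the data layout (one set of modified attributes per human simplification) is an assumption about how the annotations might be stored. It shows the rotation of the four simplifications used for SARI, the ADI as a count of distinct modified attributes per segment, and a plain Pearson correlation between ADI and per-segment metric scores.

```python
from math import sqrt


def rotate_references(simplifications):
    """Yield (hypothesis, references) pairs: each simplification takes one
    turn as the hypothesis while the remaining ones serve as references."""
    for i, hypothesis in enumerate(simplifications):
        yield hypothesis, simplifications[:i] + simplifications[i + 1:]


def attribute_diversity_index(attributes_per_reference):
    """ADI for one segment: the number of distinct linguistic attributes
    modified across all of its reference simplifications."""
    distinct = set()
    for attributes in attributes_per_reference:
        distinct.update(attributes)
    return len(distinct)


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


if __name__ == "__main__":
    # Hypothetical annotations: one attribute set per human simplification,
    # four simplifications per segment. Labels are illustrative only.
    segments = [
        [{"lexical"}, {"lexical"}, set(), {"lexical"}],
        [{"lexical", "syntactic"}, {"deletion"}, {"lexical"}, {"syntactic"}],
        [{"lexical", "reordering"}, {"deletion", "syntactic"}, {"lexical"}, {"split"}],
    ]
    adi = [attribute_diversity_index(seg) for seg in segments]

    # Hypothetical per-segment metric scores (toy values, not paper results).
    toy_scores = [0.80, 0.65, 0.40]
    print("ADI per segment:", adi)
    print("Pearson r (ADI vs. toy scores):", pearson(adi, toy_scores))

    # Rotation used for SARI: each of the four simplifications is scored
    # against the other three.
    for hypothesis, references in rotate_references(["s1", "s2", "s3", "s4"]):
        print(hypothesis, "vs", references)
```

In a real pipeline, the toy scores would be replaced by the per-segment output of an actual metric implementation; a negative coefficient would correspond to the pattern the abstract reports, where scores fall as the number of distinct edit types grows.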

Original language: English
Title of host publication: 2025 IEEE 7th International Conference on BioInspired Processing, BIP 2025
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9798331570149
State: Published - 2025
Event: 7th IEEE International Conference on BioInspired Processing, BIP 2025 - Perez Zeledon, Costa Rica
Duration: 3 Dec 2025 to 5 Dec 2025

Publication series

Name: 2025 IEEE 7th International Conference on BioInspired Processing, BIP 2025

Conference

Conference: 7th IEEE International Conference on BioInspired Processing, BIP 2025
Country/Territory: Costa Rica
City: Perez Zeledon
Period: 3/12/25 to 5/12/25

Keywords

  • automatic evaluation
  • BERTScore
  • BLEU
  • METEOR
  • reference-based metrics
  • ROUGE
  • SARI
  • text simplification
