
Using GPT-3 as a Text Data Augmentator for a Complex Text Detector

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-reviewed

5 Scopus citations

Abstract

In this work, we explore the problem of complex text detection, a frequent challenge when implementing text simplification pipelines. Identifying complex text segments makes it possible to trigger text simplification models only when needed, which makes better use of resources, since state-of-the-art large language models are expensive to use. We focus on Spanish, an under-represented language in this task given the lack of simple/complex paired datasets. We use a novel paired dataset of Spanish financial educational texts to train and test our methods. To improve the performance of the classifier, we propose using text simplifications generated with GPT-3 (as a data augmenter) to alleviate the need to label a large number of text segments as simple or complex. We use the BERT model pre-trained on Spanish data, known as Spanish BERT (BETO), and explore the effect of augmenting target data on model performance.
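
The augmentation idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how generated text is labeled, so the sketch assumes that a simplification generated from a complex segment is added to the training set with the "simple" label. The names `augment_with_simplifications`, `toy_simplifier`, and the label constants are hypothetical, and `toy_simplifier` is only a stand-in for a GPT-3 call. Fine-tuning BETO on the resulting examples is not shown.

```python
from typing import Callable, List, Tuple

# Label convention assumed for this sketch (not taken from the paper).
SIMPLE, COMPLEX = 0, 1

Example = Tuple[str, int]


def augment_with_simplifications(
    labeled: List[Example],
    simplify: Callable[[str], str],
) -> List[Example]:
    """Return the labeled examples plus one generated 'simple' example
    for each complex segment, skipping empty or duplicate outputs."""
    augmented = list(labeled)
    seen = {text for text, _ in labeled}
    for text, label in labeled:
        if label != COMPLEX:
            continue
        candidate = simplify(text).strip()
        # Keep only non-empty outputs that are not already in the dataset.
        if candidate and candidate not in seen:
            augmented.append((candidate, SIMPLE))
            seen.add(candidate)
    return augmented


def toy_simplifier(text: str) -> str:
    """Hypothetical stand-in for a GPT-3 call: keeps only the first clause."""
    return text.split(",")[0].rstrip(".") + "."


if __name__ == "__main__":
    seed = [
        ("La tasa de interés compuesta, aplicada periódicamente, "
         "incrementa el saldo adeudado.", COMPLEX),
        ("Ahorrar es guardar dinero.", SIMPLE),
    ]
    for text, label in augment_with_simplifications(seed, toy_simplifier):
        print(label, text)
```

In practice the `simplify` argument would wrap a call to a language model, and the returned list would be tokenized and passed to a sequence classifier; both steps are left out here because the abstract gives no details about them.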

Original language: English
Title of host publication: 5th IEEE International Conference on BioInspired Processing, BIP 2023
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9798350330052
State: Published - 2023
Event: 5th IEEE International Conference on BioInspired Processing, BIP 2023 - San Carlos, Alajuela, Costa Rica
Duration: 28 Nov 2023 to 30 Nov 2023

Publication series

Name: 5th IEEE International Conference on BioInspired Processing, BIP 2023

Conference

Conference: 5th IEEE International Conference on BioInspired Processing, BIP 2023
Country/Territory: Costa Rica
City: San Carlos, Alajuela
Period: 28/11/23 to 30/11/23

Keywords

  • GPT-3
  • Text Complexity Detection
  • Text Simplification
  • Transformer
