
Optimization of language models for the retrieval and generation of species information: Integration of contextual knowledge through the Plinian Core standard

Project: Research Projects, Internally funded, Technological Development

Project Details

Description

This project aims to develop and optimize language models for the accurate retrieval
and generation of species information, integrating contextual knowledge through the
Plinian Core Standard, advanced information extraction techniques, and the
Retrieval-Augmented Generation (RAG) approach. The project seeks to efficiently
respond to queries from both scientific users and the general public, ensuring the
precision and semantic richness of the generated content.
The specific objectives include integrating the Plinian Core Standard within the RAG
framework to improve the retrieval of semi-structured biodiversity data in Spanish,
ensuring contextual accuracy in the generated responses. Additionally, retrieval models
will be developed and evaluated using data labeled according to the Plinian Core,
improving the relevance and accuracy of the retrieved content. This process will be
complemented by information extraction techniques that identify key biodiversity
entities, such as species names, habitats, and conservation statuses, with the models
tuned for the specific challenges of this extraction task.
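
As a rough illustration of the idea above, the following minimal sketch shows how passages labeled with standard-derived sections could narrow retrieval before generation. Everything in it is an assumption made for illustration: the `Passage` type, the section labels `habitat` and `conservation_status` (stand-ins, not actual Plinian Core element names), the placeholder species and texts, and the token-overlap scoring, which stands in for whatever retrieval model the project actually trains.

```python
from __future__ import annotations
from dataclasses import dataclass

PUNCT = ".,;:¿?¡!"

@dataclass
class Passage:
    species: str
    section: str  # hypothetical label standing in for a Plinian Core element
    text: str

# Placeholder passages invented for this sketch (not INBio data).
PASSAGES = [
    Passage("Especie A", "habitat", "Habita en bosque nuboso entre 1200 y 1800 metros."),
    Passage("Especie A", "conservation_status", "Se considera amenazada por la pérdida de bosque."),
    Passage("Especie B", "habitat", "Habita en manglares de la costa."),
]

def tokenize(s: str) -> set[str]:
    """Lowercase whitespace tokens with surrounding punctuation removed."""
    return {w.strip(PUNCT).lower() for w in s.split() if w.strip(PUNCT)}

def retrieve(query: str, passages: list[Passage],
             section: str | None = None, k: int = 2) -> list[Passage]:
    """Rank passages by token overlap, optionally restricted to one section label."""
    q = tokenize(query)
    pool = [p for p in passages if section is None or p.section == section]
    scored = [(len(q & tokenize(p.text)), p) for p in pool]
    scored = [sp for sp in scored if sp[0] > 0]
    scored.sort(key=lambda sp: sp[0], reverse=True)  # stable sort keeps input order on ties
    return [p for _, p in scored[:k]]

def build_prompt(query: str, retrieved: list[Passage]) -> str:
    """Assemble a Spanish prompt whose context lines carry the section metadata."""
    context = "\n".join(f"[{p.species} | {p.section}] {p.text}" for p in retrieved)
    return f"Contexto:\n{context}\n\nPregunta: {query}\nRespuesta:"

if __name__ == "__main__":
    question = "¿Está amenazada la especie en el bosque?"
    hits = retrieve(question, PASSAGES, section="conservation_status")
    print(build_prompt(question, hits))
```

Without the `section` argument, the same query also pulls in a habitat passage that merely shares common words; restricting the pool by label is one simple way metadata could reduce that noise.
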
The data to be used during the project’s execution were generated by the National
Institute of Biodiversity (INBio) from 1999 to 2015 and comprise over 32,000 species
description texts covering biological groups such as plants, arthropods, fungi,
mammals, reptiles, and amphibians, among others.
Finally, generative models will be refined to produce high-quality text in Spanish,
aligned with biodiversity documentation standards, ensuring consistency with the
metadata extracted from the context. The system’s performance will be evaluated both
quantitatively and qualitatively, comparing the effectiveness of RAG enhanced with
Plinian Core against non-standardized approaches, highlighting the benefits of using
biodiversity data enriched with metadata to improve accuracy and relevance in species
information retrieval and generation.
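
The quantitative comparison described above could, for example, rely on standard ranking metrics. The sketch below computes precision at k and reciprocal rank for two retrievers over the same query. The choice of metrics is an assumption (the project description does not name any), and the document identifiers and rankings are made-up toy values, not project results.

```python
def precision_at_k(ranked_ids: list[str], relevant_ids: set[str], k: int) -> float:
    """Fraction of the top-k retrieved documents that are relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    return sum(1 for d in ranked_ids[:k] if d in relevant_ids) / k

def reciprocal_rank(ranked_ids: list[str], relevant_ids: set[str]) -> float:
    """1 / rank of the first relevant document, or 0.0 if none is retrieved."""
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            return 1.0 / rank
    return 0.0

# Toy rankings for one query (hypothetical, for illustration only).
relevant = {"d1", "d2"}
baseline = ["d3", "d1", "d7"]   # retrieval without metadata
enriched = ["d1", "d2", "d3"]   # retrieval using standard-based labels

if __name__ == "__main__":
    for name, ranking in [("baseline", baseline), ("enriched", enriched)]:
        print(name, precision_at_k(ranking, relevant, 3), reciprocal_rank(ranking, relevant))
```

In practice such scores would be averaged over a labeled query set and complemented by the qualitative review the description mentions.
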
The research question to be addressed is: How can an advanced question-and-answer
system that integrates contextual knowledge (using the Plinian Core Standard and
information extraction techniques) with RAG improve the accuracy and relevance of
species information generated in Spanish compared to traditional RAG approaches?

General Objective

Develop and optimize large language models for the accurate retrieval and generation
of species information, integrating the Plinian Core Standard, advanced information
extraction techniques, and Retrieval-Augmented Generation (RAG), in order to answer
questions from both scientific users and the general public.

Research Lines

Environment
Short title: Modelo Plinian
Acronym: Plinian Core
Status: Active
Effective start/end date: 2/01/25 to 30/12/27

Keywords

  • Plinian
