
Resilience Analysis of Energy Optimization Strategies for Deep Learning Models in Artificial Intelligence

  • Meneses, Esteban (Institutional academic coordinator)
  • Ciorba, Florina (External collaborating researcher)
  • Asch, Christian (External collaborating researcher)
  • Rojas, Elvis (External collaborating researcher)
  • Navarro Todd, Luis Carlos (External collaborating researcher)
  • University of Basel
  • Centro Nacional de Alta Tecnología
  • Universidad Nacional de Costa Rica
  • Escuela de Ingeniería en Computación

Project: Research Projects With national external funds; Basic and applied research

Project Details

Description

Modern artificial intelligence (AI) has a transformative yet problematic impact. AI models have advanced enormously, powering applications from chatbots to precision agriculture tools, but their large-scale development and use pose significant challenges for sustainability and reliability.

A critical issue is their high energy consumption. Training a large language model (LLM) such as GPT-3 can require an amount of electricity comparable to the annual consumption of over 100 homes, because it needs thousands of processors (GPUs) running for weeks. Although each individual query (inference) consumes less energy, the aggregate global impact is considerable and growing, contributing to carbon emissions and straining power grids. This runs counter to the sustainability principles established by organizations such as UNESCO and the OECD. The literature reports various strategies for optimizing the energy consumption of AI models.

In addition to energy consumption, the reliability of the computing systems must be addressed. The supercomputers used to train these models experience frequent failures, often every few hours. Despite sophisticated fault-tolerance mechanisms, risks such as "silent data corruption" (SDC) persist, where a single bit flip can lead to a catastrophic result.

This research project proposal seeks to understand the relationship between energy efficiency and fault tolerance in AI. The goal is to determine how energy optimization strategies behave in the presence of failures, and thus to contribute to the development of more sustainable and robust AI systems. The main expected results of this proposal are the publication of scientific findings and the creation of an experimental platform for future research.

Keywords: artificial intelligence, deep learning, energy efficiency, resilience

General Objective

Evaluate the resilience of the main energy optimization mechanisms in deep learning models.

Research Lines

Deep Learning
Short title: Deep learning
Status: Active
Effective start/end date: 1/01/26 to 31/12/27

Keywords

  • deep learning
