Abstract
Large Language Models (LLMs) have emerged as the leading technique in Natural Language Processing due to their remarkable capabilities. These models are exceptionally complex, often comprising billions of parameters and consuming tens of gigabytes of memory unless optimised. Typically, LLMs are trained and accelerated using general-purpose Graphics Processing Units (GPUs), with a single inference potentially occupying an entire GPU. Consequently, cloud services may require hundreds of GPUs to deliver high-quality AI assistant services. This demand for extensive hardware acceleration motivates the exploration of alternative architectures that offer improved execution time and energy efficiency. This study evaluates Field-Programmable Gate Arrays (FPGAs), renowned for their flexibility and efficiency, as accelerators for LLMs. We specifically compare the resource consumption and latency of two distinct architectures: layer and spatial accelerators. Analysing the Llama 2-7B model, we identified potential optimisation opportunities within its composition and operational graphs. Our most successful implementations demonstrate a performance improvement ranging from 1.37× to 10.98× over two AMD EPYC processors with 64 cores each. Moreover, our results indicate that, with further refinement, FPGAs have the potential to surpass GPU performance for LLM inference, demonstrating their viability as an alternative for this demanding application.
Original language | English
---|---
Title of host publication | 2024 IEEE 42nd Central America and Panama Convention, CONCAPAN 2024
Publisher | Institute of Electrical and Electronics Engineers Inc.
Edition | 2024
ISBN (electronic) | 9798350366723
DOI |
Status | Published - 2024
Event | 42nd IEEE Central America and Panama Convention, CONCAPAN 2024 - San Jose, Costa Rica. Duration: 27 Nov 2024 → 29 Nov 2024
Conference
Conference | 42nd IEEE Central America and Panama Convention, CONCAPAN 2024
---|---
Country/Territory | Costa Rica
City | San Jose
Period | 27/11/24 → 29/11/24