LLM Acceleration on FPGAs: A Comparative Study of Layer and Spatial Accelerators

Luis D. Prieto-Sibaja, Luis G. Leon-Vega, Indra Leon-Vega, Jorge Castro-Godinez, Stefano Cozzini

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-reviewed

Abstract

Large Language Models (LLMs) have emerged as the leading technique in Natural Language Processing due to their remarkable capabilities. These models are exceptionally complex, often utilising billions of parameters and consuming tens of gigabytes of memory unless optimised. Typically, LLMs are trained and accelerated using general-purpose Graphics Processing Units (GPUs), with a single inference potentially occupying an entire GPU. Consequently, cloud services may require hundreds of GPUs to deliver high-quality AI assistant services. This demand for extensive hardware acceleration motivates the exploration of alternative architectures that offer improved execution time and energy efficiency. This study evaluates Field-Programmable Gate Arrays (FPGAs), renowned for their flexibility and efficiency, to accelerate LLMs. We specifically compare the resource consumption and latency of two distinct architectures: layer and spatial accelerators. Analysing the Llama 2-7B model, we identified potential optimisation opportunities within its composition and operational graphs. Our most successful implementations demonstrate a performance improvement ranging from 1.37× to 10.98× over two AMD EPYC processors with 64 cores each. Moreover, our results indicate that, with further refinement, FPGAs have the potential to surpass GPU performance for LLM inference, showcasing their feasibility as a viable alternative for this demanding application.
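As an illustration of the kind of model analysis the abstract refers to, the following Python sketch (not taken from the paper) estimates the parameter breakdown of Llama 2-7B from its publicly documented configuration (hidden size 4096, intermediate size 11008, 32 decoder layers, vocabulary of 32000 tokens). It only shows that the dense attention and MLP projections hold nearly all of the weights, which is the part of the operational graph a layer or spatial accelerator would target; it does not reproduce the authors' methodology.

```python
# Illustrative sketch: parameter breakdown of Llama 2-7B from its published
# configuration values (assumed here, not extracted from the paper).

HIDDEN = 4096          # hidden size
INTERMEDIATE = 11008   # MLP intermediate size
LAYERS = 32            # decoder layers
VOCAB = 32000          # vocabulary size

attn_per_layer = 4 * HIDDEN * HIDDEN        # Q, K, V, O projection matrices
mlp_per_layer = 3 * HIDDEN * INTERMEDIATE   # gate, up, down projection matrices
norm_per_layer = 2 * HIDDEN                 # two RMSNorm weight vectors

per_layer = attn_per_layer + mlp_per_layer + norm_per_layer
embeddings = 2 * VOCAB * HIDDEN + HIDDEN    # input embeddings, LM head, final norm

total = LAYERS * per_layer + embeddings
gemm_share = LAYERS * (attn_per_layer + mlp_per_layer) / total

print(f"Total parameters: {total / 1e9:.2f} B")            # ~6.74 B
print(f"Share held by attention/MLP GEMMs: {gemm_share:.1%}")  # ~96%
```

Running this reports roughly 6.74 billion parameters, with about 96% of them sitting in the matrix-multiplication weights of the decoder layers, which is why accelerating those GEMMs dominates the design of both layer and spatial FPGA accelerators.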

Original language: English
Host publication title: 2024 IEEE 42nd Central America and Panama Convention, CONCAPAN 2024
Publisher: Institute of Electrical and Electronics Engineers Inc.
Edition: 2024
ISBN (electronic): 9798350366723
DOI
Status: Published - 2024
Event: 42nd IEEE Central America and Panama Convention, CONCAPAN 2024 - San Jose, Costa Rica
Duration: 27 Nov 2024 - 29 Nov 2024

Conference

Conference: 42nd IEEE Central America and Panama Convention, CONCAPAN 2024
Country/Territory: Costa Rica
City: San Jose
Period: 27/11/24 - 29/11/24
