Abstract
Large Language Models (LLMs) have emerged as the leading technique in Natural Language Processing due to their remarkable capabilities. These models are exceptionally complex, often comprising billions of parameters and consuming tens of gigabytes of memory unless optimised. Typically, LLMs are trained and accelerated on general-purpose Graphics Processing Units (GPUs), with a single inference potentially occupying an entire GPU. Consequently, cloud services may require hundreds of GPUs to deliver high-quality AI assistant services. This demand for extensive hardware acceleration motivates exploring alternative architectures that offer improved execution time and energy efficiency. This study evaluates Field-Programmable Gate Arrays (FPGAs), renowned for their flexibility and efficiency, as LLM accelerators. We specifically compare the resource consumption and latency of two distinct architectures: layer and spatial accelerators. Analysing the Llama 2-7B model, we identified optimisation opportunities within its composition and operational graphs. Our most successful implementations demonstrate a performance improvement ranging from 1.37× to 10.98× over two AMD EPYC processors with 64 cores each. Moreover, our results indicate that, with further refinement, FPGAs have the potential to surpass GPU performance for LLM inference, showcasing their feasibility as a viable alternative for this demanding application.
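The speedup figures quoted in the abstract (1.37× to 10.98×) are ratios of baseline CPU latency to accelerator latency. A minimal sketch of that arithmetic, using hypothetical per-token latencies that are illustrative only and not taken from the paper:

```python
def speedup(baseline_latency_s: float, accelerator_latency_s: float) -> float:
    """Return how many times faster the accelerator is than the baseline."""
    return baseline_latency_s / accelerator_latency_s

# Hypothetical per-token latencies (illustrative values, not from the paper):
cpu_latency = 0.450   # seconds per token on a dual AMD EPYC baseline
fpga_latency = 0.041  # seconds per token on an FPGA spatial accelerator

print(f"{speedup(cpu_latency, fpga_latency):.2f}x")  # → 10.98x
```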
| Original language | English |
|---|---|
| Title of host publication | 2024 IEEE 42nd Central America and Panama Convention, CONCAPAN 2024 |
| Publisher | Institute of Electrical and Electronics Engineers Inc. |
| Edition | 2024 |
| ISBN (Electronic) | 9798350366723 |
| DOIs | |
| State | Published - 2024 |
| Event | 42nd IEEE Central America and Panama Convention, CONCAPAN 2024 - San Jose, Costa Rica |
| Duration | 27 Nov 2024 → 29 Nov 2024 |
Conference
| Conference | 42nd IEEE Central America and Panama Convention, CONCAPAN 2024 |
|---|---|
| Country/Territory | Costa Rica |
| City | San Jose |
| Period | 27/11/24 → 29/11/24 |
UN SDGs
This output contributes to the following UN Sustainable Development Goals (SDGs):
- SDG 7: Affordable and Clean Energy
Keywords
- Cloud Computing
- Edge Computing
- Field Programmable Gate Arrays
- Hardware Acceleration
- High Performance Computing
- Large Language Models
Title
LLM Acceleration on FPGAs: A Comparative Study of Layer and Spatial Accelerators