
LLM Acceleration on FPGAs: A Comparative Study of Layer and Spatial Accelerators

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Large Language Models (LLMs) have emerged as the leading technique in Natural Language Processing due to their remarkable capabilities. These models are exceptionally complex, often utilising billions of parameters and consuming tens of gigabytes of memory unless optimised. Typically, LLMs are trained and accelerated using general-purpose Graphics Processing Units (GPUs), with a single inference potentially occupying an entire GPU. Consequently, cloud services may require hundreds of GPUs to deliver high-quality AI assistant services. This demand for extensive hardware acceleration motivates exploring alternative architectures that offer improved execution time and energy efficiency. This study evaluates Field-Programmable Gate Arrays (FPGAs), renowned for their flexibility and efficiency, to accelerate LLMs. We specifically compare the resource consumption and latency of two distinct architectures: layer and spatial accelerators. Analysing the Llama 2-7B model, we identified potential optimisation opportunities within its composition and operational graphs. Our most successful implementations demonstrate a performance improvement ranging from 1.37× to 10.98× over two AMD EPYC processors with 64 cores each. Moreover, our results indicate that, with further refinement, FPGAs have the potential to surpass GPU performance for LLM inference, showcasing their viability as an alternative for this demanding application.

Original language: English
Title of host publication: 2024 IEEE 42nd Central America and Panama Convention, CONCAPAN 2024
Publisher: Institute of Electrical and Electronics Engineers Inc.
Edition: 2024
ISBN (Electronic): 9798350366723
DOIs
State: Published - 2024
Event: 42nd IEEE Central America and Panama Convention, CONCAPAN 2024 - San Jose, Costa Rica
Duration: 27 Nov 2024 → 29 Nov 2024

Conference

Conference: 42nd IEEE Central America and Panama Convention, CONCAPAN 2024
Country/Territory: Costa Rica
City: San Jose
Period: 27/11/24 → 29/11/24

UN SDGs

This output contributes to the following UN Sustainable Development Goals (SDGs)

  1. SDG 7 - Affordable and Clean Energy

Keywords

  • Cloud Computing
  • Edge Computing
  • Field Programmable Gate Arrays
  • Hardware Acceleration
  • High Performance Computing
  • Large Language Models
