Performance Evaluation of Deep Learning Formats: A Comparative Study of ONNX and Pytorch for Inference Efficiency and Portability

  • Costa Rica Institute of Technology
  • National High Technology Center

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

The high computational cost of Large Language Models (LLMs) requires post-training optimization for efficient deployment. This paper presents a rigorous empirical comparison of inference performance between the native PyTorch framework and the optimized ONNX format for contemporary LLMs, including Gemma-7B, Phi-3-Medium, and Llama-3.1-8B, on NVIDIA GPU hardware. Using low-level profiling with NVIDIA's Nsight Compute suite, we found that ONNX conversion yields a substantial and statistically significant performance advantage. Our results show a latency reduction of up to 27.6% and an increase in computational throughput (GFLOP/s) of up to 48.3%. We conclude that adopting an ONNX-based strategy is not merely a portability choice but a critical optimization step for deploying high-performance, efficient LLMs in production environments.
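
The two headline metrics above are relative changes against the PyTorch baseline. The sketch below shows one common way such percentages are defined and how a median latency might be collected. It is not the authors' measurement pipeline, which relied on Nsight Compute profiling; the function names, the warm-up and run counts, and the wall-clock timing approach are all assumptions made for illustration.

```python
import time
from statistics import median


def median_latency_s(fn, warmup=3, runs=10):
    """Median wall-clock time of fn() in seconds, after warm-up calls.

    Assumption: fn() blocks until its work is finished. Asynchronous GPU
    calls would need an explicit device synchronization inside fn.
    """
    for _ in range(warmup):
        fn()
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return median(samples)


def latency_reduction_pct(baseline_s, optimized_s):
    """Percentage by which the optimized latency is lower than the baseline."""
    return 100.0 * (baseline_s - optimized_s) / baseline_s


def throughput_gflops(flop_count, latency_s):
    """Computational throughput in GFLOP/s for a known FLOP count."""
    return flop_count / latency_s / 1e9


def throughput_gain_pct(baseline_gflops, optimized_gflops):
    """Percentage by which the optimized throughput exceeds the baseline."""
    return 100.0 * (optimized_gflops - baseline_gflops) / baseline_gflops
```

With hypothetical inputs (not figures from the paper), `latency_reduction_pct(100.0, 80.0)` expresses a drop from 100 ms to 80 ms as a 20% reduction. Passing each runtime's inference call as `fn` to `median_latency_s` would give the two latencies to compare.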

Original language: English
Title of host publication: 2025 IEEE 7th International Conference on BioInspired Processing, BIP 2025
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9798331570149
State: Published - 2025
Event: 7th IEEE International Conference on BioInspired Processing, BIP 2025 - Perez Zeledon, Costa Rica
Duration: 3 Dec 2025 to 5 Dec 2025

Publication series

Name: 2025 IEEE 7th International Conference on BioInspired Processing, BIP 2025

Conference

Conference: 7th IEEE International Conference on BioInspired Processing, BIP 2025
Country/Territory: Costa Rica
City: Perez Zeledon
Period: 3/12/25 to 5/12/25

Keywords

  • Deep Learning
  • Large Language Models
  • Performance Profiling
  • Weight Formats
