
Performance Evaluation of Deep Learning Formats: A Comparative Study of ONNX and Pytorch for Inference Efficiency and Portability

  • Costa Rica Institute of Technology
  • National High Technology Center

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

The high computational cost of Large Language Models (LLMs) requires post-training optimization for efficient deployment. This paper presents a rigorous empirical comparison of inference performance between the native PyTorch framework and the optimized ONNX format for contemporary LLMs, including Gemma-7B, Phi-3-Medium, and Llama-3.1-8B, on NVIDIA GPU hardware. Using low-level profiling with NVIDIA's Nsight Compute suite, we found that ONNX conversion yields a substantial and statistically significant performance advantage. Our results show a latency reduction of up to 27.6% and an increase in computational throughput (GFLOP/s) of up to 48.3%. We conclude that adopting an ONNX-based strategy is not merely a portability choice but a critical optimization step for deploying high-performance, efficient LLMs in production environments.
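The abstract reports its results as relative changes in latency and throughput. The exact formulas the authors used are not given on this page, so the following is only a minimal sketch, assuming the conventional definitions of percentage change against the PyTorch baseline. The function names and all sample numbers are hypothetical and are not measurements from the paper.

```python
def latency_reduction_pct(baseline_ms: float, optimized_ms: float) -> float:
    """Percentage by which latency drops relative to the baseline run."""
    if baseline_ms <= 0:
        raise ValueError("baseline_ms must be positive")
    return (baseline_ms - optimized_ms) / baseline_ms * 100.0


def throughput_gain_pct(baseline_gflops: float, optimized_gflops: float) -> float:
    """Percentage by which throughput rises relative to the baseline run."""
    if baseline_gflops <= 0:
        raise ValueError("baseline_gflops must be positive")
    return (optimized_gflops - baseline_gflops) / baseline_gflops * 100.0


def gflops_per_second(flop_count: float, elapsed_s: float) -> float:
    """Convert a raw floating-point operation count and a duration to GFLOP/s."""
    if elapsed_s <= 0:
        raise ValueError("elapsed_s must be positive")
    return flop_count / elapsed_s / 1e9


if __name__ == "__main__":
    # Hypothetical values for illustration only (assumption, not paper data).
    pytorch_latency_ms, onnx_latency_ms = 100.0, 80.0
    pytorch_gflops, onnx_gflops = 200.0, 250.0

    print(f"latency reduction: {latency_reduction_pct(pytorch_latency_ms, onnx_latency_ms):.1f}%")
    print(f"throughput gain:   {throughput_gain_pct(pytorch_gflops, onnx_gflops):.1f}%")
```

Both helpers normalize by the baseline value, which is the usual reading of phrases such as "a latency reduction of up to X%"; a different normalization would give different percentages for the same measurements.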

Original language: English
Title of host publication: 2025 IEEE 7th International Conference on BioInspired Processing, BIP 2025
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (electronic): 9798331570149
DOI
Publication status: Published - 2025
Event: 7th IEEE International Conference on BioInspired Processing, BIP 2025 - Perez Zeledon, Costa Rica
Duration: 3 Dec 2025 → 5 Dec 2025

Publication series

Name: 2025 IEEE 7th International Conference on BioInspired Processing, BIP 2025

Conference

Conference: 7th IEEE International Conference on BioInspired Processing, BIP 2025
Country/Territory: Costa Rica
City: Perez Zeledon
Period: 3/12/25 → 5/12/25
