Uncertainty Quantification in Large Language Models Using Feature Space Density and Clustering

  • Costa Rica Institute of Technology

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Uncertainty Quantification (UQ) in Large Language Models (LLMs) is essential for reliability in high-impact domains. Traditional methods such as Monte Carlo Dropout (MCD) and Temperature Scaling (TS), although widely adopted, often fail to capture uncertainty effectively in generative LLMs, and MCD can be computationally expensive. This study introduces a novel UQ approach that uses feature density estimation (Kernel Density Estimation, KDE) and clustering methods (K-means and Hierarchical Density-Based Spatial Clustering of Applications with Noise, HDBSCAN) to detect low-density regions in the hidden state space, which correlate with higher uncertainty. We evaluated the method on the MedQuAD dataset by extracting hidden states from multiple layers of LLaMA 3.1 8B and DeepSeek 7B, followed by dimensionality reduction with Principal Component Analysis (PCA) and Tucker decomposition. Effectiveness was measured through correlations with text quality metrics (BERTScore and METEOR) and validated with Wilcoxon signed-rank tests and paired t-tests. The best performance was achieved with HDBSCAN-KNN at 128 dimensions using PCA on outer layers, which showed stronger negative Pearson correlations with BERTScore than MCD, with improvements of 0.19 for LLaMA 3.1 8B and 0.24 for DeepSeek 7B. All differences were statistically significant, confirming that feature space-based methods quantify uncertainty more effectively than MCD.
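
The density idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it uses synthetic vectors in place of LLM hidden states, plain NumPy in place of a dedicated library, two PCA dimensions instead of 128, and an arbitrary bandwidth. All function and variable names are hypothetical. It reduces the features with PCA, fits a Gaussian KDE on reference features, and treats the negative log-density of a new point as its uncertainty score.

```python
import numpy as np


def pca_reduce(features, n_components):
    """Project rows of `features` onto their top principal components."""
    mean = features.mean(axis=0)
    centered = features - mean
    # Rows of vt are principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:n_components]
    return centered @ components.T, mean, components


def gaussian_kde_log_density(reference, queries, bandwidth):
    """Log-density of each query under a Gaussian KDE fitted on `reference`."""
    n, d = reference.shape
    # Squared Euclidean distance between every query and every reference point.
    diff = queries[:, None, :] - reference[None, :, :]
    sq_dist = (diff ** 2).sum(axis=-1)
    log_kernel = -0.5 * sq_dist / bandwidth ** 2
    # Log-sum-exp over reference points for numerical stability.
    peak = log_kernel.max(axis=1, keepdims=True)
    log_sum = peak[:, 0] + np.log(np.exp(log_kernel - peak).sum(axis=1))
    log_norm = np.log(n) + 0.5 * d * np.log(2.0 * np.pi * bandwidth ** 2)
    return log_sum - log_norm


def density_uncertainty(reference, queries, n_components=2, bandwidth=1.0):
    """Uncertainty score: negative KDE log-density in the PCA-reduced space."""
    reduced_ref, mean, components = pca_reduce(reference, n_components)
    reduced_queries = (queries - mean) @ components.T
    return -gaussian_kde_log_density(reduced_ref, reduced_queries, bandwidth)


# Synthetic stand-ins for hidden states (assumption: 16-dimensional features
# whose first two dimensions carry most of the variance).
rng = np.random.default_rng(0)
scales = np.ones(16)
scales[:2] = 5.0
reference = rng.normal(size=(200, 16)) * scales
typical = rng.normal(size=(20, 16)) * scales   # same distribution as reference
atypical = rng.normal(size=(20, 16)) * scales
atypical[:, 0] += 30.0                          # pushed into a low-density region

u_typical = density_uncertainty(reference, typical)
u_atypical = density_uncertainty(reference, atypical)
print("mean uncertainty, typical: ", u_typical.mean())
print("mean uncertainty, atypical:", u_atypical.mean())
```

Points that fall in sparsely populated regions of the reduced space receive higher scores. In an evaluation like the one described above, such scores would then be correlated with per-answer quality metrics, for example with `np.corrcoef`.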

Original language: English
Title of host publication: 2025 IEEE 7th International Conference on BioInspired Processing, BIP 2025
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9798331570149
DOIs
State: Published - 2025
Event: 7th IEEE International Conference on BioInspired Processing, BIP 2025 - Perez Zeledon, Costa Rica
Duration: 3 Dec 2025 to 5 Dec 2025

Publication series

Name: 2025 IEEE 7th International Conference on BioInspired Processing, BIP 2025

Conference

Conference: 7th IEEE International Conference on BioInspired Processing, BIP 2025
Country/Territory: Costa Rica
City: Perez Zeledon
Period: 3/12/25 to 5/12/25

Keywords

  • feature space density estimation
  • Kernel Density Estimation
  • large language models
  • Monte Carlo dropout
  • uncertainty quantification
