TY - GEN
T1 - Uncertainty Quantification in Large Language Models Using Feature Space Density and Clustering
AU - Mora-Cross, María
AU - Calderón-Ramírez, Saúl
N1 - Publisher Copyright: © 2025 IEEE.
PY - 2025
Y1 - 2025
N2 - Uncertainty Quantification (UQ) in Large Language Models (LLMs) is essential for reliability in high-impact domains. Traditional methods, such as Monte Carlo Dropout (MCD) and Temperature Scaling (TS), although widely adopted, often fall short in effectively capturing uncertainty in generative LLMs, and MCD can be computationally expensive. This study introduces a novel UQ approach using feature density estimation (Kernel Density Estimation, KDE) and clustering methods (K-means and Hierarchical Density-Based Spatial Clustering of Applications with Noise, HDBSCAN) to detect low-density regions in the hidden state space, which correlate with higher uncertainty. We evaluated the method on the MedQuAD dataset by extracting hidden states from multiple layers of LLaMA 3.1 8B and DeepSeek 7B, followed by dimensionality reduction with Principal Component Analysis (PCA) and Tucker decomposition. Effectiveness was measured through correlations with text quality metrics (BERTScore and METEOR) and validated via Wilcoxon signed-rank tests and paired t-tests. The best performance was achieved with HDBSCAN-KNN at 128 dimensions using PCA on outer layers, showing stronger negative Pearson correlations with BERTScore than MCD, with improvements of 0.19 for LLaMA 3.1 8B and 0.24 for DeepSeek 7B. All differences were statistically significant, confirming that feature space-based methods quantify uncertainty more effectively than MCD.
AB - Uncertainty Quantification (UQ) in Large Language Models (LLMs) is essential for reliability in high-impact domains. Traditional methods, such as Monte Carlo Dropout (MCD) and Temperature Scaling (TS), although widely adopted, often fall short in effectively capturing uncertainty in generative LLMs, and MCD can be computationally expensive. This study introduces a novel UQ approach using feature density estimation (Kernel Density Estimation, KDE) and clustering methods (K-means and Hierarchical Density-Based Spatial Clustering of Applications with Noise, HDBSCAN) to detect low-density regions in the hidden state space, which correlate with higher uncertainty. We evaluated the method on the MedQuAD dataset by extracting hidden states from multiple layers of LLaMA 3.1 8B and DeepSeek 7B, followed by dimensionality reduction with Principal Component Analysis (PCA) and Tucker decomposition. Effectiveness was measured through correlations with text quality metrics (BERTScore and METEOR) and validated via Wilcoxon signed-rank tests and paired t-tests. The best performance was achieved with HDBSCAN-KNN at 128 dimensions using PCA on outer layers, showing stronger negative Pearson correlations with BERTScore than MCD, with improvements of 0.19 for LLaMA 3.1 8B and 0.24 for DeepSeek 7B. All differences were statistically significant, confirming that feature space-based methods quantify uncertainty more effectively than MCD.
KW - feature space density estimation
KW - Kernel Density Estimation
KW - large language models
KW - Monte Carlo dropout
KW - uncertainty quantification
UR - https://www.scopus.com/pages/publications/105038721889
U2 - 10.1109/BIP68491.2025.11489143
DO - 10.1109/BIP68491.2025.11489143
M3 - Conference contribution
AN - SCOPUS:105038721889
T3 - 2025 IEEE 7th International Conference on BioInspired Processing, BIP 2025
BT - 2025 IEEE 7th International Conference on BioInspired Processing, BIP 2025
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 7th IEEE International Conference on BioInspired Processing, BIP 2025
Y2 - 3 December 2025 through 5 December 2025
ER -