A Novel Uncertainty Estimator for Detecting Large Language Model Confabulations
Graphical Abstract
Abstract
Large language models (LLMs), such as ChatGPT and ChatGLM, have transformed natural language processing but remain prone to generating confabulations—plausible yet incorrect or nonsensical outputs. Detecting these confabulations is critical for ensuring the reliability of LLM-driven applications. In this paper, we propose Similar Cluster Frequency Entropy (SCFE), a novel entropy-based uncertainty estimator for identifying confabulations in LLM outputs. SCFE clusters multiple model-generated responses by cosine similarity to group similar textual instances, then estimates the categorical distribution over these clusters. Uncertainty is quantified as the entropy of this distribution, which reflects how the cluster-assignment frequencies are spread. High entropy indicates a diffuse probability distribution across clusters, signaling uncertainty, while low entropy indicates a concentrated distribution and hence higher confidence. Unlike traditional token-likelihood-based measures, SCFE evaluates uncertainty at the textual level, capturing meaningful variation across responses. Experimental evaluations on diverse LLMs and benchmark datasets demonstrate that SCFE surpasses existing uncertainty estimation techniques in detecting confabulations, offering a robust tool for enhancing the reliability of LLM outputs in practical applications.
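The procedure described above—cluster sampled responses by cosine similarity, form the categorical distribution over clusters, and take its entropy—can be sketched as follows. This is a minimal illustration, not the paper's exact algorithm: it assumes response embeddings are precomputed (e.g., by a sentence encoder), and the greedy single-link assignment and the 0.8 similarity threshold are illustrative choices.

```python
import numpy as np

def scfe(embeddings, threshold=0.8):
    """Sketch of Similar Cluster Frequency Entropy (SCFE).

    `embeddings` holds one vector per sampled LLM response. The greedy
    clustering rule and `threshold` are assumptions for illustration;
    the entropy computation follows the abstract's description.
    """
    X = np.asarray(embeddings, dtype=float)
    X = X / np.linalg.norm(X, axis=1, keepdims=True)  # unit-norm -> dot product = cosine similarity
    centers, counts = [], []
    for x in X:
        # Greedy assignment: join the first cluster whose representative
        # is cosine-similar above the threshold, else open a new cluster.
        for i, c in enumerate(centers):
            if float(x @ c) >= threshold:
                counts[i] += 1
                break
        else:
            centers.append(x)
            counts.append(1)
    p = np.array(counts) / len(X)          # categorical distribution over clusters
    return float(-np.sum(p * np.log(p)))   # Shannon entropy of cluster frequencies
```

When all responses fall into one cluster the entropy is 0 (high confidence); responses spread evenly over k clusters give the maximum entropy log k, flagging a likely confabulation.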