Overview
- Builds supervised fine-tuning pipelines that stay more reliable under distribution shift.
- Uses causal LLM log-likelihood and self-certainty scores to derive KL-robust training weights.
- Stress-tests models with semantically equivalent paraphrases to measure across-round variability.
Abstract
Updated June 22, 2026
Large language models (LLMs) are trained to minimize nominal
predictive loss (e.g. perplexity) under a fixed data distribution, but
in deployment they face shifting prompts and user populations, leading
to epistemic failures such as hallucinations and effectively random
guesses. We propose an Epistemic Uncertainty Accountant (EUA) for
history-aware sequence models that upper-bounds a composite loss
combining perplexity with hallucination-sensitive functionals (such as
self-certainty) over all distributions in a KL neighbourhood of a
nominal base distribution (evaluation or teacher).
Modelling the LLM as a history-dependent kernel on a finite vocabulary
satisfying a chains-with-complete-connections (CCC) condition, we show
that general EUA evaluation is #P-hard but derive: (i) an exact dynamic
program for finite-memory kernels and (ii) a polynomial-time
approximation scheme under CCC with EOS, together with a certified
Monte Carlo estimator. On the statistical side, we prove consistency and
joint asymptotic normality of the EUA tilt parameters for the composite
loss and obtain asymptotically valid confidence intervals for the robust
risk. Simulations on Llama and Qwen with semantically matched prompts
empirically confirm the theory and illustrate how EUA-based
post-training reduces composite risk and systematically attenuates
hallucination-prone behaviour.
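For orientation, the worst-case risk over a KL neighbourhood has a well-known convex dual in which the worst-case distribution is an exponential tilt of the nominal one. The block below states that textbook identity only; the symbols (composite loss $L$, nominal distribution $P$, radius $\rho$, tilt parameter $\lambda$) are notation assumed here, and the paper's exact formulation of the EUA and its tilt parameters may differ.

```latex
% Assumed notation: L = composite loss, P = nominal distribution,
% \rho > 0 = KL radius, \lambda > 0 = dual (tilt) parameter.
% Requires E_P[exp(L/\lambda)] < \infty for some \lambda > 0.
\[
  \sup_{Q \,:\, \mathrm{KL}(Q \,\|\, P) \le \rho} \mathbb{E}_Q[L]
  \;=\;
  \inf_{\lambda > 0}
  \Bigl\{ \lambda \rho + \lambda \log \mathbb{E}_P\bigl[ e^{L/\lambda} \bigr] \Bigr\},
\]
% When the constraint is active, the supremum is attained by the tilt
\[
  \frac{dQ_{\lambda}}{dP}
  \;=\;
  \frac{e^{L/\lambda}}{\mathbb{E}_P\bigl[ e^{L/\lambda} \bigr]},
  \qquad
  \mathrm{KL}(Q_{\lambda} \,\|\, P) = \rho .
\]
```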
Methods and contribution
Multi-turn chat data are converted into reproducible SFT examples, then scored to estimate
predictive fit and epistemic self-certainty. These quantities are combined through a
KL-constrained reweighting scheme that emphasizes harder or less stable examples in a
principled way.
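A minimal sketch of what a KL-constrained reweighting step could look like, assuming each example already has a scalar composite loss. It tilts uniform weights exponentially toward high-loss examples and picks the temperature by bisection so that the KL divergence from uniform matches a chosen radius. The function names, the radius `rho`, and the bisection bounds are hypothetical and are not taken from the project's training code, which is not public.

```python
import math

def tilt_weights(losses, lam):
    """Exponential tilt of uniform weights: w_i proportional to exp(loss_i / lam)."""
    m = max(losses)  # subtract the max so every exponent is <= 0 (no overflow)
    exps = [math.exp((l - m) / lam) for l in losses]
    z = sum(exps)
    return [e / z for e in exps]

def kl_to_uniform(weights):
    """KL(w || uniform) = sum_i w_i * log(n * w_i); zero weights contribute 0."""
    n = len(weights)
    return sum(w * math.log(n * w) for w in weights if w > 0.0)

def kl_robust_weights(losses, rho, lo=1e-3, hi=1e3, iters=100):
    """Hypothetical helper: tilted weights whose KL from uniform is about rho.

    The KL of the tilt decreases as lam grows, so bisection on lam finds the
    temperature at which the constraint is tight. If rho is unreachable within
    [lo, hi], the result is the tilt at the nearest bound.
    """
    n = len(losses)
    if max(losses) == min(losses):
        return [1.0 / n] * n  # all losses equal: nothing to emphasize
    for _ in range(iters):
        mid = math.sqrt(lo * hi)  # bisect on a log scale
        if kl_to_uniform(tilt_weights(losses, mid)) > rho:
            lo = mid  # too concentrated: raise the temperature
        else:
            hi = mid  # within budget: try a sharper tilt
    return tilt_weights(losses, hi)

if __name__ == "__main__":
    example_losses = [0.5, 1.0, 2.0, 4.0]  # made-up composite losses
    w = kl_robust_weights(example_losses, rho=0.1)
    robust_risk = sum(wi * li for wi, li in zip(w, example_losses))
    print([round(wi, 3) for wi in w], round(robust_risk, 3))
```

The resulting weights can be used as per-example loss multipliers during fine-tuning; a larger `rho` concentrates more weight on the hardest examples, while `rho` near zero recovers uniform weighting.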
In current experiments, LoRA fine-tuning of Qwen-0.6B is compared against baselines, and
paraphrase batches generated through the OpenAI Batch API are used to quantify the
sensitivity of model outputs across rounds of semantically equivalent prompts.
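One simple way to summarize across-round sensitivity is the per-item standard deviation of a score over paraphrase rounds, averaged over items. The sketch below assumes that metric and a hypothetical input layout (item id mapped to one score per round); the project may use a different statistic.

```python
from statistics import mean, stdev

def across_round_variability(scores_by_item):
    """Hypothetical helper.

    scores_by_item maps an item id to a list of scores, one per paraphrase
    round of the same underlying prompt. Returns the per-item sample standard
    deviation and its mean over items. Items with fewer than two rounds are
    skipped because a sample standard deviation is undefined for them.
    """
    per_item = {
        item: stdev(scores)
        for item, scores in scores_by_item.items()
        if len(scores) >= 2
    }
    if not per_item:
        raise ValueError("need at least one item with two or more rounds")
    return per_item, mean(list(per_item.values()))

if __name__ == "__main__":
    # Made-up scores for illustration only.
    demo = {"q1": [0.82, 0.80, 0.85], "q2": [0.40, 0.75, 0.55]}
    per_item, overall = across_round_variability(demo)
    print(per_item, overall)
```

A lower mean indicates that the model's score changes less when the same question is rephrased.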
Materials
Paper
Slides
Training code not public