Current Project

History-Aware Epistemic Accounting in LLMs: Certainty-Driven Robust Inference

Robust supervised fine-tuning under distribution shift using uncertainty-aware reweighting, paraphrase stress testing, and KL-constrained objectives.

2025-Present · Large Language Models · Uncertainty

Overview

  • Builds supervised fine-tuning pipelines that stay more reliable under distribution shift.
  • Uses causal LLM log-likelihood and self-certainty scores to derive KL-robust training weights.
  • Stress-tests models with semantically equivalent paraphrases to measure across-round variability.

Abstract

Updated June 22, 2026

Large language models (LLMs) are trained to minimize nominal predictive loss (e.g., perplexity) under a fixed data distribution, but in deployment they face shifting prompts and user populations, which leads to epistemic failures such as hallucinations and effectively random guesses. We propose an Epistemic Uncertainty Accountant (EUA) for history-aware sequence models. The EUA upper-bounds a composite loss, which combines perplexity with hallucination-sensitive functionals such as self-certainty, over all distributions in a KL neighbourhood of a nominal base distribution (an evaluation or teacher distribution).
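For orientation, the robust risk over a KL neighbourhood has a standard convex dual, written out below. This is a textbook identity, not a statement of the exact EUA formulation, which the abstract does not spell out. The symbols are assumptions introduced here: P is the nominal base distribution, L the composite loss (assumed to have finite exponential moments under P), ρ the KL radius, and λ the dual (tilt) parameter.

```latex
\[
\sup_{Q:\ \mathrm{KL}(Q\,\|\,P)\le\rho} \mathbb{E}_{Q}[L]
\;=\;
\inf_{\lambda>0}\Big\{\lambda\rho+\lambda\log\mathbb{E}_{P}\big[e^{L/\lambda}\big]\Big\},
\qquad
\frac{dQ^{\star}}{dP}\;\propto\;e^{L/\lambda^{\star}}.
\]
```

The worst-case distribution is an exponential tilt of the base, so bounding the robust risk reduces to controlling a log-moment of the composite loss.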

Modelling the LLM as a history-dependent kernel on a finite vocabulary satisfying a chains-with-complete-connections (CCC) condition, we show that general EUA evaluation is #P-hard but derive (i) an exact dynamic program for finite-memory kernels and (ii) a polynomial-time approximation scheme under CCC with an end-of-sequence (EOS) symbol, together with a certified Monte Carlo estimator. On the statistical side, we prove consistency and joint asymptotic normality of the EUA tilt parameters for the composite loss and obtain asymptotically valid confidence intervals for the robust risk. Simulations with Llama and Qwen models on semantically matched prompts empirically confirm the theory and illustrate how EUA-based post-training reduces composite risk and systematically attenuates hallucination-prone behaviour.
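To make the finite-memory case concrete, here is a minimal sketch of a backward recursion that computes the exponential moment of an additive loss under a first-order Markov kernel. It is an illustration under simplifying assumptions, not the paper's dynamic program: the function name `exp_moment_dp`, the memory-one kernel `P`, the per-transition cost matrix `cost`, and the fixed horizon are all hypothetical choices made for this example.

```python
import numpy as np

def exp_moment_dp(P, cost, init, theta, horizon):
    """E[exp(theta * sum_t cost[x_{t-1}, x_t])] over `horizon` transitions.

    P[i, j]    : transition probability from state i to state j (rows sum to 1)
    cost[i, j] : loss incurred on the transition i -> j (hypothetical)
    init[i]    : probability of starting in state i
    """
    P = np.asarray(P, dtype=float)
    cost = np.asarray(cost, dtype=float)
    init = np.asarray(init, dtype=float)

    tilted = P * np.exp(theta * cost)   # elementwise exponentially tilted kernel
    value = np.ones(P.shape[0])         # value function after the final step
    for _ in range(horizon):
        value = tilted @ value          # one step of the backward recursion
    return float(init @ value)
```

The recursion costs O(horizon × states²) instead of enumerating all states^(horizon+1) paths. Taking the logarithm of the result and scaling by λ gives the log-moment term that appears in KL-robust bounds, with theta = 1/λ.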

Methods and contribution

Multi-turn chat data are converted into reproducible SFT examples, then scored to estimate predictive fit and epistemic self-certainty. These quantities are combined through a KL-constrained reweighting scheme that emphasizes harder or less stable examples in a principled way.
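One way such a reweighting could look is sketched below, assuming each example already has a scalar composite score (for instance, a negative log-likelihood combined with a self-certainty penalty). The function name `kl_tilted_weights`, the radius `rho`, and the bisection search are assumptions made for illustration; the project's actual scheme may differ. The weights are an exponential tilt of the uniform distribution, with the tilt strength chosen so that the KL divergence from uniform stays within `rho`.

```python
import numpy as np

def kl_tilted_weights(losses, rho, tol=1e-8, max_iter=200):
    """Weights w_i proportional to exp(eta * loss_i), with KL(w || uniform) <= rho."""
    losses = np.asarray(losses, dtype=float)
    n = losses.size
    centered = losses - losses.max()    # shift so exponentials never overflow

    def weights(eta):
        z = np.exp(eta * centered)
        return z / z.sum()

    def kl_to_uniform(w):
        nz = w > 0                      # treat 0 * log(0) as 0
        return float(np.sum(w[nz] * np.log(w[nz] * n)))

    # No tilt is possible or needed: return uniform weights.
    if rho <= 0 or losses.max() == losses.min():
        return np.full(n, 1.0 / n)

    # KL(w_eta || uniform) is nondecreasing in eta >= 0: bracket, then bisect.
    lo, hi = 0.0, 1.0
    while kl_to_uniform(weights(hi)) < rho and hi < 1e6:
        hi *= 2.0
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if kl_to_uniform(weights(mid)) < rho:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return weights(lo)                  # lower end keeps the KL budget satisfied
```

The resulting weights can be used as per-example multipliers on the SFT loss. A larger `rho` concentrates more weight on high-loss examples, while `rho = 0` recovers ordinary uniform training.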

In current experiments, LoRA fine-tuning of Qwen-0.6B is compared against baseline training schemes, and paraphrase batches generated through the OpenAI Batch API are used to quantify how sensitive model outputs are across rounds of semantically equivalent prompts.
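A simple way to summarise paraphrase sensitivity is sketched below, assuming a score matrix with one row per original prompt and one column per paraphrase round. The function name `across_round_variability` and the choice of sample standard deviation and range as summaries are assumptions for illustration, not the project's reported metrics.

```python
import numpy as np

def across_round_variability(scores):
    """Summarise how much a per-prompt score moves across paraphrase rounds.

    scores: array-like of shape (n_prompts, n_rounds), e.g. log-likelihood or
    self-certainty of the model's answer for each paraphrase of each prompt.
    """
    scores = np.asarray(scores, dtype=float)
    per_prompt_std = scores.std(axis=1, ddof=1)            # sample std over rounds
    per_prompt_range = scores.max(axis=1) - scores.min(axis=1)
    return {
        "per_prompt_std": per_prompt_std,
        "mean_std": float(per_prompt_std.mean()),
        "max_range": float(per_prompt_range.max()),
    }
```

A model that is stable under semantically equivalent rewording should show a small `mean_std`; prompts with a large per-prompt spread are natural candidates for closer inspection.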

Materials

  • Paper
  • Slides
  • Training code not public