Overview
- Builds supervised fine-tuning (SFT) pipelines that remain reliable under distribution shift.
- Uses causal LLM log-likelihood and self-certainty scores to derive KL-robust training weights.
- Stress-tests models with semantically equivalent paraphrases to measure across-round variability.
Methods and contribution
Multi-turn chat data are converted into reproducible SFT examples, then scored to estimate
predictive fit and epistemic self-certainty. These quantities are combined through a
KL-constrained reweighting scheme that emphasizes harder or less stable examples in a
principled way.
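As one concrete (and purely illustrative) reading of this scheme: a KL-constrained reweighting over examples has a closed-form solution as exponentially tilted weights, i.e. a softmax of per-example scores with a temperature controlling the KL budget. The sketch below assumes the score is a sum of negative log-likelihood and a self-certainty term; the exact score combination and temperature in the actual method are not specified here.

```python
import numpy as np

def kl_tilted_weights(scores, tau=1.0):
    """Closed-form solution to
         max_w  sum_i w_i * s_i  -  tau * KL(w || uniform),
       which is the softmax w_i ∝ exp(s_i / tau)."""
    z = (np.asarray(scores, dtype=float) - np.max(scores)) / tau  # stabilize
    w = np.exp(z)
    return w / w.sum()

# Hypothetical per-example scores: higher = harder / less stable.
nll = np.array([2.1, 0.4, 3.0, 1.2])          # negative log-likelihood
uncert = np.array([0.9, 0.2, 0.5, 0.7])       # e.g. 1 - self-certainty
weights = kl_tilted_weights(nll + uncert, tau=1.0)
```

Smaller `tau` concentrates weight on the hardest examples; as `tau` grows the weights flatten back toward uniform, recovering standard SFT.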
In current experiments, LoRA fine-tuning of Qwen-0.6B is compared against alternative
baselines, and paraphrase batches generated through OpenAI Batch are used to quantify
sensitivity across semantically equivalent rounds.
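One simple way to quantify this kind of sensitivity (a sketch, not the project's exact metric) is to score each example under several semantically equivalent paraphrase rounds and report the across-round standard deviation of the log-likelihood:

```python
import numpy as np

def paraphrase_sensitivity(loglik_by_round):
    """loglik_by_round: (n_examples, n_rounds) array of per-example
    log-likelihoods, one column per paraphrase round. Returns the
    across-round sample standard deviation per example."""
    ll = np.asarray(loglik_by_round, dtype=float)
    return ll.std(axis=1, ddof=1)

# Hypothetical scores for 2 examples over 3 paraphrase rounds.
ll = [[-1.2, -1.5, -1.1],
      [-0.4, -0.5, -0.45]]
sens = paraphrase_sensitivity(ll)
```

A model that is robust to paraphrasing should show small values here; large values flag examples whose fit depends on surface wording rather than meaning.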
Materials
- Paper
- Slides
- Training code: not public