@RiverRider on Hugging Face

A single forward pass of the frozen Qwen-2.5-7B model plus a lightweight classifier reaches 0.866 ± 0.011 AUC on the full TruthfulQA-MC2 benchmark. No adapters. No fine-tuning. No extra parameters on the backbone.

As far as we know, this is the strongest hidden-state truthfulness detector reported on this benchmark to date.
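For readers who want a feel for the recipe, here is a minimal sketch of a probe trained on frozen features and scored by AUC as mean ± standard deviation across folds. It is an illustration under stated assumptions, not the code from the repository: the post does not say which layer, pooling, classifier, or split protocol is used, so the logistic-regression probe, the 5-fold split, and every function name below are hypothetical. Synthetic Gaussian vectors stand in for the hidden states so the example runs without downloading the model; in practice the features would come from one forward pass of the frozen backbone.

```python
import math
import random
import statistics


def make_synthetic_features(n, dim, seed):
    """Hypothetical stand-in for pooled hidden states of a frozen model.

    Each label shifts every feature slightly, so a linear probe can separate
    the classes only partially, as with real activations.
    """
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        label = rng.randint(0, 1)
        shift = 0.3 if label else -0.3
        X.append([rng.gauss(0.0, 1.0) + shift for _ in range(dim)])
        y.append(label)
    return X, y


def probe_score(w, b, x):
    """Raw logit of the probe for one feature vector."""
    return b + sum(wi * xi for wi, xi in zip(w, x))


def train_logistic_probe(X, y, epochs=50, lr=0.05):
    """Fit a logistic-regression probe with plain SGD.

    Only w and b are learned; the features are treated as fixed inputs.
    """
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for x, t in zip(X, y):
            z = max(-30.0, min(30.0, probe_score(w, b, x)))  # avoid overflow
            grad = 1.0 / (1.0 + math.exp(-z)) - t
            w = [wi - lr * grad * xi for wi, xi in zip(w, x)]
            b -= lr * grad
    return w, b


def roc_auc(scores, labels):
    """AUC as the probability that a positive outranks a negative (ties 0.5)."""
    pos = [s for s, t in zip(scores, labels) if t == 1]
    neg = [s for s, t in zip(scores, labels) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def cross_validated_auc(X, y, folds=5):
    """Mean and sample standard deviation of held-out AUC across folds."""
    aucs = []
    for k in range(folds):
        held_out = set(range(k, len(X), folds))
        train_X = [x for i, x in enumerate(X) if i not in held_out]
        train_y = [t for i, t in enumerate(y) if i not in held_out]
        test_X = [x for i, x in enumerate(X) if i in held_out]
        test_y = [t for i, t in enumerate(y) if i in held_out]
        w, b = train_logistic_probe(train_X, train_y)
        aucs.append(roc_auc([probe_score(w, b, x) for x in test_X], test_y))
    return statistics.mean(aucs), statistics.stdev(aucs)


X, y = make_synthetic_features(n=300, dim=8, seed=0)
mean_auc, std_auc = cross_validated_auc(X, y)
print(f"AUC = {mean_auc:.3f} ± {std_auc:.3f}")
```

The number this prints describes the synthetic data only and says nothing about TruthfulQA-MC2. With the `transformers` library, real features could be obtained by calling the model with `output_hidden_states=True` and pooling one layer; which layer and pooling match the reported result is something to check in the repository.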

The same latent features that the SRT-NLA-AV-v1 demo reads out as coherent natural-language verbalizations turn out to be rich enough to support production-grade auditing for honesty versus hallucination. The internal semiotic infrastructure we have been exploring in public is already information-dense enough to solve hard downstream problems with almost trivial overhead.

You can watch the underlying latent geometry in action right here:
RiverRider/srt-nla-av-v1-demo

Full code, artifacts, and reproduction steps are in the repository:
https://github.com/space-bacon/SRT

Try the Glass Box
RiverRider/srt-nla-demo
