Gnosis — Qwen3-4B-Thinking-2507 (Self-Awareness Correctness Head)

Gnosis is a lightweight self-awareness head that attaches to a frozen LLM and predicts a scalar correctness probability for a generated response. It reads the backbone’s internal signals—hidden-state features (latent dynamics) and attention-map patterns—to learn reliable hallucination / error cues directly from the model.
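To make the idea concrete, here is a minimal sketch of what a correctness head of this kind can look like: it pools hidden-state features and attention statistics into a joint representation and maps it to a scalar probability. The layer choices, pooling, and sizes below are illustrative assumptions, not the paper's exact architecture.

import torch
import torch.nn as nn

class CorrectnessHead(nn.Module):
    """Illustrative sketch only: pools backbone signals into a scalar
    correctness probability. Layer choices and feature sizes are
    assumptions, not the paper's exact architecture."""

    def __init__(self, hidden_size: int, num_heads: int, feat_dim: int = 128):
        super().__init__()
        self.state_proj = nn.Linear(hidden_size, feat_dim)  # hidden-state branch
        self.attn_proj = nn.Linear(num_heads, feat_dim)     # attention-stats branch
        self.scorer = nn.Sequential(nn.GELU(), nn.Linear(feat_dim, 1))

    def forward(self, hidden: torch.Tensor, attn: torch.Tensor) -> torch.Tensor:
        # hidden: (batch, seq, hidden_size); attn: (batch, heads, seq, seq)
        h = self.state_proj(hidden.mean(dim=1))             # pool over tokens
        # Per-head attention entropy, averaged over query positions.
        ent = -(attn * (attn + 1e-9).log()).sum(-1).mean(-1)
        a = self.attn_proj(ent)
        return torch.sigmoid(self.scorer(h + a)).squeeze(-1)  # p(correct)

# Smoke test with random tensors; 2560 / 32 are Qwen3-4B-like guesses.
head = CorrectnessHead(hidden_size=2560, num_heads=32)
hidden = torch.randn(1, 16, 2560)
attn = torch.softmax(torch.randn(1, 32, 16, 16), dim=-1)
print(head(hidden, attn))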

Paper: Can LLMs Predict Their Own Failures? Self-Awareness via Internal Circuits
Project code & instructions: https://github.com/Amirhosein-gh98/Gnosis

Why it matters

  • Strong verifier signal without a large external reward model (no RM routing / no judge LLM calls).
  • ~1000× smaller than 8B reward-model verifiers (**5M params vs ~8B**).
  • ~100× faster than routing through an ~8B reward model.
  • Early error detection: can flag likely errors before generation finishes (see the sketch after this list).
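The last bullet follows from the head scoring internal signals at any prefix of the generation, not just the final answer. Below is a minimal sketch of streaming early-error detection, assuming `model`, `tokenizer`, and `prompt` are set up as in the Usage section; the chunk size and the 0.2 threshold are illustrative assumptions.

import torch
from src.demo import generate_with_hf, correctness_prob

device = torch.device("cuda")
partial, threshold = "", 0.2  # threshold is an illustrative assumption
for _ in range(8):  # up to 8 chunks of 256 new tokens each
    partial += generate_with_hf(model, tokenizer, prompt + partial,
                                device, max_new_tokens=256)
    p = correctness_prob(model, tokenizer, prompt + partial, device)
    if p < threshold:  # trajectory already looks wrong: stop or resample
        print(f"flagging likely error early, p = {p:.3f}")
        break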

Evaluated backbones & benchmarks (from the paper)

  • Backbones: Qwen3 family + OpenAI gpt-oss-20B.
  • Benchmarks: math reasoning (AMC12 2022/2023, AIME 2024/2025, HMMT Feb 2025), open-domain QA (an 18k-question held-out TriviaQA split), and academic knowledge reasoning (MMLU-Pro).
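For a scalar verifier like this, separation between correct and incorrect responses is commonly summarized with AUROC over held-out (score, label) pairs. A minimal sketch of that computation follows; the metric choice here is an assumption, not necessarily the paper's exact protocol.

from sklearn.metrics import roc_auc_score

scores = [0.91, 0.12, 0.78, 0.33]  # Gnosis p_correct per response (toy values)
labels = [1, 0, 1, 0]              # ground-truth correctness of each response
print("AUROC:", roc_auc_score(labels, scores))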

Training data

Gnosis is trained on a mixed math + trivia corpus; dataset links and preparation details are in the GitHub repo.
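A correctness head needs (prompt, response, label) supervision: responses are sampled from the backbone and labeled against reference answers. The record format and binary cross-entropy objective below are illustrative assumptions, not the paper's exact training recipe.

import torch
import torch.nn.functional as F

# One hypothetical supervision record: field names are assumptions.
example = {
    "prompt": "What is 7 * 8?",
    "response": "7 * 8 = 56, so the answer is \\boxed{56}.",
    "correct": 1.0,  # 1.0 if the response matches the reference answer
}

p_correct = torch.tensor(0.85)  # the head's prediction for this example
loss = F.binary_cross_entropy(p_correct, torch.tensor(example["correct"]))
print(f"BCE loss: {loss.item():.4f}")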

Usage (inference)

This model requires the local Transformers fork with Gnosis integrated (see the GitHub repo for installation instructions). After installing it, run:

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
from src.demo import build_chat_prompt, generate_with_hf, correctness_prob

GNOSIS_MODEL_ID = "AmirhoseinGH/Gnosis-Qwen3-4B-Thinking-2507"

tokenizer = AutoTokenizer.from_pretrained(GNOSIS_MODEL_ID, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    GNOSIS_MODEL_ID, torch_dtype=torch.bfloat16, trust_remote_code=True
).cuda().eval()

prompt = build_chat_prompt(
    tokenizer,
    question="How many r's are in strawberry?",
    system_prompt="Please reason step by step, and put your final answer within \\boxed{}.",
)

answer = generate_with_hf(model, tokenizer, prompt, torch.device("cuda"), max_new_tokens=2048)
p_correct = correctness_prob(model, tokenizer, prompt + answer, torch.device("cuda"))

print("Answer:
", answer)
print("Gnosis correctness probability:", f"{p_correct:.4f}")

Citation

@misc{ghasemabadi2025llmspredictfailures,
      title={Can LLMs Predict Their Own Failures? Self-Awareness via Internal Circuits},
      author={Amirhosein Ghasemabadi and Di Niu},
      year={2025},
      eprint={2512.20578},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2512.20578},
}