DecomposeRL Tiny-Judge: Coverage Judge

A ModernBERT-large classifier that predicts the claim verdict from the collected answers alone, without the original document. Its prediction serves as the coverage reward, which tests whether a decomposition is collectively sufficient.

It is part of the DecomposeRL tiny-judge stack — eight task-specific LoRA classifier heads on a shared ModernBERT-large backbone that distill a Qwen3-32B LLM judge into small, fast reward models. Swapping the 32B judge for this ~400M-parameter stack cuts GRPO judge compute by ~80% (240 → 48 GPU-hours) while retaining ~99% of in-domain accuracy.

Model Overview

| Property | Value |
| --- | --- |
| Model Type | ModernBertForSequenceClassification (sequence classification) |
| Base Model | answerdotai/ModernBERT-large (~400M params) |
| Training | LoRA (r=64, α=128), merged into the base before release |
| Labels | 3-way: supported / refuted / not_enough_information |
| Distilled from | Qwen/Qwen3-32B judge labels |
| Dataset / config | dipta007/decomposeRL-tiny-judge · coverage |
| Train split | train_balanced (class-balanced); selected on macro-F1 |
| Language | English |

What it judges

This head provides the set-level coverage reward (R_cov): if the gold verdict cannot be recovered from the answers alone, the decomposition has missed something. The same head is also reused to compute the necessity (leave-one-out) reward: it is run on the full answer set and on each leave-one-out subset to detect which questions actually change the verdict.
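The two uses described above can be sketched in a few lines. This is an illustrative sketch, not the DecomposeRL implementation: the binary 0/1 coverage score, the function names, and the `predict(claim, answers)` callable (a hypothetical wrapper around this classifier that returns a label id) are all assumptions made for the example.

```python
from typing import Callable, List, Sequence

# Hypothetical predictor signature: (claim, answers) -> label id,
# where 0 = supported, 1 = refuted, 2 = not_enough_information.
Predict = Callable[[str, Sequence[str]], int]


def coverage_reward(predict: Predict, claim: str,
                    answers: Sequence[str], gold: int) -> float:
    """Assumed binary reward: 1.0 if the gold verdict is recovered
    from the answers alone, else 0.0."""
    return 1.0 if predict(claim, answers) == gold else 0.0


def verdict_changing_indices(predict: Predict, claim: str,
                             answers: Sequence[str]) -> List[int]:
    """Leave-one-out check: return the indices of answers whose
    removal changes the verdict predicted on the full answer set."""
    full_verdict = predict(claim, answers)
    changed = []
    for i in range(len(answers)):
        # Rebuild the answer set without the i-th answer.
        subset = [a for j, a in enumerate(answers) if j != i]
        if predict(claim, subset) != full_verdict:
            changed.append(i)
    return changed
```

Note that the leave-one-out pass costs one classifier call per answer plus one for the full set, which is part of why a small judge is attractive here.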

Input format

The input is the claim followed by the collected answers from the full decomposition:

Claim: {claim}
Answers:
{answers}
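A small helper can render this template from a claim and a list of answers. The sketch below assumes one answer per line with a "- " prefix, mirroring the Quickstart example; the card does not specify the separator, and `build_coverage_input` is a hypothetical name.

```python
from typing import Sequence


def build_coverage_input(claim: str, answers: Sequence[str]) -> str:
    """Render a claim and its collected answers in the template above."""
    # Assumption: each answer goes on its own line with a "- " bullet.
    answer_block = "\n".join(f"- {a.strip()}" for a in answers)
    return f"Claim: {claim.strip()}\nAnswers:\n{answer_block}"


if __name__ == "__main__":
    print(build_coverage_input(
        "Example claim text.",
        ["First collected answer.", "Second collected answer."],
    ))
```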

Label space

| Label | Name | Meaning |
| --- | --- | --- |
| 0 | supported | the answers alone support the claim |
| 1 | refuted | the answers alone refute the claim |
| 2 | not_enough_information | the answers are insufficient to decide |
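When a confidence score is wanted alongside the label, the three logits can be passed through a softmax. The following is a dependency-free sketch: the `ID2LABEL` dictionary simply mirrors the label table above, and `decode_logits` is a hypothetical helper, not part of the released model.

```python
import math
from typing import Sequence, Tuple

# Mirrors the label table; in practice, prefer model.config.id2label.
ID2LABEL = {0: "supported", 1: "refuted", 2: "not_enough_information"}


def decode_logits(logits: Sequence[float]) -> Tuple[str, float]:
    """Return (label name, softmax probability) for the top class."""
    # Subtract the max logit before exponentiating for numerical stability.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return ID2LABEL[best], probs[best]
```

For example, `decode_logits(logits[0].tolist())` would apply this to the first row of the logits tensor produced by the model.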

Quickstart

import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

repo = "dipta007/coverage-judge-balanced"
tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForSequenceClassification.from_pretrained(repo).eval()

text = (
    'Claim: Propofol is associated with impaired brain metabolism during hypothermic circulatory arrest: an experimental microdialysis study.\n'
    'Answers:\n'
    '- Yes, the evidence document states twenty female juvenile pigs underwent 75 minutes of HCA at a brain temperature of 18 degrees C...'
)

inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=8192)
with torch.no_grad():
    logits = model(**inputs).logits
pred = int(logits.argmax(-1))
print(pred, model.config.id2label[pred])
# expected: 2 -> not_enough_information

Training Data

Trained on the coverage config of dipta007/decomposeRL-tiny-judge, whose labels are distilled from Qwen3-32B judge calls made during DecomposeRL reward computation. The model is fine-tuned with LoRA on the class-balanced train_balanced split, validated on the natural validation split, and the best checkpoint is chosen by macro-F1. LoRA adapters are merged into the backbone before release, so the model loads with a plain from_pretrained (no PEFT required).
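Macro-F1 is a natural selection metric when the validation split keeps its natural, possibly imbalanced, class distribution, because every class counts equally regardless of frequency. The sketch below is a plain-Python illustration of the metric, not the training code; it assumes that a class with no true or predicted examples scores 0 and is still averaged in.

```python
from typing import Sequence


def macro_f1(y_true: Sequence[int], y_pred: Sequence[int],
             num_classes: int = 3) -> float:
    """Unweighted mean of per-class F1 over label ids 0..num_classes-1."""
    f1_scores = []
    for c in range(num_classes):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        # Assumption: a class with no support and no predictions scores 0.
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / num_classes


if __name__ == "__main__":
    # A degenerate model that always predicts "supported" reaches 4/6
    # accuracy here, but macro-F1 exposes the two classes it never predicts.
    y_true = [0, 0, 0, 0, 1, 2]
    y_pred = [0, 0, 0, 0, 0, 0]
    print(round(macro_f1(y_true, y_pred), 4))
```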

Role in DecomposeRL

DecomposeRL trains a claim-verification policy with GRPO over a seven-reward ensemble. Five of those rewards are scored by an LLM judge, which dominates training-time GPU cost. The tiny-judge stack replaces that 32B judge with eight small distilled heads so reward scoring runs on the same single GPU as training. See the paper (tiny-judge ablation) and the DecomposeRL-7B model for the full reward design.

Intended Use

  • In-scope: serving as a fast reward / scoring model inside the DecomposeRL training loop, or as a standalone classifier for the specific judgment above on claim-decomposition traces.
  • Out-of-scope: general-purpose fact-checking, use on inputs that do not follow the input format above, or as a standalone end-to-end claim verifier (use DecomposeRL-7B for that).

Citation

@article{dipta2025decomposerl,
  title={DecomposeRL: Learning to Ask Useful, Informative, and Diverse Questions for Semi-Supervised, Traceable Claim Verification},
  author={Shubhashis Roy Dipta and Ankur Padia and Francis Ferraro},
  year={2025},
  eprint={2605.27858},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2605.27858v1},
}

License

Released under the Apache 2.0 License.

Weights: Safetensors format, 0.4B parameters, BF16 tensors.