CodeRM-NT

Paper | GitHub

Providing accurate reward signals for code generated by LLMs is a significant challenge in applying reinforcement learning (RL) to code generation. Existing methods rely on unit tests, which are expensive to curate and unreliable when automatically synthesized.
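For contrast, a unit-test reward is usually computed by executing the generated code against test cases. The sketch below assumes a pass-fraction reward over assert-style tests. The source does not say whether the unit-test baseline uses a pass fraction or an all-pass binary reward, and `unit_test_reward` is a hypothetical helper. The last call shows the reliability problem described above: a single wrong synthesized test halves the reward of a correct solution.

```python
def unit_test_reward(code: str, tests: list[str]) -> float:
    """Fraction of assert-style tests that pass (hypothetical reward definition).

    WARNING: exec() runs untrusted code in-process. Real RL pipelines execute
    generated code in a sandbox with timeouts; this only illustrates the signal.
    """
    namespace: dict = {}
    try:
        exec(code, namespace)  # define the candidate's functions
    except Exception:
        return 0.0  # code that fails to load gets no reward
    if not tests:
        return 0.0
    passed = 0
    for test in tests:
        try:
            exec(test, namespace)
            passed += 1
        except Exception:
            pass  # failed assertion or runtime error
    return passed / len(tests)


correct = "def add(a, b):\n    return a + b"
buggy = "def add(a, b):\n    return a - b"
good_tests = ["assert add(1, 2) == 3", "assert add(2, 2) == 4"]
# A synthesized test with a wrong expected value penalizes the correct solution.
noisy_tests = ["assert add(1, 2) == 3", "assert add(2, 2) == 5"]

print(unit_test_reward(correct, good_tests))   # 1.0
print(unit_test_reward(buggy, good_tests))     # 0.0
print(unit_test_reward(correct, noisy_tests))  # 0.5
```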

CodeRM-NT is a code reward model that does not rely on unit tests. Instead of executing test cases, it learns to estimate the functional correctness of generated Python code from rewards collected via Monte Carlo Tree Search (MCTS) guided by an LLM-as-a-Judge.
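The search procedure is only summarized here. As a rough intuition for how MCTS turns judge scores into value estimates for partial solutions, the sketch below implements generic UCT: UCB1 selection, random rollouts, and mean-reward backpropagation over a toy line-by-line search space. It is an illustration under stated assumptions, not CodeRM-NT's actual procedure. `propose` and `judge` are hypothetical stand-ins for an LLM that proposes next lines and an LLM-as-a-Judge that scores complete programs, and the search space, exploration constant `c`, and iteration count are arbitrary.

```python
import math
import random
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

# A (partial) solution is a tuple of code lines; each action appends one line.
State = Tuple[str, ...]


@dataclass
class Node:
    state: State
    parent: Optional["Node"] = None
    children: Dict[str, "Node"] = field(default_factory=dict)
    visits: int = 0
    value_sum: float = 0.0

    @property
    def value(self) -> float:
        # Mean judge reward of all rollouts that passed through this node.
        return self.value_sum / self.visits if self.visits else 0.0


def ucb1(child: Node, parent_visits: int, c: float) -> float:
    # UCB1: exploit a high mean reward, explore rarely visited children.
    if child.visits == 0:
        return float("inf")
    return child.value + c * math.sqrt(math.log(parent_visits) / child.visits)


def mcts(
    root_state: State,
    propose: Callable[[State], List[str]],  # hypothetical: candidate next lines (an LLM in practice)
    judge: Callable[[State], float],        # hypothetical: judge score in [0, 1] for a full program
    is_terminal: Callable[[State], bool],
    n_iter: int = 200,
    c: float = 1.4,
    seed: int = 0,
) -> Node:
    rng = random.Random(seed)
    root = Node(root_state)
    for _ in range(n_iter):
        node = root
        # 1. Selection: follow UCB1 while the current node is fully expanded.
        while not is_terminal(node.state) and len(node.children) == len(propose(node.state)):
            parent = node
            node = max(parent.children.values(), key=lambda ch: ucb1(ch, parent.visits, c))
        # 2. Expansion: add one untried action as a new child.
        if not is_terminal(node.state):
            untried = [a for a in propose(node.state) if a not in node.children]
            action = rng.choice(untried)
            node.children[action] = Node(node.state + (action,), parent=node)
            node = node.children[action]
        # 3. Simulation: random rollout to a complete program, then score it.
        state = node.state
        while not is_terminal(state):
            state = state + (rng.choice(propose(state)),)
        reward = judge(state)
        # 4. Backpropagation: update visit counts and reward sums up to the root.
        while node is not None:
            node.visits += 1
            node.value_sum += reward
            node = node.parent
    return root


# Toy run: build a two-line `add` function. The keyword-matching judge is a stand-in for an LLM judge.
OPTIONS = {
    0: ["def add(a, b):", "def add(a):"],
    1: ["    return a + b", "    return a - b"],
}
root = mcts(
    root_state=(),
    propose=lambda s: OPTIONS[len(s)],
    judge=lambda s: 0.5 * ("(a, b)" in s[0]) + 0.5 * ("a + b" in s[1]),
    is_terminal=lambda s: len(s) == 2,
)
for line, child in root.children.items():
    print(f"{line!r}: visits={child.visits}, value={child.value:.2f}")
```

Because each node's `value` averages the judge scores of rollouts through it, prefixes that tend to lead to well-judged programs accumulate higher values. In this toy run, `def add(a, b):` ends up with more visits and a higher value than `def add(a):`.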

Usage

import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

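# Load in bfloat16 to match the stored BF16 weights (otherwise transformers may load them in float32).
# device_map="auto" requires the `accelerate` package.
model = AutoModelForSequenceClassification.from_pretrained(
    "Rishubi/CodeRM-NT",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)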
tokenizer = AutoTokenizer.from_pretrained("Rishubi/CodeRM-NT")

question = "Write a Python function `add(a, b)` that returns the sum of two integers."
response = "def add(a, b):\n    return a + b"

messages = [
    {"role": "user", "content": question},
    {"role": "assistant", "content": response},
]
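# Format the (question, response) pair with the model's chat template.
input_ids = tokenizer.apply_chat_template(messages, return_tensors="pt").to(model.device)
with torch.no_grad():
    # The sequence-classification head outputs one scalar reward for the pair.
    reward = model(input_ids).logits.squeeze().float().item()
print(reward)  # higher is better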
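Because the model returns one scalar per (question, response) pair, one natural use is best-of-n reranking of sampled candidates. The sketch below is a minimal version under that assumption. `rerank` and `make_score_fn` are hypothetical helpers: `make_score_fn` simply wraps the Usage snippet above into a scoring function, and the toy scorer in the example stands in for it so the code runs without downloading the model.

```python
from typing import Callable, List, Tuple


def rerank(
    question: str,
    candidates: List[str],
    score_fn: Callable[[str, str], float],
) -> List[Tuple[float, str]]:
    """Score each candidate and return (reward, response) pairs, highest reward first."""
    scored = [(score_fn(question, c), c) for c in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # stable: ties keep input order
    return scored


def make_score_fn(model, tokenizer) -> Callable[[str, str], float]:
    """Wrap the Usage snippet into a (question, response) -> reward function."""
    import torch  # local import so rerank() itself has no torch dependency

    def score_fn(question: str, response: str) -> float:
        messages = [
            {"role": "user", "content": question},
            {"role": "assistant", "content": response},
        ]
        input_ids = tokenizer.apply_chat_template(messages, return_tensors="pt").to(model.device)
        with torch.no_grad():
            return model(input_ids).logits.squeeze().float().item()

    return score_fn


def toy_score(question: str, response: str) -> float:
    # Stand-in for make_score_fn(model, tokenizer) so the example runs offline.
    return 1.0 if "a + b" in response else 0.0


ranked = rerank(
    "Write a Python function `add(a, b)` that returns the sum of two integers.",
    ["def add(a, b):\n    return a - b", "def add(a, b):\n    return a + b"],
    toy_score,
)
print(ranked[0][1])
```

With the real model loaded as in the Usage snippet, pass `make_score_fn(model, tokenizer)` instead of `toy_score`. Scoring one sequence at a time avoids depending on how padded batches are handled by the classification head.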

Results

Key Results

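Training with CodeRM-NT outperforms training with synthetic unit tests and other reward models on average across multiple code generation benchmarks. The table below compares CodeRM-NT with synthetic unit tests as the reward. CodeRM-NT achieves a higher average score for all five policy models, although unit tests remain ahead on some individual benchmarks (for example, BCB-I-Hard for Qwen2.5-Coder-3B and Qwen3-4B-Thinking):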

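| Model | Reward | HumanEval | HumanEval+ | MBPP | MBPP+ | LCB-v5 | BCB-I-Hard | Avg. |
|---|---|---|---|---|---|---|---|---|
| Qwen2.5-Coder-1.5B | Unit Tests | 73.2 | 67.7 | 70.9 | 61.1 | 5.1 | 6.1 | 47.4 |
| Qwen2.5-Coder-1.5B | CodeRM-NT | 75.0 | 69.5 | 72.0 | 60.8 | 5.5 | 7.4 | 48.4 |
| Qwen2.5-Coder-3B | Unit Tests | 86.6 | 82.3 | 74.9 | 64.6 | 13.0 | 15.5 | 56.2 |
| Qwen2.5-Coder-3B | CodeRM-NT | 88.4 | 82.3 | 75.9 | 66.1 | 13.6 | 14.2 | 56.8 |
| Qwen2.5-Coder-7B | Unit Tests | 90.9 | 87.8 | 85.4 | 73.0 | 17.3 | 18.2 | 62.1 |
| Qwen2.5-Coder-7B | CodeRM-NT | 90.2 | 86.0 | 86.8 | 74.6 | 17.5 | 18.2 | 62.2 |
| GLM-4-9B-0414 | Unit Tests | 84.1 | 79.9 | 81.0 | 69.0 | 15.4 | 15.5 | 57.5 |
| GLM-4-9B-0414 | CodeRM-NT | 87.2 | 81.7 | 79.9 | 67.2 | 15.3 | 18.2 | 58.3 |
| Qwen3-4B-Thinking | Unit Tests | 97.6 | 92.7 | 91.0 | 75.1 | 50.3 | 25.7 | 72.1 |
| Qwen3-4B-Thinking | CodeRM-NT | 97.6 | 94.5 | 92.6 | 77.2 | 52.1 | 22.3 | 72.7 |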
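As a quick check on the table, the snippet below recomputes each row's mean over the six benchmarks and the gap between the two rewards. It assumes Avg. is the unweighted mean of the six benchmark columns, which matches every reported value to within rounding. The resulting gains in average score range from about 0.1 points (Qwen2.5-Coder-7B) to about 1.0 point (Qwen2.5-Coder-1.5B).

```python
# Scores copied from the table: HumanEval, HumanEval+, MBPP, MBPP+, LCB-v5, BCB-I-Hard, plus the reported Avg.
TABLE = {
    "Qwen2.5-Coder-1.5B": {
        "Unit Tests": ([73.2, 67.7, 70.9, 61.1, 5.1, 6.1], 47.4),
        "CodeRM-NT": ([75.0, 69.5, 72.0, 60.8, 5.5, 7.4], 48.4),
    },
    "Qwen2.5-Coder-3B": {
        "Unit Tests": ([86.6, 82.3, 74.9, 64.6, 13.0, 15.5], 56.2),
        "CodeRM-NT": ([88.4, 82.3, 75.9, 66.1, 13.6, 14.2], 56.8),
    },
    "Qwen2.5-Coder-7B": {
        "Unit Tests": ([90.9, 87.8, 85.4, 73.0, 17.3, 18.2], 62.1),
        "CodeRM-NT": ([90.2, 86.0, 86.8, 74.6, 17.5, 18.2], 62.2),
    },
    "GLM-4-9B-0414": {
        "Unit Tests": ([84.1, 79.9, 81.0, 69.0, 15.4, 15.5], 57.5),
        "CodeRM-NT": ([87.2, 81.7, 79.9, 67.2, 15.3, 18.2], 58.3),
    },
    "Qwen3-4B-Thinking": {
        "Unit Tests": ([97.6, 92.7, 91.0, 75.1, 50.3, 25.7], 72.1),
        "CodeRM-NT": ([97.6, 94.5, 92.6, 77.2, 52.1, 22.3], 72.7),
    },
}

summary = {}
for model, rewards in TABLE.items():
    means = {}
    for reward, (scores, reported_avg) in rewards.items():
        means[reward] = sum(scores) / len(scores)
        # The reported Avg. agrees with the unweighted mean up to one-decimal rounding.
        assert abs(means[reward] - reported_avg) <= 0.05 + 1e-6, (model, reward)
    summary[model] = means
    delta = means["CodeRM-NT"] - means["Unit Tests"]
    print(f"{model}: unit tests {means['Unit Tests']:.2f}, "
          f"CodeRM-NT {means['CodeRM-NT']:.2f}, delta {delta:+.2f}")
```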

Citation

TODO

Model size: 7B params
Tensor type: BF16 (Safetensors)

Base model: Qwen/Qwen2.5-7B (this model is fine-tuned from it)