Model Card for ProRAG-PRM

This is the Process Reward Model (PRM) associated with the ProRAG project. It is fine-tuned from Qwen/Qwen3-8B to evaluate the quality of intermediate reasoning steps.

The model follows the methodology described in the paper "ProRAG: Process-Supervised Reinforcement Learning for Retrieval-Augmented Generation" (arXiv:2601.21912).

Model Details

  • Base Model: Qwen/Qwen3-8B
  • Type: Process Reward Model (PRM) / Sequence Classification
  • Task: Step-by-step Reasoning Evaluation
  • Paper: https://arxiv.org/abs/2601.21912

💻 Code & Inference

This model assigns a reward score to each intermediate reasoning step, rather than only to the final answer.

For the specific scoring logic, data formatting (e.g., how to mark steps), and inference scripts, please refer to our GitHub repository:

👉 Click here to view the GitHub Repository

(Use the scoring script provided in the repository. Standard Hugging Face pipelines may not interpret the process rewards correctly unless the input follows the expected step formatting.)
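
To make the idea of step-level scoring concrete, here is a minimal Python sketch of the general pattern: split a reasoning trace into steps, then score each step in the context of the steps before it. It is not the ProRAG scoring logic. The step delimiter, the function names, and the `toy_score_fn` placeholder are all assumptions made for illustration; the actual step markers and the real scoring call are defined in the GitHub repository.

```python
from typing import Callable, List

# ASSUMPTION: a blank line separates steps. The real step marker used by
# ProRAG is defined in the project's repository, not here.
STEP_DELIMITER = "\n\n"

# A scorer takes the question and the steps so far, and returns one score
# for the most recent step.
ScoreFn = Callable[[str, List[str]], float]


def split_steps(trace: str, delimiter: str = STEP_DELIMITER) -> List[str]:
    """Split a reasoning trace into non-empty, whitespace-stripped steps."""
    return [part.strip() for part in trace.split(delimiter) if part.strip()]


def score_steps(
    question: str,
    trace: str,
    score_fn: ScoreFn,
    delimiter: str = STEP_DELIMITER,
) -> List[float]:
    """Return one score per step; step i is scored given steps 0..i."""
    steps = split_steps(trace, delimiter)
    return [score_fn(question, steps[: i + 1]) for i in range(len(steps))]


def toy_score_fn(question: str, steps_so_far: List[str]) -> float:
    """Placeholder scorer, NOT the ProRAG PRM.

    Returns the fraction of question words that appear in the latest step.
    In practice this would be replaced by a call into the repository's
    scoring script.
    """
    question_words = set(question.lower().split())
    step_words = set(steps_so_far[-1].lower().split())
    return len(question_words & step_words) / max(len(question_words), 1)


if __name__ == "__main__":
    example_trace = (
        "Search for the capital of France.\n\n"
        "The retrieved document says the capital is Paris.\n\n"
        "Answer: Paris"
    )
    for step, score in zip(
        split_steps(example_trace),
        score_steps("capital of France", example_trace, toy_score_fn),
    ):
        print(f"{score:.2f}  {step}")
```

Passing the scorer in as a function keeps the splitting and bookkeeping separate from the model call, so the placeholder can be swapped for the repository's scoring routine without changing the rest of the code.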

Citation

If you use this model or the associated paper in your research, please cite:

@misc{wang2026proragprocesssupervisedreinforcementlearning,
      title={ProRAG: Process-Supervised Reinforcement Learning for Retrieval-Augmented Generation}, 
      author={Zhao Wang and Ziliang Zhao and Zhicheng Dou},
      year={2026},
      eprint={2601.21912},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2601.21912}, 
}
