VRPRM-MiMo-7B

VRPRM-MiMo-7B is a visual process reward model from VRPRM: Process Reward Modeling via Visual Reasoning.

VRPRM is designed to evaluate intermediate reasoning steps for multimodal problems. The model is intended for visual process reward modeling, reasoning-step scoring, and Best-of-N selection for vision-language model outputs.

Model Details

  • Model family: VRPRM
  • Release variant: MiMo-7B
  • Serialized architecture: Qwen2_5_VLForConditionalGeneration
  • Model type: qwen2_5_vl
  • Weights format: sharded safetensors
  • Recommended library: transformers

Training Summary

The VRPRM paper trains the model with a two-stage recipe:

  1. Supervised fine-tuning cold start on high-quality CoT-PRM data. Open-sourced on VRPRM3.6K.
  2. Reinforcement learning scaling on lower-cost non-CoT PRM data.

Intended Use

This model is intended for research on:

  • Visual process reward modeling
  • Multimodal reasoning evaluation
  • Step-level scoring of visual question answering rationales
  • Best-of-N selection for vision-language model responses

This model is not intended to be used as a standalone assistant.

Usage

Load the model with Hugging Face Transformers from the repository root:

from transformers import AutoModelForVision2Seq, AutoProcessor

model_id = "YOUR_USERNAME/VRPRM-MiMo-7B"

processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForVision2Seq.from_pretrained(
    model_id,
    torch_dtype="auto",
    device_map="auto",
    trust_remote_code=True,
)

For the complete inference and evaluation pipeline, use the VRPRM project code.

Citation

@misc{chen2026vrprmprocessrewardmodeling,
      title={VRPRM: Process Reward Modeling via Visual Reasoning}, 
      author={Xinquan Chen and Chongying Yue and Bangwei Liu and Xuhong Wang and Yingchun Wang and Chaochao Lu},
      year={2026},
      eprint={2508.03556},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2508.03556}, 
}
Downloads last month
29
Safetensors
Model size
8B params
Tensor type
BF16
·
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for two-tiger/MiMo-VRPRM-7B

Quantizations
1 model

Collection including two-tiger/MiMo-VRPRM-7B

Paper for two-tiger/MiMo-VRPRM-7B