MediX-R1: Open Ended Medical Reinforcement Learning
Abstract
MediX-R1 presents an open-ended reinforcement learning framework for medical multimodal large language models that uses diverse reward signals and LLM-based evaluation to improve clinical reasoning beyond multiple-choice formats.
We introduce MediX-R1, an open-ended Reinforcement Learning (RL) framework for medical multimodal large language models (MLLMs) that enables clinically grounded, free-form answers beyond multiple-choice formats. MediX-R1 fine-tunes a baseline vision-language backbone with Group-Based RL and a composite reward tailored for medical reasoning: an LLM-based accuracy reward that judges semantic correctness with a strict YES/NO decision, a medical embedding-based semantic reward to capture paraphrases and terminology variants, and lightweight format and modality rewards that enforce interpretable reasoning and modality recognition. This multi-signal design provides stable, informative feedback for open-ended outputs where traditional verifiable or MCQ-only rewards fall short. To measure progress, we propose a unified evaluation framework for both text-only and image+text tasks that uses a reference-based LLM-as-judge in place of brittle string-overlap metrics, capturing semantic correctness, reasoning, and contextual alignment. Despite using only ~51K instruction examples, MediX-R1 achieves strong results across standard medical LLM (text-only) and VLM (image+text) benchmarks, outperforming strong open-source baselines and delivering particularly large gains on open-ended clinical tasks. Our results demonstrate that open-ended RL with comprehensive reward signals and LLM-based evaluation is a practical path toward reliable medical reasoning in multimodal models. Our trained models, curated datasets, and source code are available at https://medix.cvmbzuai.com
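The composite reward described in the abstract can be sketched as a weighted sum of the four signals. This is a hypothetical illustration, not the authors' implementation: the weights, function names, and the judge/embedding interfaces are all assumptions.

```python
import re

def format_reward(output: str) -> float:
    """Reward outputs that wrap reasoning and answer in explicit tags
    (the tag scheme here is assumed for illustration)."""
    ok = bool(re.search(r"<think>.*</think>\s*<answer>.*</answer>", output, re.S))
    return 1.0 if ok else 0.0

def accuracy_reward(judge_verdict: str) -> float:
    """Strict YES/NO decision from an LLM judge comparing the model's
    free-form answer against the reference answer."""
    return 1.0 if judge_verdict.strip().upper() == "YES" else 0.0

def semantic_reward(similarity: float) -> float:
    """Similarity between medical-domain embeddings of prediction and
    reference (e.g. cosine similarity), clipped to [0, 1]."""
    return max(0.0, min(1.0, similarity))

def composite_reward(output: str, judge_verdict: str, similarity: float,
                     modality_correct: bool,
                     w_acc: float = 1.0, w_sem: float = 0.5,
                     w_fmt: float = 0.25, w_mod: float = 0.25) -> float:
    """Combine the four reward signals; the weights are illustrative."""
    return (w_acc * accuracy_reward(judge_verdict)
            + w_sem * semantic_reward(similarity)
            + w_fmt * format_reward(output)
            + w_mod * (1.0 if modality_correct else 0.0))
```

In a group-based RL setup, a reward like this would be computed per sampled completion and the group of scores normalized into advantages for the policy update.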
Community
MediX-R1 is an open-ended Reinforcement Learning (RL) framework for medical multimodal large language models (MLLMs) that enables clinically grounded, free-form answers beyond multiple-choice formats. MediX-R1 fine-tunes vision-language backbones with Group-Based RL and a composite reward tailored for medical reasoning: an LLM-based accuracy reward, a medical embedding-based semantic reward, and lightweight format and modality rewards that enforce interpretable reasoning.
Despite using only ~51K instruction examples, MediX-R1 achieves excellent results across standard medical LLM and VLM benchmarks, outperforming strong open-source baselines.
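One way to realize the reference-based LLM-as-judge evaluation mentioned above is a prompt that asks for a strict YES/NO verdict. The template wording and the `query_llm` helper below are assumptions for illustration, not the paper's actual prompt.

```python
# Illustrative reference-based judge; query_llm is any callable that
# sends a prompt to an LLM and returns its text response.
JUDGE_TEMPLATE = """You are a medical expert evaluator.
Question: {question}
Reference answer: {reference}
Model answer: {prediction}
Does the model answer convey the same clinical meaning as the
reference answer? Reply with exactly YES or NO."""

def judge_correct(question: str, reference: str, prediction: str,
                  query_llm) -> bool:
    """Return True iff the judge LLM deems the prediction semantically
    equivalent to the reference answer."""
    prompt = JUDGE_TEMPLATE.format(question=question,
                                   reference=reference,
                                   prediction=prediction)
    verdict = query_llm(prompt).strip().upper()
    return verdict.startswith("YES")
```

Compared with string-overlap metrics, this accepts paraphrases and terminology variants ("MI" vs. "myocardial infarction") that exact-match scoring would penalize.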
Highlights:
- Our 8B model achieves an overall average of 68.8%, outperforming the much larger 27B MedGemma (68.4%).
- Our 30B model achieves the best overall score of 73.6%, demonstrating the effectiveness of our composite reward design.
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- CURE-Med: Curriculum-Informed Reinforcement Learning for Multilingual Medical Reasoning (2026)
- P2S: Probabilistic Process Supervision for General-Domain Reasoning Question Answering (2026)
- KEPO: Knowledge-Enhanced Preference Optimization for Reinforcement Learning with Reasoning (2026)
- Aligning Findings with Diagnosis: A Self-Consistent Reinforcement Learning Framework for Trustworthy Radiology Reporting (2026)
- From Verifiable Dot to Reward Chain: Harnessing Verifiable Reference-based Rewards for Reinforcement Learning of Open-ended Generation (2026)
- Grad2Reward: From Sparse Judgment to Dense Rewards for Improving Open-Ended LLM Reasoning (2026)
- Scaling Medical Reasoning Verification via Tool-Integrated Reinforcement Learning (2026)
Models citing this paper: 6
Datasets citing this paper: 1
Spaces citing this paper: 0