MinCoder
This model is fine-tuned from Qwen3-4B-Instruct using a custom reinforcement learning (RL) framework that rewards the model for producing solutions that pass automated test cases, similar to how submissions are judged on LeetCode.
Instead of relying on labeled ground-truth answers, the model learns from test-case-based rewards, an approach intended to promote generalization and reasoning ability in algorithmic problem-solving.
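To make the idea of a test-case-based reward concrete, here is a minimal sketch in Python. It is not the framework used to train this model, whose details are not published here. The function name `verify_reward`, the `solve` entry point, and the all-or-nothing 0/1 reward are assumptions made for illustration only. A real setup would also run candidate code in a sandbox with time and memory limits rather than calling `exec` directly.

```python
from typing import Any


def verify_reward(
    candidate_source: str,
    test_cases: list[tuple[tuple, Any]],
    entry_point: str = "solve",  # hypothetical function name the solution must define
) -> float:
    """Return 1.0 if the candidate passes every test case, else 0.0.

    Illustrative only: exec() on untrusted model output is unsafe
    outside a sandbox.
    """
    if not test_cases:
        return 0.0  # nothing to verify, so give no reward

    namespace: dict[str, Any] = {}
    try:
        # Define the candidate's functions in an isolated namespace.
        exec(candidate_source, namespace)
    except Exception:
        return 0.0  # code that fails to compile or load earns no reward

    func = namespace.get(entry_point)
    if not callable(func):
        return 0.0

    for args, expected in test_cases:
        try:
            if func(*args) != expected:
                return 0.0  # wrong answer on this test case
        except Exception:
            return 0.0  # runtime error on this test case
    return 1.0
```

Note that the reward depends only on whether the tests pass, not on a reference solution: any program that produces the expected outputs is rewarded equally.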
This is an experimental model.
Install from pip and serve the model
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "beyoru/MinCoder-4B-Exp"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "beyoru/MinCoder-4B-Exp",
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ]
  }'
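The same request can be sent from Python. The sketch below uses only the standard library and assumes the vLLM server is running locally on port 8000, as in the curl call. The helper names `build_chat_request` and `chat` are hypothetical, chosen for this example.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000/v1"  # assumes the local vLLM server from the steps above
MODEL = "beyoru/MinCoder-4B-Exp"


def build_chat_request(prompt: str, model: str = MODEL,
                       base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a POST request for the OpenAI-compatible chat completions endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt: str) -> str:
    """Send the prompt to the server and return the assistant's reply text."""
    with urllib.request.urlopen(build_chat_request(prompt)) as response:
        body = json.load(response)
    # OpenAI-compatible responses put the reply under choices[0].message.content.
    return body["choices"][0]["message"]["content"]
```

With the server running, `print(chat("Write a Python function that reverses a linked list."))` prints the model's reply.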