🧠 Introducing Qwen COG Thinker: a Cognitive Reasoning Mode for Qwen2.5
I fine-tuned Qwen2.5 with GRPO so it actually thinks before it answers, instead of just pattern-matching.
Most LLMs mimic reasoning. This one builds a real cognitive path:
📝 Plan → understand the task
🔍 Monitor → reason step by step
✅ Evaluate → verify before answering
Every response follows a strict structured protocol:
<think> <planning> ... </planning> <monitoring> ... </monitoring> <evaluation> ... </evaluation> </think>
Then a clean, reasoning-free <output>.
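To make the protocol concrete, here is what a response could look like. The question and the section contents are invented for illustration; only the tag layout comes from the post, and the closing tags on the inner sections are an assumption:

```xml
<think>
  <planning>The task asks for a simple sum; I will add the two numbers.</planning>
  <monitoring>2 + 2 = 4. No carries or edge cases involved.</monitoring>
  <evaluation>The result 4 is consistent with basic arithmetic; no revision needed.</evaluation>
</think>
<output>4</output>
```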
The model self-checks its own structure: if a section is missing or malformed, the response is invalid.
This isn't chain-of-thought slapped on top. The reasoning protocol is baked in via RL.
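A format check like this is typically wired into GRPO as a reward function. The sketch below is not the author's actual training code; it's a minimal illustration, assuming the tag set shown in the post (with paired closing tags) and the reward-function shape used by libraries such as TRL's GRPOTrainer, which expects one float per completion:

```python
import re

# Required section order, per the protocol described in the post.
# Closing tags for the inner sections are an assumption; the post
# only shows the sections elided as "..." inside <think> ... </think>.
PROTOCOL = re.compile(
    r"^\s*<think>\s*"
    r"<planning>.+?</planning>\s*"
    r"<monitoring>.+?</monitoring>\s*"
    r"<evaluation>.+?</evaluation>\s*"
    r"</think>\s*"
    r"<output>.+?</output>\s*$",
    re.DOTALL,  # section contents may span multiple lines
)

def protocol_valid(response: str) -> bool:
    """True iff the response contains every section, in order."""
    return PROTOCOL.match(response) is not None

def format_reward(completions, **kwargs):
    """Binary format reward: 1.0 for a well-formed response, 0.0 otherwise.
    Matches the (completions, **kwargs) -> list[float] shape that
    TRL-style GRPO trainers accept for reward functions."""
    return [1.0 if protocol_valid(c) else 0.0 for c in completions]
```

During GRPO training, a reward like this (usually combined with a task-correctness reward on the `<output>` section) is what makes the structure "baked in" rather than prompted.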
👇 Full README + inference code below 👇
alibidaran/Qwen_COG_Thinker_Merged
#AI #LLM #Qwen #ReasoningModels #GRPO #OpenSource