Qwopus3.6-27B-v2-RYS-Balanced
This is a RYS/relayer export of Jackrong/Qwopus3.6-27B-v2.
The checkpoint physically duplicates selected decoder layers in the Hugging Face safetensors weights. It does not require the RYS runtime wrapper.
Variant
- Target repo:
hampsonw/Qwopus3.6-27B-v2-RYS-Balanced - Objective: Balanced Math+EQ
- Repeated block:
15,30(--blocks "15,30") - Repeated source layers: 15–29 inclusive, zero-indexed
- Source text layers: 64
- Target text layers: 79
- Extra repeated layers: 15
Probe result
From the BF16 Transformers scan over math_16 + eq_16:
| Metric | Score | Delta vs baseline |
|---|---|---|
| Math | 0.818086 | +0.104831 |
| EQ | 0.741500 | +0.010000 |
Rank note: balanced rank 1.
Baseline scores from the scan:
- Math:
0.713255 - EQ:
0.731500
Local diagnostic evaluation results
These are local diagnostic subsample results, not full benchmark claims. They were run through an OpenAI-compatible vLLM endpoint with Qwen3.6 thinking-mode sampling:
temperature=1.0,top_p=0.95,top_k=20,min_p=0.0thinking_token_budget=32768max_tokens=81920- MATH prompt:
Please reason step by step, and put your final answer within \boxed{}.
Comparisons below use:
- Qwen baseline:
cyankiwi/Qwen3.6-27B-AWQ-BF16-INT4 - Qwopus baseline:
Jackrong/Qwopus3.6-27B-v2 - This model:
hampsonw/Qwopus3.6-27B-v2-RYS-Balanced
MATH-500 hardest-50 diagnostic
All three models solved the 50-item slice after manual normalization of mathematically equivalent answer formats. Raw scorer misses were formatting-equivalence issues such as 5.5 vs \frac{11}{2} and 2\sqrt{3}+1 vs 1+2\sqrt{3}.
| Model | Raw scorer | Audited | Completion tokens | Reasoning tokens | Total tokens |
|---|---|---|---|---|---|
| Qwen3.6-27B baseline | 47/50 | 50/50 | 633,524 | 599,909 | 639,736 |
| Qwopus3.6-27B-v2 | 46/50 | 50/50 | 304,733 | 270,520 | 310,945 |
| Qwopus3.6-27B-v2-RYS-Balanced | 45/50 | 50/50 | 301,726 | 267,582 | 307,938 |
On this MATH diagnostic slice, RYS-Balanced matched Qwopus v2 audited accuracy while using roughly the same number of tokens, and both Qwopus variants used about half the completion tokens of the Qwen baseline.
LiveCodeBench release_v6 hardest-49 diagnostic
This diagnostic uses the deterministic hardest-50 slice from public livecodebench/code_generation_lite release_v6, with item 3344 dropped as a whole question because it contains a malformed private testcase outside the stated List[List[int]] function contract.
| Model | Correct | Accuracy | Completion tokens | Reasoning tokens | Total tokens |
|---|---|---|---|---|---|
| Qwen3.6-27B baseline | 43/49 | 87.8% | 1,640,947 | 1,440,065 | 1,672,212 |
| Qwopus3.6-27B-v2 | 45/49 | 91.8% | 1,400,939 | 1,347,042 | 1,432,204 |
| Qwopus3.6-27B-v2-RYS-Balanced | 32/49 | 65.3% | 983,974 | 806,168 | 1,015,803 |
RYS-Balanced was substantially cheaper in tokens on this LCB diagnostic, but accuracy was much worse than both baselines. It had two terminal failed items (2952, 3233) and many more non-passing solutions.
Interpretation
This checkpoint is currently best viewed as a math-focused experimental RYS export. The small MATH diagnostic looked strong and token-efficient, but the LiveCodeBench diagnostic regressed significantly. Use with caution for coding-heavy workloads.
Provenance
- Source model:
Jackrong/Qwopus3.6-27B-v2 - Export method: RYS physical layer duplication
- Export manifest:
rys_export_manifest.json - Probe result bundle:
qwopus36-bf16-results_20260523T213305Z.tar.zst
Notes
This model has not yet been validated on the larger math_120 + eq_140 probe set or full public benchmark suite. The local diagnostics above suggest the RYS-Balanced export may preserve math performance while reducing tokens, but it regressed on the coding diagnostic. Treat it as experimental.
- Downloads last month
- 74
Model tree for hampsonw/Qwopus3.6-27B-v2-RYS-Balanced
Base model
Jackrong/Qwopus3.6-27B-v2