Qwopus3.6-27B-v2-RYS-Balanced

This is a RYS/relayer export of Jackrong/Qwopus3.6-27B-v2.

The checkpoint physically duplicates selected decoder layers in the Hugging Face safetensors weights. It does not require the RYS runtime wrapper.

Variant

  • Target repo: hampsonw/Qwopus3.6-27B-v2-RYS-Balanced
  • Objective: Balanced Math+EQ
  • Repeated block: 15,30 (--blocks "15,30")
  • Repeated source layers: 15–29 inclusive, zero-indexed
  • Source text layers: 64
  • Target text layers: 79
  • Extra repeated layers: 15

Probe result

From the BF16 Transformers scan over math_16 + eq_16:

Metric Score Delta vs baseline
Math 0.818086 +0.104831
EQ 0.741500 +0.010000

Rank note: balanced rank 1.

Baseline scores from the scan:

  • Math: 0.713255
  • EQ: 0.731500

Local diagnostic evaluation results

These are local diagnostic subsample results, not full benchmark claims. They were run through an OpenAI-compatible vLLM endpoint with Qwen3.6 thinking-mode sampling:

  • temperature=1.0, top_p=0.95, top_k=20, min_p=0.0
  • thinking_token_budget=32768
  • max_tokens=81920
  • MATH prompt: Please reason step by step, and put your final answer within \boxed{}.

Comparisons below use:

  • Qwen baseline: cyankiwi/Qwen3.6-27B-AWQ-BF16-INT4
  • Qwopus baseline: Jackrong/Qwopus3.6-27B-v2
  • This model: hampsonw/Qwopus3.6-27B-v2-RYS-Balanced

MATH-500 hardest-50 diagnostic

All three models solved the 50-item slice after manual normalization of mathematically equivalent answer formats. Raw scorer misses were formatting-equivalence issues such as 5.5 vs \frac{11}{2} and 2\sqrt{3}+1 vs 1+2\sqrt{3}.

Model Raw scorer Audited Completion tokens Reasoning tokens Total tokens
Qwen3.6-27B baseline 47/50 50/50 633,524 599,909 639,736
Qwopus3.6-27B-v2 46/50 50/50 304,733 270,520 310,945
Qwopus3.6-27B-v2-RYS-Balanced 45/50 50/50 301,726 267,582 307,938

On this MATH diagnostic slice, RYS-Balanced matched Qwopus v2 audited accuracy while using roughly the same number of tokens, and both Qwopus variants used about half the completion tokens of the Qwen baseline.

LiveCodeBench release_v6 hardest-49 diagnostic

This diagnostic uses the deterministic hardest-50 slice from public livecodebench/code_generation_lite release_v6, with item 3344 dropped as a whole question because it contains a malformed private testcase outside the stated List[List[int]] function contract.

Model Correct Accuracy Completion tokens Reasoning tokens Total tokens
Qwen3.6-27B baseline 43/49 87.8% 1,640,947 1,440,065 1,672,212
Qwopus3.6-27B-v2 45/49 91.8% 1,400,939 1,347,042 1,432,204
Qwopus3.6-27B-v2-RYS-Balanced 32/49 65.3% 983,974 806,168 1,015,803

RYS-Balanced was substantially cheaper in tokens on this LCB diagnostic, but accuracy was much worse than both baselines. It had two terminal failed items (2952, 3233) and many more non-passing solutions.

Interpretation

This checkpoint is currently best viewed as a math-focused experimental RYS export. The small MATH diagnostic looked strong and token-efficient, but the LiveCodeBench diagnostic regressed significantly. Use with caution for coding-heavy workloads.

Provenance

  • Source model: Jackrong/Qwopus3.6-27B-v2
  • Export method: RYS physical layer duplication
  • Export manifest: rys_export_manifest.json
  • Probe result bundle: qwopus36-bf16-results_20260523T213305Z.tar.zst

Notes

This model has not yet been validated on the larger math_120 + eq_140 probe set or full public benchmark suite. The local diagnostics above suggest the RYS-Balanced export may preserve math performance while reducing tokens, but it regressed on the coding diagnostic. Treat it as experimental.

Downloads last month
74
Safetensors
Model size
33B params
Tensor type
BF16
·
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for hampsonw/Qwopus3.6-27B-v2-RYS-Balanced

Finetuned
(5)
this model