Qwopus3.6-27B-v2-RYS-Balanced

This is a RYS/relayer export of Jackrong/Qwopus3.6-27B-v2.

The checkpoint physically duplicates selected decoder layers in the Hugging Face safetensors weights. It does not require the RYS runtime wrapper.

Variant

Target repo: hampsonw/Qwopus3.6-27B-v2-RYS-Balanced
Objective: Balanced Math+EQ
Repeated block: 15,30 (--blocks "15,30")
Repeated source layers: 15–29 inclusive, zero-indexed
Source text layers: 64
Target text layers: 79
Extra repeated layers: 15

Probe result

From the BF16 Transformers scan over math_16 + eq_16:

Metric	Score	Delta vs baseline
Math	0.818086	+0.104831
EQ	0.741500	+0.010000

Rank note: balanced rank 1.

Baseline scores from the scan:

Math: 0.713255
EQ: 0.731500

Local diagnostic evaluation results

These are local diagnostic subsample results, not full benchmark claims. They were run through an OpenAI-compatible vLLM endpoint with Qwen3.6 thinking-mode sampling:

temperature=1.0, top_p=0.95, top_k=20, min_p=0.0
thinking_token_budget=32768
max_tokens=81920
MATH prompt: Please reason step by step, and put your final answer within \boxed{}.

Comparisons below use:

Qwen baseline: cyankiwi/Qwen3.6-27B-AWQ-BF16-INT4
Qwopus baseline: Jackrong/Qwopus3.6-27B-v2
This model: hampsonw/Qwopus3.6-27B-v2-RYS-Balanced

MATH-500 hardest-50 diagnostic

All three models solved the 50-item slice after manual normalization of mathematically equivalent answer formats. Raw scorer misses were formatting-equivalence issues such as 5.5 vs \frac{11}{2} and 2\sqrt{3}+1 vs 1+2\sqrt{3}.

Model	Raw scorer	Audited	Completion tokens	Reasoning tokens	Total tokens
Qwen3.6-27B baseline	47/50	50/50	633,524	599,909	639,736
Qwopus3.6-27B-v2	46/50	50/50	304,733	270,520	310,945
Qwopus3.6-27B-v2-RYS-Balanced	45/50	50/50	301,726	267,582	307,938

On this MATH diagnostic slice, RYS-Balanced matched Qwopus v2 audited accuracy while using roughly the same number of tokens, and both Qwopus variants used about half the completion tokens of the Qwen baseline.

LiveCodeBench release_v6 hardest-49 diagnostic

This diagnostic uses the deterministic hardest-50 slice from public livecodebench/code_generation_lite release_v6, with item 3344 dropped as a whole question because it contains a malformed private testcase outside the stated List[List[int]] function contract.

Model	Correct	Accuracy	Completion tokens	Reasoning tokens	Total tokens
Qwen3.6-27B baseline	43/49	87.8%	1,640,947	1,440,065	1,672,212
Qwopus3.6-27B-v2	45/49	91.8%	1,400,939	1,347,042	1,432,204
Qwopus3.6-27B-v2-RYS-Balanced	32/49	65.3%	983,974	806,168	1,015,803

RYS-Balanced was substantially cheaper in tokens on this LCB diagnostic, but accuracy was much worse than both baselines. It had two terminal failed items (2952, 3233) and many more non-passing solutions.

Interpretation

This checkpoint is currently best viewed as a math-focused experimental RYS export. The small MATH diagnostic looked strong and token-efficient, but the LiveCodeBench diagnostic regressed significantly. Use with caution for coding-heavy workloads.

Provenance

Source model: Jackrong/Qwopus3.6-27B-v2
Export method: RYS physical layer duplication
Export manifest: rys_export_manifest.json
Probe result bundle: qwopus36-bf16-results_20260523T213305Z.tar.zst

Notes

This model has not yet been validated on the larger math_120 + eq_140 probe set or full public benchmark suite. The local diagnostics above suggest the RYS-Balanced export may preserve math performance while reducing tokens, but it regressed on the coding diagnostic. Treat it as experimental.

Downloads last month: 74

Safetensors

Model size

33B params

Tensor type

BF16

Inference Providers NEW

This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for hampsonw/Qwopus3.6-27B-v2-RYS-Balanced

Base model

Jackrong/Qwopus3.6-27B-v2

Finetuned

(5)

this model