Credit to virtuous7373 for posting the CABS implementation used here.

This model is a CABS-sparsified version of the original GRPO-trained Lambent/Qwen3-4B-Base-Continued-GRPO, merged via TIES into the Lambent/Qwen3-4B-Base-Continued-GRPO-B model.
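A minimal loading sketch with the Hugging Face transformers library; the repository id is this card's, and the prompt and generation settings are arbitrary placeholders:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

repo_id = "Lambent/Qwen3-4B-Base-Continued-GRPO-Merge"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    torch_dtype=torch.bfloat16,  # checkpoint is stored in bfloat16
    device_map="auto",
)

prompt = "Question: Which gas do plants primarily absorb for photosynthesis?\nAnswer:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```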

| Task | Metric | Base | Trained | Delta |
|---|---|---|---|---|
| arc_easy | acc | 0.7891 | 0.7870 | -0.27% |
| arc_easy | acc_norm | 0.7609 | 0.7605 | -0.05% |
| lambada_openai | acc | 0.6912 | 0.6984 | +1.04% |
| lambada_openai | perplexity | 4.2433 | 4.0490 | -4.6% (lower is better) |
| openbookqa | acc | 0.3160 | 0.3180 | +0.63% |
| openbookqa | acc_norm | 0.4100 | 0.4120 | +0.49% |
| piqa | acc | 0.7797 | 0.7807 | +0.13% |
| piqa | acc_norm | 0.7807 | 0.7807 | +0.00% |
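The task and metric names above match EleutherAI's lm-evaluation-harness; the exact evaluation setup is not stated in this card. Assuming that harness, a comparison like the one above could be reproduced with a sketch along these lines (the repository ids, zero-shot settings, and batch size are assumptions):

```python
# Sketch only: assumes EleutherAI's lm-evaluation-harness (pip install lm-eval)
# produced the table above; the card does not state the exact setup.
import lm_eval

TASKS = ["arc_easy", "lambada_openai", "openbookqa", "piqa"]

def score(repo_id: str) -> dict:
    """Run the four tasks against a Hugging Face checkpoint and return raw results."""
    output = lm_eval.simple_evaluate(
        model="hf",
        model_args=f"pretrained={repo_id},dtype=bfloat16",
        tasks=TASKS,
        batch_size=8,
    )
    return output["results"]

# Which checkpoints the "Base" and "Trained" columns refer to is an assumption here.
for repo in [
    "Lambent/Qwen3-4B-Base-Continued-GRPO-B",
    "Lambent/Qwen3-4B-Base-Continued-GRPO-Merge",
]:
    results = score(repo)
    for task in TASKS:
        print(repo, task, results[task])
```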

This is a merge of pre-trained language models created using mergekit.

Merge Details

Merge Method

This model was merged using the TIES merge method, with ./merged_models/llm-judge-merged-fixed as the base.
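For orientation, TIES-Merging (Yadav et al., 2023) trims each model's delta from the base to its largest-magnitude entries, elects a per-parameter sign, and averages only the contributions that agree with that sign. A toy sketch of the idea, not mergekit's actual implementation (which works tensor-by-tensor and supports more options):

```python
# Toy illustration of TIES-Merging on flat weight vectors.
# "density" keeps a fraction of the largest-magnitude deltas;
# "weights" scales each model's contribution before sign election.
import numpy as np

def ties_merge(base, models, density=0.5, weights=None):
    deltas = [m - base for m in models]              # task vectors relative to base
    weights = weights or [1.0] * len(deltas)

    trimmed = []
    for d, w in zip(deltas, weights):
        k = max(1, int(round(density * d.size)))     # keep top-k entries by magnitude
        threshold = np.sort(np.abs(d))[-k]
        trimmed.append(np.where(np.abs(d) >= threshold, d, 0.0) * w)

    stacked = np.stack(trimmed)
    elected_sign = np.sign(stacked.sum(axis=0))      # elect one sign per parameter

    agree = (np.sign(stacked) == elected_sign) & (stacked != 0)
    summed = np.where(agree, stacked, 0.0).sum(axis=0)
    counts = np.maximum(agree.sum(axis=0), 1)        # disjoint mean over agreeing models
    return base + summed / counts

base = np.zeros(8)
model_a = base + np.array([ 0.9, -0.1, 0.4, 0.0, -0.7, 0.2, 0.0, 0.3])
model_b = base + np.array([-0.8,  0.0, 0.5, 0.1,  0.6, 0.0, 0.2, 0.3])
print(ties_merge(base, [model_a, model_b], density=0.5))
```

In the configuration below, density: 0.5 keeps half of the sparse GRPO deltas and weight: 0.4 scales their contribution before they are added onto the judge base.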

Models Merged

The following models were included in the merge:

  • ./merged_models/grpo-cabs

Configuration

The following YAML configuration was used to produce this model:

```yaml
# TIES merge: Judge as base, inject sparse GRPO knowledge
merge_method: ties
base_model: ./merged_models/llm-judge-merged-fixed
models:
  - model: ./merged_models/grpo-cabs
    parameters:
      density: 0.5
      weight: 0.4
dtype: bfloat16
```
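With this configuration saved to a file (say `ties-grpo.yml`, a hypothetical name), a merge like this can be run with mergekit's command-line entry point, e.g. `mergekit-yaml ties-grpo.yml ./merged_models/grpo-merge`; the output directory name here is illustrative.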
