Credit to virtuous7373 for posting the CABS implementation used here.

This model is a CABS-sparsified version of the original GRPO-trained Lambent/Qwen3-4B-Base-Continued-GRPO, merged via TIES into the Lambent/Qwen3-4B-Base-Continued-GRPO-B model.
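A minimal loading sketch with the Hugging Face transformers library; the repository id is this card's, and the prompt and generation settings are arbitrary placeholders:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

repo_id = "Lambent/Qwen3-4B-Base-Continued-GRPO-Merge"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    torch_dtype=torch.bfloat16,  # checkpoint is stored in bfloat16
    device_map="auto",
)

prompt = "Question: Which gas do plants primarily absorb for photosynthesis?\nAnswer:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```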

| Task | Metric | Base | Trained | Delta |
|---|---|---|---|---|
| arc_easy | acc | 0.7891 | 0.7870 | -0.27% |
| arc_easy | acc_norm | 0.7609 | 0.7605 | -0.05% |
| lambada_openai | acc | 0.6912 | 0.6984 | +1.04% |
| lambada_openai | perplexity | 4.2433 | 4.0490 | -4.6% (lower is better) |
| openbookqa | acc | 0.3160 | 0.3180 | +0.63% |
| openbookqa | acc_norm | 0.4100 | 0.4120 | +0.49% |
| piqa | acc | 0.7797 | 0.7807 | +0.13% |
| piqa | acc_norm | 0.7807 | 0.7807 | +0.00% |
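The task and metric names above match EleutherAI's lm-evaluation-harness; the exact evaluation setup is not stated in this card. Assuming that harness, a comparison like the one above could be reproduced with a sketch along these lines (the repository ids, zero-shot settings, and batch size are assumptions):

```python
# Sketch only: assumes EleutherAI's lm-evaluation-harness (pip install lm-eval)
# produced the table above; the card does not state the exact setup.
import lm_eval

TASKS = ["arc_easy", "lambada_openai", "openbookqa", "piqa"]

def score(repo_id: str) -> dict:
    """Run the four tasks against a Hugging Face checkpoint and return raw results."""
    output = lm_eval.simple_evaluate(
        model="hf",
        model_args=f"pretrained={repo_id},dtype=bfloat16",
        tasks=TASKS,
        batch_size=8,
    )
    return output["results"]

# Which checkpoints the "Base" and "Trained" columns refer to is an assumption here.
for repo in [
    "Lambent/Qwen3-4B-Base-Continued-GRPO-B",
    "Lambent/Qwen3-4B-Base-Continued-GRPO-Merge",
]:
    results = score(repo)
    for task in TASKS:
        print(repo, task, results[task])
```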

This is a merge of pre-trained language models created using mergekit.

Merge Details

Merge Method

This model was merged using the TIES merge method, with ./merged_models/llm-judge-merged-fixed as the base.
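For orientation, TIES-Merging (Yadav et al., 2023) trims each model's delta from the base to its largest-magnitude entries, elects a per-parameter sign, and averages only the contributions that agree with that sign. A toy sketch of the idea, not mergekit's actual implementation (which works tensor-by-tensor and supports more options):

```python
# Toy illustration of TIES-Merging on flat weight vectors.
# "density" keeps a fraction of the largest-magnitude deltas;
# "weights" scales each model's contribution before sign election.
import numpy as np

def ties_merge(base, models, density=0.5, weights=None):
    deltas = [m - base for m in models]              # task vectors relative to base
    weights = weights or [1.0] * len(deltas)

    trimmed = []
    for d, w in zip(deltas, weights):
        k = max(1, int(round(density * d.size)))     # keep top-k entries by magnitude
        threshold = np.sort(np.abs(d))[-k]
        trimmed.append(np.where(np.abs(d) >= threshold, d, 0.0) * w)

    stacked = np.stack(trimmed)
    elected_sign = np.sign(stacked.sum(axis=0))      # elect one sign per parameter

    agree = (np.sign(stacked) == elected_sign) & (stacked != 0)
    summed = np.where(agree, stacked, 0.0).sum(axis=0)
    counts = np.maximum(agree.sum(axis=0), 1)        # disjoint mean over agreeing models
    return base + summed / counts

base = np.zeros(8)
model_a = base + np.array([ 0.9, -0.1, 0.4, 0.0, -0.7, 0.2, 0.0, 0.3])
model_b = base + np.array([-0.8,  0.0, 0.5, 0.1,  0.6, 0.0, 0.2, 0.3])
print(ties_merge(base, [model_a, model_b], density=0.5))
```

In the configuration below, density: 0.5 keeps half of the sparse GRPO deltas and weight: 0.4 scales their contribution before they are added onto the judge base.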

Models Merged

The following models were included in the merge:

  • ./merged_models/grpo-cabs

Configuration

The following YAML configuration was used to produce this model:

```yaml
# TIES merge: Judge as base, inject sparse GRPO knowledge
merge_method: ties
base_model: ./merged_models/llm-judge-merged-fixed
models:
  - model: ./merged_models/grpo-cabs
    parameters:
      density: 0.5
      weight: 0.4
dtype: bfloat16
```
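With this configuration saved to a file (say `ties-grpo.yml`, a hypothetical name), a merge like this can be run with mergekit's command-line entry point, e.g. `mergekit-yaml ties-grpo.yml ./merged_models/grpo-merge`; the output directory name here is illustrative.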
