BLADE-LM

Block-Level Autoregressive Diffusion with Causal DiscovEry

BLADE-LM is a hybrid language model built on Qwen2.5-0.5B that combines autoregressive (AR) generation with a diffusion-based think segment. It enables real-time discovery of causal dependencies between token blocks during inference.

⚠️ This model requires a custom inference script. Standard model.generate() will fall back to plain AR generation without the 4-quadrant mask. See Usage below.

Model Details


Base model	Qwen/Qwen2.5-0.5B
Architecture	Dual-path: AR clean segment + Diffusion think segment
Sequence length	1024 (extensible via `--seq_len`)
Block size	8 tokens/block (128 blocks)
Parameters	~0.5B
License	MIT

How It Works

BLADE-LM processes sequences as two parallel segments:

Input:  [ clean segment (L) | think segment (L) ]
         real tokens           all [MASK]

A fixed 4-quadrant attention mask governs interactions between segments:

Quadrant	Direction	Rule
Top-Left	clean → clean	AR causal (lower-triangular)
Top-Right	clean → think	clean[i] attends think[j < i]
Bottom-Left	think → clean	think[i] attends clean[j < i]
Bottom-Right	think → think	intra-block bidirectional only
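
The four rules above can be sketched as a boolean mask over the 2L positions. This is an illustration only, not the code in blade_run.py: the function name `build_blade_mask` is hypothetical, and the sketch assumes that i and j in the two cross-segment quadrants are block indices (strictly earlier blocks), which the table does not state explicitly. Pure Python lists are used so the example runs without extra dependencies.

```python
def build_blade_mask(seq_len, block_size=8):
    """Return a 2L x 2L boolean mask; mask[q][k] is True if query q may attend key k.

    Rows/columns 0..L-1 are the clean segment, L..2L-1 are the think segment.
    Hypothetical helper: the cross-segment rules use block indices (an assumption).
    """
    if seq_len % block_size != 0:
        raise ValueError("seq_len must be divisible by block_size")
    L = seq_len
    mask = [[False] * (2 * L) for _ in range(2 * L)]
    for q in range(L):
        qb = q // block_size  # block index of the query position
        for k in range(L):
            kb = k // block_size  # block index of the key position
            # Top-left: clean -> clean, token-level causal (lower-triangular).
            mask[q][k] = k <= q
            # Top-right: clean -> think, strictly earlier blocks (assumption).
            mask[q][L + k] = kb < qb
            # Bottom-left: think -> clean, strictly earlier blocks (assumption).
            mask[L + q][k] = kb < qb
            # Bottom-right: think -> think, bidirectional inside one block only.
            mask[L + q][L + k] = kb == qb
    return mask


# Tiny example: L = 16 tokens, i.e. two blocks of 8, giving a 32 x 32 mask.
mask = build_blade_mask(16, block_size=8)
```

With the default sequence length of 1024, the same function would produce a 2048 × 2048 mask; a practical implementation would build it as a tensor rather than nested lists.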

Inference uses a two-round process:

Round 1: Token-by-token draft generation using beta-mixed AR/Diff logits
Round 2 (optional): A single eager forward pass with the draft as input. Later clean-segment queries (Q_clean) now attend to think-segment keys (K_diff), which is what makes causal graph extraction possible.
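
As a rough illustration of the beta mixing used in Round 1, the sketch below blends two logit vectors linearly. The linear form and the function names are assumptions made for this example; the model card only states that beta = 0 is pure Diff and beta = 1 is pure AR, and the actual blade_run.py may combine the two paths differently.

```python
def mix_logits(ar_logits, diff_logits, beta):
    """Blend AR and Diff logits; beta=1 keeps only AR, beta=0 keeps only Diff.

    Hypothetical helper: a linear blend is assumed, not confirmed by the source.
    """
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must be in [0, 1]")
    if len(ar_logits) != len(diff_logits):
        raise ValueError("logit vectors must have the same length")
    return [beta * a + (1.0 - beta) * d for a, d in zip(ar_logits, diff_logits)]


def greedy_pick(logits):
    """Return the index of the largest logit."""
    return max(range(len(logits)), key=logits.__getitem__)


# Toy vocabulary of three tokens where the two paths disagree.
ar = [2.0, 0.5, 0.1]
diff = [0.2, 1.5, 0.3]
token_ar_only = greedy_pick(mix_logits(ar, diff, beta=1.0))
token_diff_only = greedy_pick(mix_logits(ar, diff, beta=0.0))
token_default = greedy_pick(mix_logits(ar, diff, beta=0.6))
```

In this toy case the default beta of 0.6 sides with the AR path, while beta = 0 follows the Diff path.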

Training


Data	NuminaMath-TIR, StackMathQA, CLadder/e-CARE causal reasoning
Mix ratio	3 : 5 : 2
Steps	40,000 optimizer steps
Batch size	16 × 2 gradient accumulation steps (effective batch size 32)
Optimizer	AdamW (betas 0.9/0.95)
LR	2e-5 with cosine decay
Loss	L_diff + 0.05 × L_ar + 0.1 × L_intra
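
To make the loss weighting and the learning-rate schedule in the table concrete, here is a minimal sketch. Both function names are hypothetical, and the schedule assumes no warmup and decay to zero over the 40,000 optimizer steps, which the table does not specify.

```python
import math


def total_loss(l_diff, l_ar, l_intra, w_ar=0.05, w_intra=0.1):
    """Weighted sum from the table: diffusion loss plus down-weighted AR and intra terms."""
    return l_diff + w_ar * l_ar + w_intra * l_intra


def cosine_lr(step, total_steps=40_000, peak_lr=2e-5, min_lr=0.0):
    """Cosine decay from peak_lr to min_lr (assumes no warmup phase)."""
    progress = min(max(step, 0), total_steps) / total_steps
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


example_loss = total_loss(1.0, 2.0, 3.0)  # 1.0 + 0.05 * 2.0 + 0.1 * 3.0
lr_start = cosine_lr(0)
lr_middle = cosine_lr(20_000)
lr_end = cosine_lr(40_000)
```

Under these assumptions the learning rate starts at 2e-5, reaches half that value at step 20,000, and ends at zero.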

Usage

Install dependencies and clone the inference code:

pip install transformers torch pyyaml matplotlib
git clone https://github.com/xiaziye/BLADE-LM.git
cd BLADE-LM

Basic generation

# Using blade_run.py (recommended)
python blade_run.py \
    --model_path Hengzongshu/BLADE-LM \
    --prompt "Solve: 2x + 5 = 13. Show your work.\n" \
    --beta 0.6 \
    --max_tokens 200

With causal graph analysis

python blade_run.py \
    --model_path Hengzongshu/BLADE-LM \
    --prompt "Solve: 2x + 5 = 13. Show your work.\n" \
    --beta 0.6 \
    --causal \
    --causal_out causal.png

Parameters

Parameter	Default	Description
`--beta`	0.6	AR/Diff mix ratio. 0 = pure Diff, 1 = pure AR. Recommended: 0.49–0.90
`--max_tokens`	200	Maximum tokens to generate
`--seq_len`	1024	Sequence length (must be divisible by 8)
`--causal`	false	Run causal graph analysis after generation
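
If you prefer to launch blade_run.py from Python, the sketch below assembles the command line from the flags documented above and checks the two constraints the table mentions (beta between 0 and 1, seq_len divisible by 8). The helper `build_blade_command` is hypothetical and not part of the repository; it only builds an argument list and does not run anything.

```python
def build_blade_command(prompt, model_path="Hengzongshu/BLADE-LM", beta=0.6,
                        max_tokens=200, seq_len=1024, causal=False,
                        causal_out=None):
    """Build an argument list for blade_run.py (hypothetical convenience helper)."""
    if seq_len % 8 != 0:
        raise ValueError("seq_len must be divisible by 8")
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must be in [0, 1]")
    cmd = [
        "python", "blade_run.py",
        "--model_path", model_path,
        "--prompt", prompt,
        "--beta", str(beta),
        "--max_tokens", str(max_tokens),
        "--seq_len", str(seq_len),
    ]
    if causal:
        # --causal is a switch; --causal_out names the image file to write.
        cmd.append("--causal")
        if causal_out:
            cmd += ["--causal_out", causal_out]
    return cmd


cmd = build_blade_command("Solve: 2x + 5 = 13. Show your work.\n",
                          causal=True, causal_out="causal.png")
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` from inside the cloned BLADE-LM directory.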

Causal Discovery

After generation, BLADE-LM can produce a block-level causal graph showing which past blocks influenced which future blocks. Example output on math reasoning:

[1] block 1 → block 33  gap=32  strength=0.0017
  cause  j=1:  ' + 5 = 13.'
  effect i=33: '2x + 5 = 1'

[2] block 2 → block 21  gap=19  strength=0.0013
  cause  j=2:  ' Show your work. To solve the equation'
  effect i=21: '4\n   \]\n\n3. **'

The model exhibits adaptive causal depth — simple queries produce shallow linear graphs while multi-step reasoning produces richer long-range dependencies. This behavior emerges naturally without explicit supervision.
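
For downstream analysis, the edge lines in the example output can be parsed into structured records. The regular expression below is written against the two sample lines shown above and is an assumption about the general output format; it would need adjusting if blade_run.py prints strengths in scientific notation or changes the layout.

```python
import re

# Matches lines such as: [1] block 1 → block 33  gap=32  strength=0.0017
EDGE_RE = re.compile(
    r"\[(\d+)\]\s+block\s+(\d+)\s+→\s+block\s+(\d+)\s+gap=(\d+)\s+strength=([0-9.]+)"
)


def parse_edges(text):
    """Extract causal edges from blade_run.py-style output (format assumed from the sample)."""
    edges = []
    for match in EDGE_RE.finditer(text):
        rank, cause, effect, gap, strength = match.groups()
        edges.append({
            "rank": int(rank),
            "cause": int(cause),
            "effect": int(effect),
            "gap": int(gap),
            "strength": float(strength),
        })
    return edges


sample = """[1] block 1 → block 33  gap=32  strength=0.0017
[2] block 2 → block 21  gap=19  strength=0.0013"""
edges = parse_edges(sample)
```

In both sample edges the reported gap equals the effect block index minus the cause block index, which gives a quick sanity check on parsed output.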

Experimental Results

Tested across math reasoning, causal reasoning, and logical sorting tasks:

Global anchors: Prompt tokens consistently influence conclusion blocks across long gaps (gap > 30)
Structure propagation: Early format-establishing blocks influence all subsequent similar blocks
Conclusion aggregation: Final answer blocks receive contributions from multiple intermediate reasoning steps
Adaptive depth: Causal graph complexity scales naturally with task difficulty

Causal signal strength is on the order of 0.001 at 0.5B scale (0.0017 and 0.0013 in the example output above): meaningful but weak. Larger models are expected to amplify the signal.

Limitations

Causal signal strength is relatively weak at 0.5B scale
No KV cache support (custom 2L mask is incompatible with standard DynamicCache)
Inference is slower than standard AR models due to full 2L forward per token
Must use blade_run.py for correct inference — model.generate() bypasses the 4-quadrant mask
Beta may require tuning for different prompt styles
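
A back-of-the-envelope count shows why the missing KV cache matters. The sketch assumes that every generated token triggers one forward pass over the full 2L input, as the limitation above describes, and compares that with an idealized cached AR model that processes each position once. Both function names are hypothetical, and the counts ignore attention cost differences.

```python
def blade_positions(new_tokens, seq_len=1024):
    """Positions processed when each generated token reruns the full 2L input."""
    return new_tokens * 2 * seq_len


def cached_ar_positions(prompt_len, new_tokens):
    """Positions processed by an idealized AR model with a KV cache."""
    return prompt_len + new_tokens


blade_cost = blade_positions(200)            # 200 steps over 2048 positions each
ar_cost = cached_ar_positions(20, 200)       # prompt length of 20 is an arbitrary example
```

With the default `--max_tokens 200` and `--seq_len 1024`, this gives 409,600 processed positions for BLADE-LM against 220 for the cached baseline in this example.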

Citation

@misc{blade2025,
  title  = {BLADE-LM: Block-Level Autoregressive Diffusion with Causal Discovery},
  author = {Xia Ziye},
  year   = {2025},
  url    = {https://huggingface.co/Hengzongshu/BLADE-LM}
}
