Qwen3-4B German Teacher
A finetuned Qwen3-4B model specialized for German language teaching at A1-B1 CEFR levels. This model excels at:
- Grammar Error Detection: Identifying grammatical mistakes in German sentences
- Error Correction: Providing correct forms with clear explanations
- Grammar Judgment: Binary classification of sentence grammaticality (CoLA-style)
- Teaching Explanations: Clear, learner-friendly explanations of German grammar rules
Model Details
| Property | Value |
|---|---|
| Base Model | Qwen/Qwen3-4B |
| Parameters | 4B |
| Training Method | SFT (Supervised Fine-Tuning) with LoRA |
| LoRA Rank | 32 |
| LoRA Alpha | 64 |
| Training Epochs | 2 |
| Learning Rate | 2e-4 |
| Context Length | 4096 tokens |
Performance
Evaluated on a custom German grammar benchmark:
| Metric | Score | Description |
|---|---|---|
| CoLA MCC | 0.721 | Matthews Correlation Coefficient for grammaticality judgment |
| GEC F1 | 0.349 | Macro F1 for grammar error correction |
| Generation Quality | 3.99/5.0 | Human-evaluated response quality |
| Overall Score | 0.633 | Weighted composite score |
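For orientation, the two classification-style metrics are standard scikit-learn quantities. The snippet below is only an illustrative sketch with placeholder labels; the actual benchmark data and the exact GEC scoring protocol are custom and not reproduced here.

```python
from sklearn.metrics import f1_score, matthews_corrcoef

# Placeholder labels/predictions; the real benchmark uses held-out German sentences.
# CoLA-style judgment: 1 = grammatical, 0 = ungrammatical
judgment_labels = [1, 0, 1, 1, 0]
judgment_preds = [1, 0, 1, 0, 0]
print("CoLA MCC:", matthews_corrcoef(judgment_labels, judgment_preds))

# Macro F1 over error categories (illustrative; the real GEC scoring may differ)
gec_labels = ["aux_verb", "article", "none", "case"]
gec_preds = ["aux_verb", "case", "none", "case"]
print("GEC macro F1:", f1_score(gec_labels, gec_preds, average="macro"))
```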
Benchmark Comparison
Comparison with SmolLM3-German-V6 (3B parameters, Q4 quantized):
| Metric | Qwen3 German Teacher (4B) | SmolLM3-German-V6 (3B) | Improvement |
|---|---|---|---|
| CoLA MCC | 0.721 | 0.624 | +15.5% |
| GEC F1 | 0.349 | 0.145 | +140.7% |
| Generation Quality | 3.99 | 3.19 | +25.1% |
| Overall Score | 0.633 | 0.492 | +28.7% |
Key advantages of this model:
- Superior grammaticality judgment: 15.5% higher CoLA MCC score
- Much stronger error correction: 2.4x higher GEC F1 score
- Higher-quality responses: 0.80 points better on the 5-point generation scale
Training Data
The model was trained on ~9,000 examples with the following composition:
| Category | Percentage | Purpose |
|---|---|---|
| Grammar Correction | 35% | Error correction patterns |
| Grammar Judgment | 25% | CoLA-style "Is this correct?" examples |
| Structured Teaching | 20% | Verb conjugations, grammar explanations |
| General Conversation | 20% | Fluency preservation |
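Training examples of this kind are typically stored as chat-formatted records. A hypothetical grammar-judgment record (field names and wording are illustrative, not the exact training format) might look like this:

```python
# Hypothetical grammar-judgment training record (illustrative format, not the actual data)
example = {
    "messages": [
        {"role": "system", "content": "Du bist ein freundlicher Deutschlehrer für A1-B1 Lernende."},
        {"role": "user", "content": "Ist der Satz grammatisch korrekt? 'Ich habe nach Berlin gefahren.'"},
        {"role": "assistant", "content": "Nein, der Satz ist nicht korrekt. 'Fahren' ist ein "
                                         "Bewegungsverb und bildet das Perfekt mit 'sein'. "
                                         "Korrektur: 'Ich bin nach Berlin gefahren.'"},
    ]
}
```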
Key Training Focus Areas
- haben/sein auxiliary verbs: Movement verbs require "sein" (e.g., "Ich bin gefahren" not "Ich habe gefahren")
- Article gender (der/die/das)
- Case usage (Nominativ, Akkusativ, Dativ, Genitiv)
- Word order in German sentences
Usage
With Transformers
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the finetuned model and tokenizer
model = AutoModelForCausalLM.from_pretrained(
    "jaigouk/qwen3-4b-german-teacher", torch_dtype="auto", device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("jaigouk/qwen3-4b-german-teacher")

messages = [
    {"role": "system", "content": "Du bist ein freundlicher Deutschlehrer für A1-B1 Lernende."},
    {"role": "user", "content": "Is this German sentence correct? 'Ich habe nach Berlin gefahren.'"},
]

# Build the chat-formatted prompt and generate a response
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
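Qwen3's stock chat template also accepts an `enable_thinking` switch. Assuming this repository keeps that template (not confirmed in the card), you can disable the reasoning block explicitly, which matches the empty `<think></think>` block hard-coded in the Ollama template below:

```python
# Continuing from the snippet above; assumes the stock Qwen3 chat template is used.
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=False,  # skip the <think>...</think> reasoning block
)
```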
With Ollama (GGUF)
- Download the GGUF file: qwen3-german-teacher-Q4_K_M.gguf
- Create a Modelfile:
FROM ./qwen3-german-teacher-Q4_K_M.gguf
TEMPLATE """{{- if .System }}<|im_start|>system
{{ .System }}<|im_end|>
{{ end }}<|im_start|>user
{{ .Prompt }}<|im_end|>
<|im_start|>assistant
<think>
</think>
{{ .Response }}<|im_end|>
"""
PARAMETER stop "<|im_end|>"
PARAMETER stop "<|endoftext|>"
PARAMETER temperature 0.7
PARAMETER top_p 0.9
PARAMETER num_ctx 4096
SYSTEM """Du bist ein freundlicher Deutschlehrer fรผr A1-B1 Lernende. Erklรคre Grammatikfehler klar und einfach. Beurteile grammatische Korrektheit sorgfรคltig."""
- Create and run:
ollama create qwen3-german-teacher -f Modelfile
ollama run qwen3-german-teacher "Is 'Ich habe nach Berlin gefahren' correct?"
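Once registered, the model can also be queried through Ollama's local REST API. A minimal sketch with `requests`, assuming Ollama's default port 11434:

```python
import requests

# Chat with the locally registered model via Ollama's REST API (default port 11434)
resp = requests.post(
    "http://localhost:11434/api/chat",
    json={
        "model": "qwen3-german-teacher",
        "messages": [
            {"role": "user", "content": "Is 'Ich habe nach Berlin gefahren' correct?"},
        ],
        "stream": False,
    },
    timeout=120,
)
print(resp.json()["message"]["content"])
```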
Example Interactions
Grammar Error Detection
User: Is this German sentence grammatically correct? "Ich habe nach Berlin gefahren."
Model: No, this sentence is incorrect. The verb "fahren" describes movement from one place to another, so it requires "sein" as the auxiliary verb, not "haben".
Correct: "Ich bin nach Berlin gefahren."
Grammar Explanation
User: Explain when to use "sein" vs "haben" in the Perfekt tense.
Model: In German, most verbs use "haben" in the Perfekt tense, but verbs that describe:
- Movement from A to B: gehen, fahren, fliegen, kommen, laufen
- Change of state: aufwachen, einschlafen, sterben, werden
These verbs use "sein" instead. For example:
- Ich bin nach Hause gegangen. (movement)
- Er ist eingeschlafen. (change of state)
Conjugation Help
User: Conjugate "sein" in present tense.
Model: Here is the conjugation of "sein" (to be) in present tense:
| Person | Conjugation |
|---|---|
| ich | bin |
| du | bist |
| er/sie/es | ist |
| wir | sind |
| ihr | seid |
| sie/Sie | sind |
Limitations
- Optimized for A1-B1 level German; may oversimplify advanced grammar concepts
- Focused on grammar teaching; not designed for creative writing or translation
- Best performance with clear, focused grammar questions
Training Methodology
Research-Informed Approach
This model was developed using insights from recent research on instruction-tuning and continual learning:
GRAPE Data Selection
Based on the GRAPE paper (Feb 2025):
"Instructions paired with responses sampled from other LLMs are often out of the distribution of the target model, which can lead to diminishing returns and even hurt performance"
Key insight: Rather than using responses generated by larger models (GPT-4, Claude), we curated training examples that align with Qwen3's natural output distribution. This prevents the model from learning patterns it cannot reliably reproduce.
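One simple way to approximate this kind of distribution-aware selection (an illustrative sketch, not the exact GRAPE procedure) is to score each candidate response under the base model and keep only the examples whose responses it already assigns high likelihood:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Score candidate responses with the *base* model; lower NLL = closer to its distribution.
# Illustrative approximation of distribution-aware selection, not the exact GRAPE method.
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-4B", torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-4B")
model.eval()

def response_nll(prompt: str, response: str) -> float:
    """Mean per-token negative log-likelihood of `response` given `prompt`."""
    prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids
    full = tokenizer(prompt + response, return_tensors="pt")
    labels = full.input_ids.clone()
    labels[:, : prompt_ids.shape[1]] = -100  # ignore prompt tokens in the loss
    with torch.no_grad():
        loss = model(**full, labels=labels).loss
    return loss.item()

# Keep candidates below a chosen NLL threshold (the threshold itself is a tunable assumption).
candidates = [
    ("Korrigiere den Satz: 'Ich habe nach Berlin gefahren.'",
     " Richtig ist: 'Ich bin nach Berlin gefahren.'"),
]
selected = [pair for pair in candidates if response_nll(*pair) < 2.5]
```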
CoLA-Style Judgment Training
The breakthrough in V4 was adding 25% explicit grammaticality judgment examples. This teaches the model to reason about correctness rather than just generate correct text:
User: Ist der Satz grammatisch korrekt? "Der Frau ist schön."
Assistant: Nein, der Satz ist nicht korrekt.
Fehler: "Der" ist maskulin, aber "Frau" ist feminin.
Korrektur: "Die Frau ist schön."
Lessons from EWC/Fisher Information Research
From arXiv:2502.11756 on Fisher Information computation:
- EXACT computation outperforms approximations
- Minimum 500 samples for reliable Fisher estimation
- Device consistency (GPU-only) is critical
These insights from our earlier SmolLM3 experiments (V5/V6 with Elastic Weight Consolidation) informed our dataset composition decisions for Qwen3-V4.
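As a concrete illustration of those constraints, a diagonal (empirical) Fisher estimate can be accumulated from per-example gradients, over at least ~500 samples, with model, data, and accumulators all kept on the GPU. The sketch below assumes a causal LM and a dataloader that yields tokenized batches; the names are illustrative and this is not the exact script used in the SmolLM3 experiments.

```python
import torch

def estimate_diagonal_fisher(model, dataloader, num_samples=500, device="cuda"):
    """Diagonal empirical Fisher: running mean of squared log-likelihood gradients."""
    model.to(device).eval()
    fisher = {name: torch.zeros_like(p, device=device)
              for name, p in model.named_parameters() if p.requires_grad}
    seen = 0
    for batch in dataloader:  # ideally batch size 1 for exact per-example gradients
        if seen >= num_samples:
            break
        input_ids = batch["input_ids"].to(device)  # keep data on the same device as the model
        attention_mask = batch.get("attention_mask")
        if attention_mask is not None:
            attention_mask = attention_mask.to(device)
        model.zero_grad()
        loss = model(input_ids=input_ids, attention_mask=attention_mask,
                     labels=input_ids).loss  # exact NLL, no loss approximation
        loss.backward()
        for name, p in model.named_parameters():
            if p.grad is not None:
                fisher[name] += p.grad.detach() ** 2
        seen += input_ids.shape[0]
    return {name: f / max(seen, 1) for name, f in fisher.items()}
```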
Training Details
Multi-Stage Finetuning: Lessons Learned
This project was inspired by the 3-stage finetuning approach described in MiroThinker (arXiv:2511.11793):
- Stage 1: SFT (Supervised Fine-Tuning)
- Stage 2: DPO (Direct Preference Optimization)
- Stage 3: RL/GRPO (Reinforcement Learning with Grammar Rewards)
Our training script design followed the MiroThinker paper's architecture, including:
- Higher LoRA rank (r=32, alpha=64) as recommended for multi-stage training stability
- Full target modules (q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj) for comprehensive adaptation
However, this model uses only SFT because our experiments with preference optimization showed a fundamental trade-off:
Why DPO/SimPO Failed for Our Use Case
| Model | CoLA MCC | GEC F1 | Generation | Overall | Status |
|---|---|---|---|---|---|
| V6 SFT (Baseline) | 0.624 | 0.191 | 3.29 | 0.516 | Reference |
| V8 (DPO) | 0.583 | 0.063 | 3.63 | 0.473 | -8% overall |
| V9 (SimPO) | 0.567 | 0.073 | 3.72 | 0.479 | -7% overall |
Key findings:
- DPO improved generation quality (+10%) but destroyed GEC accuracy (-67%)
- SimPO achieved best generation (3.72) but still regressed accuracy significantly
- This mirrors the "alignment tax" documented in 2025 research, where preference optimization caused regressions on specific downstream tasks in roughly 76% of reported cases
For grammar teaching, accuracy is more important than fluency, so we stayed with pure SFT. The V4 dataset composition (25% CoLA-style judgment examples) proved more effective than preference optimization for our metrics.
SFT Training Configuration
Configuration based on MiroThinker recommendations for multi-stage training:
Base Model: Qwen/Qwen3-4B
LoRA Configuration:
- Rank (r): 32
- Alpha: 64
- Target modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
- Dropout: 0
Training Configuration:
- Epochs: 2
- Learning Rate: 2e-4 (linear decay)
- Batch Size: 2 (effective: 8 with gradient accumulation)
- Warmup: 10%
- Precision: BF16
- Optimizer: AdamW 8-bit
The SFT stage trains on ~9,000 curated examples covering grammar judgment, error correction, and teaching explanations.
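A minimal sketch of the same setup expressed with PEFT + TRL (the actual run used Unsloth on top of this stack; the dataset path, output directory, and `messages`-style JSONL layout are placeholder assumptions):

```python
from datasets import load_dataset
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

# LoRA configuration mirroring the values above
lora_config = LoraConfig(
    r=32, lora_alpha=64, lora_dropout=0.0, bias="none", task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

# Training hyperparameters mirroring the values above (the 4096-token sequence length
# is set via the SFT config; the exact argument name varies across TRL versions)
training_args = SFTConfig(
    output_dir="qwen3-german-teacher-sft",
    num_train_epochs=2,
    learning_rate=2e-4,
    lr_scheduler_type="linear",
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,  # effective batch size of 8
    warmup_ratio=0.1,
    bf16=True,
    optim="adamw_bnb_8bit",
)

# Placeholder path; assumes chat-formatted records with a "messages" field
dataset = load_dataset("json", data_files="train.jsonl", split="train")

trainer = SFTTrainer(
    model="Qwen/Qwen3-4B",
    args=training_args,
    train_dataset=dataset,
    peft_config=lora_config,
)
trainer.train()
```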
Deployment Pipeline
After SFT training, the model goes through:
1. PEFT Merge
LoRA adapters are merged into the base model:
from peft import PeftModel
from transformers import AutoModelForCausalLM

# Load the base model and attach the trained LoRA adapters
base_model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-4B")
model = PeftModel.from_pretrained(base_model, "lora_adapters/")

# Fold the adapters into the base weights and save a standalone checkpoint
model = model.merge_and_unload()
model.save_pretrained("merged_model/", safe_serialization=True)
2. GGUF Quantization
For efficient local inference:
python convert_hf_to_gguf.py merged_model/ --outtype bf16 --outfile model-bf16.gguf
llama-quantize model-bf16.gguf model-Q4_K_M.gguf Q4_K_M
The Q4_K_M quantization reduces model size from ~8GB to 2.5GB while maintaining high quality.
Technical Specifications
- Framework: Unsloth + Transformers + TRL + PEFT
- Hardware: NVIDIA RTX 4090 (24GB VRAM)
- Training Time: ~45 minutes for 2 epochs
- Quantization: GGUF Q4_K_M (2.5GB)
Citation
If you use this model, please cite:
@misc{qwen3-4b-german-teacher-2025,
  author    = {Jaigouk Kim},
  title     = {Qwen3-4B German Teacher: A Finetuned Model for German Grammar Teaching},
  year      = {2025},
  publisher = {HuggingFace},
  url       = {https://huggingface.co/jaigouk/qwen3-4b-german-teacher}
}
License
This model is released under the Apache 2.0 license, following the base Qwen3 model license.