Qwen3-4B German Teacher

A finetuned Qwen3-4B model specialized for German language teaching at A1-B1 CEFR levels. This model excels at:

  • Grammar Error Detection: Identifying grammatical mistakes in German sentences
  • Error Correction: Providing correct forms with clear explanations
  • Grammar Judgment: Binary classification of sentence grammaticality (CoLA-style)
  • Teaching Explanations: Clear, learner-friendly explanations of German grammar rules

Model Details

| Property | Value |
|---|---|
| Base Model | Qwen/Qwen3-4B |
| Parameters | 4B |
| Training Method | SFT (Supervised Fine-Tuning) with LoRA |
| LoRA Rank | 32 |
| LoRA Alpha | 64 |
| Training Epochs | 2 |
| Learning Rate | 2e-4 |
| Context Length | 4096 tokens |

Performance

Evaluated on a custom German grammar benchmark:

| Metric | Score | Description |
|---|---|---|
| CoLA MCC | 0.721 | Matthews Correlation Coefficient for grammaticality judgment |
| GEC F1 | 0.349 | Macro F1 for grammar error correction |
| Generation Quality | 3.99/5.0 | Human-evaluated response quality |
| Overall Score | 0.633 | Weighted composite score |
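
For reference, the scores above are standard classification metrics. The minimal sketch below shows how they can be computed with scikit-learn; the toy labels and the mapping from model outputs to labels are illustrative assumptions, not the actual benchmark harness.

from sklearn.metrics import matthews_corrcoef, f1_score

# CoLA-style judgment: 1 = grammatical, 0 = ungrammatical (toy labels for illustration)
gold_judgments = [1, 0, 0, 1, 1]
pred_judgments = [1, 0, 1, 1, 1]
cola_mcc = matthews_corrcoef(gold_judgments, pred_judgments)

# GEC scored as macro F1 over error-category labels (assumed label set)
gold_errors = ["aux_verb", "article", "case", "none"]
pred_errors = ["aux_verb", "case", "case", "none"]
gec_f1 = f1_score(gold_errors, pred_errors, average="macro")

print(f"CoLA MCC: {cola_mcc:.3f}, GEC macro F1: {gec_f1:.3f}")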

Benchmark Comparison

Comparison with SmolLM3-German-V6 (3B parameters, Q4 quantized):

| Metric | Qwen3 German Teacher (4B) | SmolLM3-German-V6 (3B) | Improvement |
|---|---|---|---|
| CoLA MCC | 0.721 | 0.624 | +15.5% |
| GEC F1 | 0.349 | 0.145 | +140.7% |
| Generation Quality | 3.99 | 3.19 | +25.1% |
| Overall Score | 0.633 | 0.492 | +28.7% |

Key advantages of this model:

  • Superior grammaticality judgment: 15.5% higher CoLA MCC score
  • Much better error correction: 2.4x better GEC F1 score
  • Higher quality responses: 0.8 points higher on a 5-point scale

Training Data

The model was trained on ~9,000 examples with the following composition:

| Category | Percentage | Purpose |
|---|---|---|
| Grammar Correction | 35% | Error correction patterns |
| Grammar Judgment | 25% | CoLA-style "Is this correct?" examples |
| Structured Teaching | 20% | Verb conjugations, grammar explanations |
| General Conversation | 20% | Fluency preservation |
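
A minimal sketch of assembling such a mix is shown below; the file names and JSONL layout are hypothetical, not the actual data pipeline.

import json
import random

# Hypothetical source files and their target shares of the ~9,000-example mix
MIX = {
    "grammar_correction.jsonl": 0.35,
    "grammar_judgment.jsonl": 0.25,
    "structured_teaching.jsonl": 0.20,
    "general_conversation.jsonl": 0.20,
}
TOTAL = 9000

def load_jsonl(path):
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f]

dataset = []
for path, share in MIX.items():
    examples = load_jsonl(path)
    dataset.extend(random.sample(examples, int(TOTAL * share)))

random.shuffle(dataset)
print(f"Mixed dataset size: {len(dataset)}")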

Key Training Focus Areas

  • haben/sein auxiliary verbs: Movement verbs require "sein" (e.g., "Ich bin gefahren" not "Ich habe gefahren")
  • Article gender (der/die/das)
  • Case usage (Nominativ, Akkusativ, Dativ, Genitiv)
  • Word order in German sentences

Usage

With Transformers

from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("jaigouk/qwen3-4b-german-teacher")
tokenizer = AutoTokenizer.from_pretrained("jaigouk/qwen3-4b-german-teacher")

messages = [
    {"role": "system", "content": "Du bist ein freundlicher Deutschlehrer fรผr A1-B1 Lernende."},
    {"role": "user", "content": "Is this German sentence correct? 'Ich habe nach Berlin gefahren.'"}
]

text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

With Ollama (GGUF)

  1. Download the GGUF file: qwen3-german-teacher-Q4_K_M.gguf

  2. Create a Modelfile:

FROM ./qwen3-german-teacher-Q4_K_M.gguf

TEMPLATE """{{- if .System }}<|im_start|>system
{{ .System }}<|im_end|>
{{ end }}<|im_start|>user
{{ .Prompt }}<|im_end|>
<|im_start|>assistant
<think>

</think>

{{ .Response }}<|im_end|>
"""

PARAMETER stop "<|im_end|>"
PARAMETER stop "<|endoftext|>"
PARAMETER temperature 0.7
PARAMETER top_p 0.9
PARAMETER num_ctx 4096

SYSTEM """Du bist ein freundlicher Deutschlehrer für A1-B1 Lernende. Erkläre Grammatikfehler klar und einfach. Beurteile grammatische Korrektheit sorgfältig."""
  3. Create and run:
ollama create qwen3-german-teacher -f Modelfile
ollama run qwen3-german-teacher "Is 'Ich habe nach Berlin gefahren' correct?"
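
Once created, the model can also be queried programmatically, for example via the official ollama Python client (assuming `pip install ollama`); this is a usage sketch, not part of the model itself.

import ollama

response = ollama.chat(
    model="qwen3-german-teacher",
    messages=[
        {"role": "user", "content": "Is 'Ich habe nach Berlin gefahren' correct?"},
    ],
)
# Newer client versions also support attribute access: response.message.content
print(response["message"]["content"])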

Example Interactions

Grammar Error Detection

User: Is this German sentence grammatically correct? "Ich habe nach Berlin gefahren."

Model: No, this sentence is incorrect. The verb "fahren" describes movement from one place to another, so it requires "sein" as the auxiliary verb, not "haben".

Correct: "Ich bin nach Berlin gefahren."

Grammar Explanation

User: Explain when to use "sein" vs "haben" in the Perfekt tense.

Model: In German, most verbs use "haben" in the Perfekt tense, but verbs that describe:

  1. Movement from A to B: gehen, fahren, fliegen, kommen, laufen
  2. Change of state: aufwachen, einschlafen, sterben, werden

These verbs use "sein" instead. For example:

  • Ich bin nach Hause gegangen. (movement)
  • Er ist eingeschlafen. (change of state)

Conjugation Help

User: Conjugate "sein" in present tense.

Model: Here is the conjugation of "sein" (to be) in present tense:

| Person | Conjugation |
|---|---|
| ich | bin |
| du | bist |
| er/sie/es | ist |
| wir | sind |
| ihr | seid |
| sie/Sie | sind |

Limitations

  • Optimized for A1-B1 level German; may oversimplify advanced grammar concepts
  • Focused on grammar teaching; not designed for creative writing or translation
  • Best performance with clear, focused grammar questions

Training Methodology

Research-Informed Approach

This model was developed using insights from recent research on instruction-tuning and continual learning:

GRAPE Data Selection

Based on the GRAPE paper (Feb 2025):

"Instructions paired with responses sampled from other LLMs are often out of the distribution of the target model, which can lead to diminishing returns and even hurt performance"

Key insight: Rather than using responses generated by larger models (GPT-4, Claude), we curated training examples that align with Qwen3's natural output distribution. This prevents the model from learning patterns it cannot reliably reproduce.
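
One way to approximate this selection is to score candidate responses with the base model and keep only those it already assigns a high likelihood to. The sketch below illustrates the idea; the threshold and data layout are assumptions, not the actual curation pipeline.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

base_name = "Qwen/Qwen3-4B"
tokenizer = AutoTokenizer.from_pretrained(base_name)
base_model = AutoModelForCausalLM.from_pretrained(base_name, torch_dtype="auto")
base_model.eval()

def response_log_likelihood(prompt: str, response: str) -> float:
    """Average per-token log-likelihood of the response under the base model."""
    prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids
    full_ids = tokenizer(prompt + response, return_tensors="pt").input_ids
    labels = full_ids.clone()
    labels[:, : prompt_ids.shape[1]] = -100  # score only the response tokens
    with torch.no_grad():
        loss = base_model(full_ids, labels=labels).loss  # mean NLL over response tokens
    return -loss.item()

# Keep candidates the base model already finds plausible (threshold is illustrative)
keep = response_log_likelihood('Ist der Satz korrekt? "Der Frau ist schön."',
                               " Nein, der Satz ist nicht korrekt.") > -2.0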

CoLA-Style Judgment Training

The breakthrough in V4 was adding 25% explicit grammaticality judgment examples. This teaches the model to reason about correctness rather than just generate correct text:

User: Ist der Satz grammatisch korrekt? "Der Frau ist schön."
Assistant: Nein, der Satz ist nicht korrekt.
Fehler: "Der" ist maskulin, aber "Frau" ist feminin.
Korrektur: "Die Frau ist schön."
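
A sketch of how such judgment items can be converted into chat-format SFT rows is shown below; the field names and helper function are illustrative, not the actual preprocessing code.

def to_chat_example(sentence, is_correct, explanation=None, correction=None):
    # Build the assistant answer in the same style as the example above
    if is_correct:
        answer = "Ja, der Satz ist korrekt."
    else:
        answer = (
            "Nein, der Satz ist nicht korrekt.\n"
            f"Fehler: {explanation}\n"
            f'Korrektur: "{correction}"'
        )
    return {
        "messages": [
            {"role": "system", "content": "Du bist ein freundlicher Deutschlehrer für A1-B1 Lernende."},
            {"role": "user", "content": f'Ist der Satz grammatisch korrekt? "{sentence}"'},
            {"role": "assistant", "content": answer},
        ]
    }

example = to_chat_example(
    "Der Frau ist schön.", False,
    explanation='"Der" ist maskulin, aber "Frau" ist feminin.',
    correction="Die Frau ist schön.",
)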

Lessons from EWC/Fisher Information Research

From arXiv:2502.11756 on Fisher Information computation:

  • EXACT computation outperforms approximations
  • Minimum 500 samples for reliable Fisher estimation
  • Device consistency (GPU-only) is critical

These insights from our earlier SmolLM3 experiments (V5/V6 with Elastic Weight Consolidation) informed our dataset composition decisions for Qwen3-V4.
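
For context, the sketch below shows the standard empirical diagonal Fisher estimate used in EWC. It reflects the earlier SmolLM3 experiments rather than this Qwen3 run, and it is a simplified per-batch version rather than the exact per-sample computation the paper recommends; the data loader is assumed to yield tokenized batches on the model's device.

import torch

def estimate_diagonal_fisher(model, data_loader, num_samples=500):
    model.eval()
    fisher = {n: torch.zeros_like(p) for n, p in model.named_parameters() if p.requires_grad}
    seen = 0
    for batch in data_loader:
        if seen >= num_samples:
            break
        model.zero_grad()
        # Language-modeling loss; assumes batches contain input_ids (and attention_mask)
        loss = model(**batch, labels=batch["input_ids"]).loss
        loss.backward()
        for n, p in model.named_parameters():
            if p.grad is not None:
                fisher[n] += p.grad.detach() ** 2  # squared gradients approximate the diagonal
        seen += batch["input_ids"].shape[0]
    return {n: f / max(seen, 1) for n, f in fisher.items()}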

Training Details

Multi-Stage Finetuning: Lessons Learned

This project was inspired by the 3-stage finetuning approach described in MiroThinker (arXiv:2511.11793):

  1. Stage 1: SFT (Supervised Fine-Tuning)
  2. Stage 2: DPO (Direct Preference Optimization)
  3. Stage 3: RL/GRPO (Reinforcement Learning with Grammar Rewards)

Our training script design followed the MiroThinker paper's architecture, including:

  • Higher LoRA rank (r=32, alpha=64) as recommended for multi-stage training stability
  • Full target modules (q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj) for comprehensive adaptation

However, this model uses only SFT because our experiments with preference optimization showed a fundamental trade-off:

Why DPO/SimPO Failed for Our Use Case

| Model | CoLA MCC | GEC F1 | Generation | Overall | Status |
|---|---|---|---|---|---|
| V6 SFT (Baseline) | 0.624 | 0.191 | 3.29 | 0.516 | Reference |
| V8 (DPO) | 0.583 | 0.063 | 3.63 | 0.473 | -8% overall |
| V9 (SimPO) | 0.567 | 0.073 | 3.72 | 0.479 | -7% overall |

Key findings:

  • DPO improved generation quality (+10%) but destroyed GEC accuracy (-67%)
  • SimPO achieved best generation (3.72) but still regressed accuracy significantly
  • This confirms the "alignment tax" documented in 2025 research, in which preference optimization caused regressions on specific tasks in 76% of cases

For grammar teaching, accuracy is more important than fluency, so we stayed with pure SFT. The V4 dataset composition (25% CoLA-style judgment examples) proved more effective than preference optimization for our metrics.

SFT Training Configuration

Configuration based on MiroThinker recommendations for multi-stage training:

Base Model: Qwen/Qwen3-4B
LoRA Configuration:
  - Rank (r): 32
  - Alpha: 64
  - Target modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
  - Dropout: 0

Training Configuration:
  - Epochs: 2
  - Learning Rate: 2e-4 (linear decay)
  - Batch Size: 2 (effective: 8 with gradient accumulation)
  - Warmup: 10%
  - Precision: BF16
  - Optimizer: AdamW 8-bit

The SFT stage trains on ~9,000 curated examples covering grammar judgment, error correction, and teaching explanations.
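
For readers who want to reproduce a similar setup, the sketch below mirrors the configuration above using Unsloth and TRL. Argument names vary between library versions, and the train.jsonl file (chat-format examples) is an assumption, so treat this as a starting point rather than the exact training script.

from unsloth import FastLanguageModel
from trl import SFTTrainer
from transformers import TrainingArguments
from datasets import load_dataset

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="Qwen/Qwen3-4B",
    max_seq_length=4096,
)
model = FastLanguageModel.get_peft_model(
    model,
    r=32,
    lora_alpha=64,
    lora_dropout=0,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

# Assumed: ~9,000 curated examples in chat format
dataset = load_dataset("json", data_files="train.jsonl", split="train")

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    args=TrainingArguments(
        num_train_epochs=2,
        learning_rate=2e-4,
        lr_scheduler_type="linear",
        warmup_ratio=0.1,
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,  # effective batch size 8
        bf16=True,
        optim="adamw_8bit",
        output_dir="outputs",
    ),
)
trainer.train()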

Deployment Pipeline

After SFT training, the model goes through:

1. PEFT Merge

LoRA adapters are merged into the base model:

from peft import PeftModel
from transformers import AutoModelForCausalLM

base_model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-4B")
model = PeftModel.from_pretrained(base_model, "lora_adapters/")
model = model.merge_and_unload()
model.save_pretrained("merged_model/", safe_serialization=True)

2. GGUF Quantization

For efficient local inference:

python convert_hf_to_gguf.py merged_model/ --outtype bf16 --outfile model-bf16.gguf
llama-quantize model-bf16.gguf model-Q4_K_M.gguf Q4_K_M

The Q4_K_M quantization reduces model size from ~8GB to 2.5GB while maintaining high quality.
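
Besides Ollama, the quantized file can also be loaded directly in Python with llama-cpp-python (assuming the package is installed); a minimal sketch:

from llama_cpp import Llama

llm = Llama(model_path="qwen3-german-teacher-Q4_K_M.gguf", n_ctx=4096)
out = llm.create_chat_completion(messages=[
    {"role": "system", "content": "Du bist ein freundlicher Deutschlehrer für A1-B1 Lernende."},
    {"role": "user", "content": "Ist 'Ich habe nach Berlin gefahren' korrekt?"},
])
print(out["choices"][0]["message"]["content"])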

Technical Specifications

  • Framework: Unsloth + Transformers + TRL + PEFT
  • Hardware: NVIDIA RTX 4090 (24GB VRAM)
  • Training Time: ~45 minutes for 2 epochs
  • Quantization: GGUF Q4_K_M (2.5GB)

Citation

If you use this model, please cite:

@misc{qwen3-4b-german-teacher-2025,
  author = {Jaigouk Kim},
  title = {Qwen3-4B German Teacher: A Finetuned Model for German Grammar Teaching},
  year = {2025},
  publisher = {HuggingFace},
  url = {https://huggingface.co/jaigouk/qwen3-4b-german-teacher}
}

License

This model is released under the Apache 2.0 license, following the base Qwen3 model license.

Acknowledgments
