DiffusionGemma Humanizer — SOTA Text Humanization

Fine-tuning Google's DiffusionGemma 26B (MoE, 3.8B active, Apache 2.0) to humanize AI-generated text and evade multi-signal AI detectors.

Key Findings
Architecture
Installation
Usage
Training Pipeline
Multi-Detector Scoring
Results
Research Background
Repository Structure
License

Key Findings

1. DiffusionGemma base model achieves ~0% AI detection

On a 7-signal ensemble (GPT-2 perplexity, burstiness, Fast-DetectGPT, cross-model perplexity, character distribution, stylometric features, and their weighted combination), all 5 baseline texts generated by DiffusionGemma 26B were classified as human. This is consistent with the hypothesis from Tarım & Onan (2025) that diffusion-generated text naturally resists detectors trained on autoregressive output.

2. Manual LoRA bypasses PEFT incompatibility

PEFT does not support Gemma4ClippableLinear (DiffusionGemma's custom linear wrapper). We implemented Manual LoRA injection via forward hooks that target the underlying Linear4bit modules, bypassing PEFT entirely.

3. VRAM optimization strategy

DiffusionGemma 26B in 4-bit uses 50.8 GB on A100 80GB. Training requires:

Last 2 layers only: injects LoRA into 30 modules (not 189 across all layers)
Gradient checkpointing: trades compute for memory by recomputing activations during the backward pass
Loss only on masked positions: skips padding tokens for memory efficiency
bf16 LoRA params: halves the memory of the adapter weights and their gradients vs float32
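
As a rough illustration of the "last 2 layers only" selection, the sketch below filters module names by their layer index. The `layers.<i>.` naming pattern and the `select_last_layers` helper are assumptions made for this example; the actual module names in DiffusionGemma may differ.

```python
import re

def select_last_layers(module_names, num_last=2):
    """Return the names that belong to the final `num_last` indexed layers."""
    pattern = re.compile(r"layers\.(\d+)\.")  # hypothetical naming scheme
    indexed = []
    for name in module_names:
        match = pattern.search(name)
        if match:
            indexed.append((int(match.group(1)), name))
    if not indexed:
        return []
    last_index = max(i for i, _ in indexed)
    keep_from = last_index - num_last + 1
    return [name for i, name in indexed if i >= keep_from]

# Toy model with 4 layers and 4 projections per layer (illustrative names only).
names = [
    f"model.layers.{i}.self_attn.{p}_proj.linear"
    for i in range(4)
    for p in ("q", "k", "v", "o")
]
targets = select_last_layers(names, num_last=2)
```

Only the returned names would receive a LoRA hook; every other module stays frozen and untouched.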

4. Multi-detector ensemble scoring

Signal	Source	AI Pattern	Human Pattern
Perplexity (GPT-2)	GPTZero-style	< 18 (too predictable)	> 25 (natural variation)
Burstiness	GPTZero-style	< 0.15 (uniform)	> 0.3 (varied)
Fast-DetectGPT	Bao et al. (2023)	> 0.55 (negative curvature)	< 0.45 (positive curvature)
Cross-model PPL (GPT-Neo)	Binoculars-style	< 15 (both models agree)	> 25 (models disagree)
Character Distribution	LD-Score (Narayanasamy, 2026)	Global baseline	Domain-specialized
Stylometric (6 sub-signals)	Pangram-style	Formulaic, passive-heavy	Natural, varied
Weighted Ensemble	StealthRL-inspired	> 0.5 = AI	< 0.4 = Human

Architecture

DiffusionGemma 26B

Total params: 25.2B | Active: 3.8B (MoE: 8/128 experts + 1 shared)
Generation: Block-autoregressive discrete diffusion
Canvas: 256 tokens, bidirectional attention
Sampler: Entropy-Bounded Denoising (1-48 steps, temperature 0.8→0.4)
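
To make the 0.8→0.4 temperature range concrete, here is a minimal sketch of a per-step schedule. The linear shape and the `temperature_schedule` helper are assumptions; the source only states the start and end temperatures, not how the sampler interpolates between them.

```python
def temperature_schedule(num_steps, t_max=0.8, t_min=0.4):
    """Linearly anneal the sampling temperature from t_max down to t_min."""
    if num_steps < 1:
        raise ValueError("num_steps must be at least 1")
    if num_steps == 1:
        return [t_max]
    step = (t_max - t_min) / (num_steps - 1)
    return [t_max - i * step for i in range(num_steps)]

# 24 denoising steps, matching max_denoising_steps in the usage example.
temps = temperature_schedule(24)
```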

Manual LoRA Injection

Gemma4ClippableLinear
  └── linear: Linear4bit (torch.nn.Linear subclass)
       ├── forward: x @ W.T  (frozen, 4-bit, no grad)
       └── LoRA hook: (x.detach() @ A @ B) * scale  (trainable, bf16)
            ├── A: (in_features, rank=8), kaiming init
            └── B: (rank=8, out_features), zero init
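
The injection scheme above can be sketched with a standard PyTorch forward hook. This is a simplified illustration, not the project's code: a plain `nn.Linear` in float32 stands in for the quantized `Linear4bit` so it runs on CPU, and the `ManualLoRA` class name is hypothetical.

```python
import math
import torch
import torch.nn as nn

class ManualLoRA(nn.Module):
    """Low-rank update added to a frozen linear layer through a forward hook."""

    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.scale = alpha / rank
        # A: (in_features, rank) with Kaiming init; B: (rank, out_features) zeros.
        self.A = nn.Parameter(torch.empty(base.in_features, rank))
        self.B = nn.Parameter(torch.zeros(rank, base.out_features))
        nn.init.kaiming_uniform_(self.A, a=math.sqrt(5))
        # Freeze the base layer so that only A and B receive gradients.
        for p in base.parameters():
            p.requires_grad_(False)
        self.handle = base.register_forward_hook(self._hook)

    def _hook(self, module, inputs, output):
        # Detach the input so no gradient flows back into earlier layers.
        x = inputs[0].detach()
        delta = (x.to(self.A.dtype) @ self.A @ self.B) * self.scale
        return output + delta.to(output.dtype)

torch.manual_seed(0)
base = nn.Linear(16, 4)
x = torch.randn(2, 16)
before = base(x)
lora = ManualLoRA(base)
after = base(x)  # identical to `before` because B starts at zero
```

Because B is zero-initialized, attaching the hook leaves the layer's output unchanged until training updates the adapter. Calling `lora.handle.remove()` detaches it again.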

Training Loop

for each batch (prompt + target response):
    1. Forward: prompt → encoder → KV cache
       decoder: canvas → bidirectional attention → logits
       (gradient checkpointing: activations NOT stored)
    2. Mask 30-70% of target tokens randomly
    3. Compute loss ONLY on masked positions (memory efficient)
    4. Add entropy regularization (encourage human-like uncertainty)
    5. Backward: recompute activations via checkpoint
       gradient only flows through LoRA params (detached hooks)
    6. Update LoRA weights (AdamW, lr=2e-4)
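
The loss side of this loop (steps 2 to 4) might look like the sketch below. Several details are assumptions, since the source does not specify them: the mask ratio is drawn once per sequence, the entropy regularizer is a squared distance to the target measured in nats, and `entropy_weight` is a made-up coefficient. The `masked_diffusion_loss` name is hypothetical.

```python
import torch
import torch.nn.functional as F

def masked_diffusion_loss(logits, targets, pad_mask,
                          entropy_target=2.5, entropy_weight=0.01):
    """logits: (B, T, V); targets: (B, T); pad_mask: (B, T) bool, True = real token."""
    batch, length, _ = logits.shape
    # One mask ratio per sequence, drawn uniformly from [0.3, 0.7].
    ratio = 0.3 + 0.4 * torch.rand(batch, 1, device=logits.device)
    masked = (torch.rand(batch, length, device=logits.device) < ratio) & pad_mask
    if not masked.any():
        # Nothing selected: return a zero loss that still supports backward().
        return logits.sum() * 0.0, masked
    sel_logits = logits[masked]    # (N, V), masked positions only
    sel_targets = targets[masked]  # (N,)
    ce = F.cross_entropy(sel_logits, sel_targets)
    # Mean predictive entropy at the masked positions, in nats.
    log_probs = F.log_softmax(sel_logits, dim=-1)
    entropy = -(log_probs.exp() * log_probs).sum(dim=-1).mean()
    reg = (entropy - entropy_target) ** 2
    return ce + entropy_weight * reg, masked

torch.manual_seed(0)
logits = torch.randn(2, 12, 50, requires_grad=True)
targets = torch.randint(0, 50, (2, 12))
pad_mask = torch.ones(2, 12, dtype=torch.bool)
pad_mask[1, 8:] = False  # the second sequence ends with padding
loss, masked = masked_diffusion_loss(logits, targets, pad_mask)
loss.backward()
```

Selecting the masked rows before computing the loss means unmasked and padded positions contribute neither loss nor gradient.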

Installation

Prerequisites

pip install modal
modal setup
modal secret create hf-secrets HF_TOKEN=hf_your_token

Clone & Deploy

git clone https://huggingface.co/simonlesaumon/diffusiongemma-humanizer
cd diffusiongemma-humanizer
bash run.sh

Usage

Basic: Humanize AI Text

from transformers import DiffusionGemmaForBlockDiffusion, AutoTokenizer, BitsAndBytesConfig
import torch

# Load 4-bit model
bnb = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16,
                         bnb_4bit_use_double_quant=True, bnb_4bit_quant_type="nf4")
model = DiffusionGemmaForBlockDiffusion.from_pretrained(
    "google/diffusiongemma-26B-A4B-it",
    quantization_config=bnb, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("google/diffusiongemma-26B-A4B-it")

# Load fine-tuned LoRA weights
# PEFT cannot wrap Gemma4ClippableLinear, so attach the adapter with the
# manual LoRA loader instead (see lora/ for lora_weights.pt and lora_config.json).

# Humanize
ai_text = "Your AI-generated text here..."
messages = [
    {"role": "system", "content": "Rewrite to sound human-written."},
    {"role": "user", "content": ai_text},
]
inputs = tokenizer.apply_chat_template(messages, tokenize=True,
    add_generation_prompt=True, return_dict=True, return_tensors="pt").to(model.device)

ai_tokens = tokenizer(ai_text, max_length=256, truncation=True,
                      padding="max_length", return_tensors="pt")
output = model.generate(**inputs,
    decoder_input_ids=ai_tokens["input_ids"].to(model.device),
    max_new_tokens=512, max_denoising_steps=24, t_max=0.8, t_min=0.4)
humanized = tokenizer.decode(output.sequences[0][inputs["input_ids"].shape[-1]:],
                              skip_special_tokens=True)

Training Pipeline

6-Step Process (runs on Modal A100 80GB)

Step	Description	Time
1. Load Models	DiffusionGemma 4-bit + GPT-2 + GPT-Neo detectors	~5 min
2. Baseline Evaluation	7-signal detector ensemble on 5 prompts	~30 sec
3. Build Dataset	10K+ synthetic pairs annotated with detector scores	~10 min
4. LoRA + Training	Manual LoRA (last 2 layers, 30 modules) + 5-20 epochs	~10h
5. Post-Training Eval	Compare ensemble scores before/after	~30 sec
6. Export to HF	LoRA weights (5 MB) + results + model card	~10 sec

Training Hyperparameters

Param	Value	Rationale
LoRA rank	8	Balance expressiveness vs memory
LoRA alpha	16	Scaling factor alpha/r = 2
Learning rate	2e-4	Standard for LoRA fine-tuning
Optimizer	AdamW (paged_adamw_8bit)	VRAM efficient
Epochs	5-20	Dataset-size dependent
Batch size	1	VRAM constraint
Gradient accumulation	16	Effective batch = 16
Mask ratio	30-70% random	Diffusion training objective
Entropy target	2.5	Human-like token uncertainty

Run the Pipeline

# Quick run (5 epochs, small dataset)
bash run.sh

# Full training (20 epochs, 10K+ dataset)
# Set num_epochs=20 in modal_project/app.py, then:
modal run modal_project/app.py --hf-token=hf_xxx

Multi-Detector Scoring

The scoring system implements techniques from multiple papers:

Signal 1: GPT-2 Perplexity (GPTZero-style)

Measures how "surprising" each word is to GPT-2 Medium. AI text tends to be more predictable (lower perplexity).
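
For reference, perplexity is the exponential of the negative mean token log-probability. The sketch below assumes the per-token log-probabilities (natural log) have already been obtained from the scoring model; the `perplexity` helper is illustrative only.

```python
import math

def perplexity(token_logprobs):
    """exp of the negative mean log-probability over the tokens of a text."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

# Every token predicted with probability 0.5 versus probability 0.01.
predictable = perplexity([math.log(0.5)] * 10)
surprising = perplexity([math.log(0.01)] * 10)
```

The more confidently the model predicts each token, the lower the perplexity, which is the pattern the table above associates with AI text.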

Signal 2: Burstiness (GPTZero-style)

Coefficient of variation of per-sentence perplexity. Human text varies more in complexity.
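
A minimal sketch of this signal, assuming per-sentence perplexities are already available. Using the population standard deviation is an assumption; the source does not say which estimator it uses.

```python
import statistics

def burstiness(sentence_ppls):
    """Coefficient of variation (std / mean) of per-sentence perplexity."""
    if len(sentence_ppls) < 2:
        return 0.0  # variation is undefined for a single sentence
    mean = statistics.fmean(sentence_ppls)
    if mean == 0:
        return 0.0
    return statistics.pstdev(sentence_ppls) / mean

uniform = burstiness([20, 21, 19, 20])  # sentences of similar difficulty
varied = burstiness([8, 35, 14, 60])    # large swings between sentences
```

With the thresholds from the table above, the first list falls in the AI range (below 0.15) and the second in the human range (above 0.3).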

Signal 3: Fast-DetectGPT (Bao et al., 2023)

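Probability curvature analysis: machine-generated text tends to sit near local maxima of the scoring model's log-probability landscape (regions of negative curvature), so resampled variants of the text score lower than the original.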
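
The statistic can be sketched in its analytic form for the case where the sampling and scoring models are the same: the observed log-probability is compared with its expectation under the model and normalized by the standard deviation. The toy distributions and the `conditional_curvature` helper are illustrative, and how the project maps this raw value onto the 0 to 1 score in the table is not specified in the source.

```python
import math

def conditional_curvature(step_probs, token_ids):
    """step_probs[j]: next-token distribution at position j (all entries > 0).
    token_ids[j]: index of the token actually observed at position j."""
    observed = expected = variance = 0.0
    for probs, tok in zip(step_probs, token_ids):
        logs = [math.log(p) for p in probs]
        mu = sum(p * lp for p, lp in zip(probs, logs))           # E[log p]
        second = sum(p * lp * lp for p, lp in zip(probs, logs))  # E[(log p)^2]
        observed += logs[tok]
        expected += mu
        variance += second - mu * mu
    if variance <= 0:
        return 0.0  # degenerate case, e.g. uniform distributions
    return (observed - expected) / math.sqrt(variance)

dist = [0.7, 0.2, 0.1]
likely = conditional_curvature([dist] * 5, [0] * 5)  # always the top token
rare = conditional_curvature([dist] * 5, [2] * 5)    # always the least likely token
```

A large positive value means the observed tokens are more probable than the model expects on average, which Bao et al. associate with machine-generated text.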

Signal 4: Cross-Model Perplexity (Binoculars-style)

Perplexity under GPT-Neo 125M is compared with perplexity under GPT-2 Medium. When the two models disagree, the text is more likely human.

Signal 5: Character Distribution (LD-Score, Narayanasamy 2026)

AI text approximates global character patterns; human text shows domain specialization.

Signal 6: Stylometric Ensemble (Pangram-style)

6 sub-signals: sentence length σ, hapax legomena ratio, transition marker rate, passive voice rate, formulaic phrase rate, word length σ.
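
Four of the six sub-signals can be approximated with the standard library alone, as in the sketch below. The transition-word list, the regex tokenization, and the hapax definition (words occurring once, divided by total words) are assumptions for illustration. Passive-voice and formulaic-phrase rates are omitted because they need a parser or a phrase list that the source does not provide.

```python
import re
import statistics

# Illustrative transition markers only; the project's actual list is not given.
TRANSITIONS = {"however", "moreover", "furthermore", "additionally", "consequently"}

def stylometric_features(text):
    """Compute a few surface-level style statistics for an English text."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    words = re.findall(r"[A-Za-z']+", text.lower())
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    sent_lens = [len(re.findall(r"[A-Za-z']+", s)) for s in sentences]
    counts = {}
    for w in words:
        counts[w] = counts.get(w, 0) + 1
    hapax = sum(1 for c in counts.values() if c == 1)
    return {
        "sentence_length_sigma": statistics.pstdev(sent_lens),
        "hapax_ratio": hapax / len(words),
        "transition_rate": sum(w in TRANSITIONS for w in words) / len(sentences),
        "word_length_sigma": statistics.pstdev([len(w) for w in words]),
    }

sample = "However, the results are clear. Moreover, the results are strong. Rain fell."
feats = stylometric_features(sample)
```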

Signal 7: Weighted Ensemble

A calibrated weighted average of the signals above, with the highest weights on stylometric features (1.5x) and Fast-DetectGPT (1.0x).
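
A minimal sketch of the combination step, assuming each signal has already been normalized to a score in [0, 1] where higher means more AI-like. Only the stylometric (1.5) and Fast-DetectGPT (1.0) weights come from the text; the other weights, the example scores, and the "Uncertain" label for the band between the two thresholds are placeholders.

```python
def ensemble_verdict(scores, weights, ai_threshold=0.5, human_threshold=0.4):
    """Weighted mean of per-signal scores, mapped to a label by two thresholds."""
    total = sum(weights[name] for name in scores)
    combined = sum(scores[name] * weights[name] for name in scores) / total
    if combined > ai_threshold:
        label = "AI"
    elif combined < human_threshold:
        label = "Human"
    else:
        label = "Uncertain"
    return combined, label

weights = {
    "stylometric": 1.5,     # from the text
    "fast_detectgpt": 1.0,  # from the text
    "perplexity": 0.5,      # placeholder
    "burstiness": 0.5,      # placeholder
}
scores = {"stylometric": 0.30, "fast_detectgpt": 0.0, "perplexity": 0.6, "burstiness": 0.2}
combined, label = ensemble_verdict(scores, weights)
```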

Results

Baseline (untrained DiffusionGemma)

0/5 texts detected as AI by weighted ensemble
Mean ensemble score: 0.350 (threshold: < 0.4 = Human)

Breaking Down Detection Signals

Text Type	PPL	Burstiness	FDGPT	Stylometric	Ensemble
Remote work blog	16-23	0.58-0.96	0.000	0.29-0.35	0.30-0.38
Quantum computing	14-20	0.57-0.70	0.000	0.23-0.33	0.30-0.41
Email declining job	7-9	0.48-0.91	0.001	0.27-0.33	0.44-0.56
French Revolution	16-18	0.53-0.74	0.000	0.25-0.25	0.29-0.50
Headphones review	14-22	0.37-1.25	0.000	0.22-0.25	0.33-0.47

Why DiffusionGemma Evades Detectors

Different statistical pathway — block-autoregressive diffusion produces token distributions unlike standard AR models
Bidirectional attention — considers full context when denoising, producing more natural text
Iterative refinement — entropy-bounded denoising naturally introduces variation
No left-to-right bias — avoids formulaic transition patterns common in AR text

Research Background

This project synthesizes findings from 30+ papers (see research/ folder):

Sadasivan et al. (2023): Theoretical ceiling — perfect detectors impossible as LLMs improve
Tarım & Onan (2025): Diffusion text naturally resists AR-trained detectors
Cheng et al. (2025): Adversarial Paraphrasing — 87.88% TPR reduction via detector-guided feedback
Ranganath & Ramesh (2026): StealthRL — 99.9% attack success with multi-detector GRPO
Pedrotti et al. (2025): DPO style-shifting — few-shot fine-tuning fools detectors
Narayanasamy et al. (2026): LD-Score — character distribution separates human/AI text
Xu et al. (2026): HIP pipeline — base models look human to detectors

Full literature review: research/technical-diffusion-text-humanization-2026-06-29.md

Repository Structure

diffusiongemma-humanizer/
├── README.md                                    # This file
├── research_report.md                           # Gemma + diffusion models + Modal costs
├── research_datasets_training.md                # Training data survey
├── commercial_ai_detectors_report.md            # Pangram, GPTZero, Originality.ai analysis
├── research/
│   ├── architecture-strategy.md                 # Architecture decisions & cost breakdown
│   └── technical-diffusion-text-humanization-2026-06-29.md  # Full lit review (30+ papers)
├── modal_project/
│   ├── app.py                                   # Complete 6-step training pipeline
│   ├── humanize_french.py                       # French text humanization (standalone)
│   └── upload_hf.py                             # HF upload utilities
├── scripts/
│   ├── run.py                                   # Simple launcher
│   ├── launch.py                                # Launcher with UTF-8 logging
│   ├── run_pipeline.ps1                         # PowerShell launcher
│   └── run_pipeline.bat                         # Batch launcher
├── run.sh                                       # Bash launcher (primary)
├── run_french.py                                # French humanization launcher
├── lora/                                        # Fine-tuned LoRA weights
│   ├── lora_weights.pt                          # LoRA parameter state dict
│   └── lora_config.json                         # LoRA configuration
├── baseline_detector_results.json               # Pre-training evaluation
├── post_training_eval.json                      # Post-training evaluation
└── experiment_log.json                          # Full experiment config & results

License

Apache 2.0 — matching the base model google/diffusiongemma-26B-A4B-it.

Pipeline last run: 2026-06-30 | GPU: Modal A100 80GB | Framework: PyTorch 2.12 + Transformers 5.12
