DiffusionGemma Humanizer: SOTA Text Humanization
Fine-tuning Google's DiffusionGemma 26B (MoE, 3.8B active, Apache 2.0) to humanize AI-generated text and evade multi-signal AI detectors.
Table of Contents
- Key Findings
- Architecture
- Installation
- Usage
- Training Pipeline
- Multi-Detector Scoring
- Results
- Research Background
- Repository Structure
- License
Key Findings
1. DiffusionGemma base model achieves ~0% AI detection
The detector is a Fast-DetectGPT + heuristic ensemble with 7 signals: GPT-2 perplexity, burstiness, Fast-DetectGPT, cross-model perplexity, character distribution, stylometric features, and their weighted ensemble. All 5 baseline samples generated by DiffusionGemma 26B were classified as human. This is consistent with the hypothesis of Tarım & Onan (2025) that diffusion-generated text naturally resists detectors trained on autoregressive output.
2. Manual LoRA bypasses PEFT incompatibility
PEFT does not support Gemma4ClippableLinear, DiffusionGemma's custom linear wrapper. We therefore inject LoRA manually, using forward hooks on the underlying Linear4bit modules, and bypass PEFT entirely.
3. VRAM optimization strategy
DiffusionGemma 26B in 4-bit uses 50.8 GB on an A100 80GB. Training fits in the remaining memory by combining four techniques:
- Last 2 layers only: injects LoRA into 30 modules instead of 189 across all layers
- Gradient checkpointing: trades compute for memory by recomputing activations during the backward pass
- Loss only on masked positions: skips padding tokens for memory efficiency
- bf16 LoRA params: halves LoRA parameter and gradient memory compared with float32
4. Multi-detector ensemble scoring
| Signal | Source | AI Pattern | Human Pattern |
|---|---|---|---|
| Perplexity (GPT-2) | GPTZero-style | < 18 (too predictable) | > 25 (natural variation) |
| Burstiness | GPTZero-style | < 0.15 (uniform) | > 0.3 (varied) |
| Fast-DetectGPT | Bao et al. (2023) | > 0.55 (negative curvature) | < 0.45 (near-zero curvature) |
| Cross-model PPL (GPT-Neo) | Binoculars-style | < 15 (both models agree) | > 25 (models disagree) |
| Character Distribution | LD-Score (Narayanasamy, 2026) | Global baseline | Domain-specialized |
| Stylometric (6 sub-signals) | Pangram-style | Formulaic, passive-heavy | Natural, varied |
| Weighted Ensemble | StealthRL-inspired | > 0.5 = AI | < 0.4 = Human |
Architecture
DiffusionGemma 26B
- Total params: 25.2B | Active: 3.8B (MoE: 8/128 experts + 1 shared)
- Generation: Block-autoregressive discrete diffusion
- Canvas: 256 tokens, bidirectional attention
- Sampler: Entropy-Bounded Denoising (1-48 steps, temperature annealed from 0.8 to 0.4)
Manual LoRA Injection
Gemma4ClippableLinear
└── linear: Linear4bit (torch.nn.Linear subclass)
    ├── forward: x @ W.T (frozen, 4-bit, no grad)
    └── LoRA hook: output + x.detach() @ A @ B * scale (trainable, bf16; scale = alpha / r = 16 / 8 = 2)
        ├── A: (in_features, rank=8), Kaiming init
        └── B: (rank=8, out_features), zero init
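The hook mechanism in the diagram above can be sketched in PyTorch as follows. This is a minimal illustration, not the repository's loader. It hooks a plain `nn.Linear`, where the project hooks `Linear4bit`. The names `LoRAAdapter`, `inject_lora`, and the `lora_adapter` attribute are hypothetical. The adapter stays in float32 here so the sketch runs on CPU; the project keeps it in bf16.

```python
import math

import torch
import torch.nn as nn


class LoRAAdapter(nn.Module):
    """Low-rank delta x @ A @ B * (alpha / rank) for one frozen linear layer."""

    def __init__(self, in_features: int, out_features: int, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.A = nn.Parameter(torch.empty(in_features, rank))
        self.B = nn.Parameter(torch.zeros(rank, out_features))  # zero init: no change at step 0
        nn.init.kaiming_uniform_(self.A, a=math.sqrt(5))  # Kaiming-uniform init for A
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return (x @ self.A @ self.B) * self.scale


def inject_lora(linear: nn.Linear, rank: int = 8, alpha: float = 16.0) -> LoRAAdapter:
    """Freeze `linear` and add a LoRA delta to its output through a forward hook."""
    for p in linear.parameters():
        p.requires_grad_(False)
    adapter = LoRAAdapter(linear.in_features, linear.out_features, rank, alpha)
    adapter.to(device=linear.weight.device)
    linear.lora_adapter = adapter  # registered as a submodule, so it appears in .parameters()

    def hook(module, inputs, output):
        x = inputs[0].detach()  # detached input, as in the diagram
        delta = module.lora_adapter(x.to(module.lora_adapter.A.dtype))
        return output + delta.to(output.dtype)  # returning a value replaces the layer output

    linear.register_forward_hook(hook)
    return adapter
```

Because `B` starts at zero, the hooked layer initially reproduces the frozen layer exactly. Only `A` and `B` remain trainable.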
Training Loop
for each batch (prompt + target response):
  1. Forward: prompt → encoder → KV cache
     decoder: canvas → bidirectional attention → logits
     (gradient checkpointing: activations are NOT stored)
  2. Mask 30-70% of target tokens at random
  3. Compute the loss ONLY on masked positions (memory efficient)
  4. Add entropy regularization (encourages human-like token uncertainty)
  5. Backward: recompute activations from the checkpoints
     only LoRA params receive gradients (base weights frozen, hook inputs detached)
  6. Update LoRA weights (AdamW, lr=2e-4)
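Steps 2-4 above can be sketched as one loss function. This is a minimal sketch under stated assumptions. `logits_fn` is a hypothetical stand-in for the DiffusionGemma decoder. `entropy_weight` and the squared-penalty form of the entropy regularizer are hypothetical choices. Entropy is measured in nats, and whether the project's target of 2.5 uses nats is also an assumption.

```python
import torch
import torch.nn.functional as F


def masked_diffusion_loss(logits_fn, target_ids, mask_token_id, pad_token_id,
                          entropy_target=2.5, entropy_weight=0.1, generator=None):
    """Mask 30-70% of non-pad target tokens, then score only the masked positions.

    logits_fn maps (batch, seq) token ids to (batch, seq, vocab) logits.
    Returns (total loss, cross-entropy, mean entropy of predictions at masked positions).
    """
    # Step 2: draw a mask ratio in [0.3, 0.7) and mask that fraction of real tokens.
    ratio = 0.3 + 0.4 * torch.rand((), generator=generator).item()
    not_pad = target_ids != pad_token_id
    noise = torch.rand(target_ids.shape, generator=generator, device=target_ids.device)
    mask = (noise < ratio) & not_pad
    if not mask.any():
        mask = not_pad  # fall back so the loss is never computed over zero tokens
    noised = target_ids.masked_fill(mask, mask_token_id)

    # Step 3: cross-entropy only on masked positions.
    logits = logits_fn(noised)[mask]  # (num_masked, vocab)
    ce = F.cross_entropy(logits, target_ids[mask])

    # Step 4: pull the mean predictive entropy toward the target value.
    log_probs = F.log_softmax(logits, dim=-1)
    entropy = -(log_probs.exp() * log_probs).sum(dim=-1).mean()
    loss = ce + entropy_weight * (entropy - entropy_target) ** 2
    return loss, ce.detach(), entropy.detach()
```

Indexing the logits with the boolean mask before the softmax keeps the softmax and cross-entropy restricted to the masked positions.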
Installation
Prerequisites
pip install modal
modal setup
modal secret create hf-secrets HF_TOKEN=hf_your_token
Clone & Deploy
git clone https://huggingface.co/simonlesaumon/diffusiongemma-humanizer
cd diffusiongemma-humanizer
bash run.sh
Usage
Basic: Humanize AI Text
from transformers import DiffusionGemmaForBlockDiffusion, AutoTokenizer, BitsAndBytesConfig
import torch
# Load the 4-bit model (NF4, double quantization, bf16 compute)
bnb = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16,
                         bnb_4bit_use_double_quant=True, bnb_4bit_quant_type="nf4")
model = DiffusionGemmaForBlockDiffusion.from_pretrained(
    "google/diffusiongemma-26B-A4B-it",
    quantization_config=bnb, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("google/diffusiongemma-26B-A4B-it")
# Load fine-tuned LoRA weights. PEFT cannot wrap Gemma4ClippableLinear, so peft.PeftModel
# does not apply here: re-inject the manual LoRA hooks, then load this state dict into them
# (see lora/ folder for weights + config)
lora_state = torch.load("lora/lora_weights.pt", map_location="cpu")
# Humanize
ai_text = "Your AI-generated text here..."
messages = [
    {"role": "system", "content": "Rewrite to sound human-written."},
    {"role": "user", "content": ai_text},
]
inputs = tokenizer.apply_chat_template(messages, tokenize=True,
    add_generation_prompt=True, return_dict=True, return_tensors="pt").to(model.device)
# The AI text, padded to the 256-token canvas, seeds the decoder
ai_tokens = tokenizer(ai_text, max_length=256, truncation=True,
    padding="max_length", return_tensors="pt")
output = model.generate(**inputs,
    decoder_input_ids=ai_tokens["input_ids"].to(model.device),
    max_new_tokens=512, max_denoising_steps=24, t_max=0.8, t_min=0.4)
# Decode only the newly generated tokens (skip the prompt)
humanized = tokenizer.decode(output.sequences[0][inputs["input_ids"].shape[-1]:],
    skip_special_tokens=True)
Training Pipeline
6-Step Process (runs on Modal A100 80GB)
| Step | Description | Time |
|---|---|---|
| 1. Load Models | DiffusionGemma 4-bit + GPT-2 + GPT-Neo detectors | ~5 min |
| 2. Baseline Evaluation | 7-signal detector ensemble on 5 prompts | ~30 sec |
| 3. Build Dataset | 10K+ synthetic pairs annotated with detector scores | ~10 min |
| 4. LoRA + Training | Manual LoRA (last 2 layers, 30 modules) + 5-20 epochs | ~10h |
| 5. Post-Training Eval | Compare ensemble scores before/after | ~30 sec |
| 6. Export to HF | LoRA weights (5 MB) + results + model card | ~10 sec |
Training Hyperparameters
| Param | Value | Rationale |
|---|---|---|
| LoRA rank | 8 | Balance expressiveness vs memory |
| LoRA alpha | 16 | Scaling factor alpha/r = 2 |
| Learning rate | 2e-4 | Standard for LoRA fine-tuning |
| Optimizer | AdamW (paged_adamw_8bit) | VRAM efficient |
| Epochs | 5-20 | Dataset-size dependent |
| Batch size | 1 | VRAM constraint |
| Gradient accumulation | 16 | Effective batch = 16 |
| Mask ratio | 30-70% random | Diffusion training objective |
| Entropy target | 2.5 | Human-like token uncertainty |
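The batch size, gradient accumulation, and learning-rate rows of the table interact as sketched below. Micro-batches of size 1 are accumulated 16 times before each optimizer step, giving an effective batch of 16. This is a minimal sketch, not the pipeline's loop. `torch.optim.AdamW` stands in for the paged 8-bit AdamW, and `make_optimizer` and `train_with_accumulation` are hypothetical names.

```python
import torch


def make_optimizer(model: torch.nn.Module, lr: float = 2e-4) -> torch.optim.Optimizer:
    """AdamW over trainable parameters only (the LoRA weights, when the base is frozen)."""
    params = [p for p in model.parameters() if p.requires_grad]
    return torch.optim.AdamW(params, lr=lr)


def train_with_accumulation(model, optimizer, batches, loss_fn, accum_steps: int = 16) -> int:
    """Run one pass over `batches`, stepping once per `accum_steps` micro-batches.

    Returns the number of optimizer steps taken.
    """
    optimizer.zero_grad(set_to_none=True)
    steps, pending = 0, 0
    for batch in batches:
        # Divide so the accumulated gradient is the mean over the effective batch.
        loss = loss_fn(model, batch) / accum_steps
        loss.backward()
        pending += 1
        if pending == accum_steps:
            optimizer.step()
            optimizer.zero_grad(set_to_none=True)
            steps, pending = steps + 1, 0
    if pending:  # flush a final partial group (its gradient is slightly down-weighted)
        optimizer.step()
        optimizer.zero_grad(set_to_none=True)
        steps += 1
    return steps
```

The optimizer is created once, outside the epoch loop, so that AdamW's moment estimates persist across epochs.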
Run the Pipeline
# Quick run (5 epochs, small dataset)
bash run.sh
# Full training (20 epochs, 10K+ dataset)
# Set num_epochs=20 in modal_project/app.py, then:
modal run modal_project/app.py --hf-token=hf_xxx
Multi-Detector Scoring
The scoring system implements techniques from multiple papers:
Signal 1: GPT-2 Perplexity (GPTZero-style)
Measures how "surprising" each token is to GPT-2 Medium. AI text tends to be more predictable (lower perplexity).
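Perplexity is the exponential of the mean negative log-likelihood per token. The sketch below computes it from per-token log-probabilities and applies the thresholds from the Key Findings table. Obtaining the log-probabilities from GPT-2 Medium (shifting a causal LM's logits by one position) is left out so the sketch runs without a model download. The helper names are hypothetical.

```python
import math
from typing import Sequence


def perplexity(token_logprobs: Sequence[float]) -> float:
    """exp(mean negative log-likelihood) of natural-log token probabilities."""
    if not token_logprobs:
        raise ValueError("need at least one scored token")
    mean_nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(mean_nll)


def label_perplexity(ppl: float, ai_below: float = 18.0, human_above: float = 25.0) -> str:
    """Signal 1 thresholds: < 18 looks AI-generated, > 25 looks human."""
    if ppl < ai_below:
        return "ai-like"
    if ppl > human_above:
        return "human-like"
    return "uncertain"
```

For example, a text whose every token gets probability 0.5 has perplexity 2, which is far below the AI threshold.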
Signal 2: Burstiness (GPTZero-style)
Coefficient of variation of per-sentence perplexity. Human text varies more in complexity.
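The coefficient of variation is the standard deviation divided by the mean, which makes burstiness comparable across texts with different average perplexity. A minimal sketch follows. Using the population rather than the sample standard deviation is an assumption, and `burstiness` is a hypothetical name.

```python
import statistics
from typing import Sequence


def burstiness(sentence_ppls: Sequence[float]) -> float:
    """Coefficient of variation (population std / mean) of per-sentence perplexities."""
    if len(sentence_ppls) < 2:
        return 0.0  # a single sentence has no variation to measure
    return statistics.pstdev(sentence_ppls) / statistics.fmean(sentence_ppls)
```

Sentences with perplexities 10 and 30 give 10 / 20 = 0.5, above the 0.3 "varied" threshold in the table. Three sentences at exactly 10 give 0.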
Signal 3: Fast-DetectGPT (Bao et al., 2023)
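Probability curvature analysis: AI text tends to sit near local maxima of the scoring model's log-probability function (regions of negative curvature). Alternative tokens sampled at each position therefore score noticeably lower than the observed ones. Human text shows little such discrepancy.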
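When the same model is used for sampling and scoring, Fast-DetectGPT's conditional probability curvature has a closed form. It is (Σ_t log p(x_t) − Σ_t μ_t) / sqrt(Σ_t σ_t²), where μ_t and σ_t² are the mean and variance of log p under the model's own next-token distribution at position t. Higher values look more machine-generated. The sketch computes this raw score from explicit distributions. How the repository maps it onto the 0-1 scale shown in the tables is not documented here and is not reproduced. The function name is hypothetical.

```python
import math
from typing import Sequence


def conditional_curvature(observed_ids: Sequence[int],
                          token_dists: Sequence[Sequence[float]]) -> float:
    """Analytic Fast-DetectGPT-style score with one model for sampling and scoring.

    token_dists[t] is the next-token distribution at position t (sums to 1);
    observed_ids[t] is the token that actually appears there.
    """
    log_lik = 0.0   # sum_t log p(x_t)
    expected = 0.0  # sum_t E_{v ~ p_t}[log p_t(v)]
    variance = 0.0  # sum_t Var_{v ~ p_t}[log p_t(v)]
    for tok, dist in zip(observed_ids, token_dists):
        log_lik += math.log(dist[tok])
        mu = sum(p * math.log(p) for p in dist if p > 0)
        second = sum(p * math.log(p) ** 2 for p in dist if p > 0)
        expected += mu
        variance += second - mu * mu
    if variance <= 0:
        raise ValueError("degenerate distributions: curvature is undefined")
    return (log_lik - expected) / math.sqrt(variance)
```

Text that always picks the model's most likely token scores positive. Text that keeps picking unlikely tokens scores negative.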
Signal 4: Cross-Model Perplexity (Binoculars-style)
Perplexity under GPT-Neo 125M is compared with perplexity under GPT-2 Medium. When the two models disagree, the text is likely human.
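The Binoculars detector formalizes cross-model comparison as a ratio. The numerator is one model's log-perplexity on the text. The denominator is the cross-entropy between two models' next-token distributions, averaged over positions. Lower ratios look more machine-generated. The sketch follows our reading of that formulation, and the assignment of "performer" and "observer" roles is an assumption. The README's thresholds (< 15, > 25) suggest a perplexity-scale quantity, so the repository's signal may be computed differently. GPT-2 and GPT-Neo share the GPT-2 BPE vocabulary, which is what allows the two distributions to be compared position by position.

```python
import math
from typing import Sequence

Dist = Sequence[float]


def binoculars_score(observed_ids: Sequence[int],
                     performer: Sequence[Dist],
                     observer: Sequence[Dist]) -> float:
    """log-perplexity under `performer` / observer-vs-performer cross-entropy."""
    n = len(observed_ids)
    # Mean negative log-likelihood of the actual tokens under the performer.
    log_ppl = -sum(math.log(d[t]) for t, d in zip(observed_ids, performer)) / n
    # Mean cross-entropy: expected performer surprise under the observer's distribution.
    x_ent = 0.0
    for p_obs, p_perf in zip(observer, performer):
        x_ent -= sum(q * math.log(p) for q, p in zip(p_obs, p_perf) if q > 0)
    x_ent /= n
    return log_ppl / x_ent
```

Two identical uniform models give a ratio of 1. A performer that confidently predicts the observed tokens pushes the ratio below 1.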
Signal 5: Character Distribution (LD-Score, Narayanasamy 2026)
AI text approximates global character patterns; human text shows domain specialization.
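The README does not give the LD-Score formula. The sketch below therefore swaps in a generic stand-in: the Jensen-Shannon divergence between a text's character distribution and a reference ("global") distribution. Under this substitute, low divergence means the text is close to the global baseline, which the README associates with AI text. High divergence indicates domain specialization. The reference corpus and all function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict


def char_distribution(text: str) -> Dict[str, float]:
    """Relative frequency of each lower-cased character in `text`."""
    counts = Counter(text.lower())
    total = sum(counts.values())
    if total == 0:
        raise ValueError("empty text")
    return {ch: n / total for ch, n in counts.items()}


def js_divergence(p: Dict[str, float], q: Dict[str, float]) -> float:
    """Jensen-Shannon divergence in bits: 0 for identical, 1 for disjoint supports."""
    keys = set(p) | set(q)
    m = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in keys}

    def kl_to_m(a: Dict[str, float]) -> float:
        return sum(v * math.log2(v / m[k]) for k, v in a.items() if v > 0)

    return 0.5 * kl_to_m(p) + 0.5 * kl_to_m(q)
```

In practice `q` would be built once from a large general-domain corpus, and `p` from each text under test.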
Signal 6: Stylometric Ensemble (Pangram-style)
6 sub-signals: sentence length σ (standard deviation), hapax legomena ratio, transition marker rate, passive voice rate, formulaic phrase rate, word length σ.
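The six sub-signals can be extracted with simple text statistics, as sketched below. The transition-marker and formulaic-phrase lists are hypothetical examples, not the repository's lists. The passive-voice regex is a crude heuristic that only catches regular "-ed" participles. Combining these raw features into a 0-1 stylometric score is not shown.

```python
import re
import statistics
from collections import Counter
from typing import Dict, List

# Hypothetical marker lists for illustration only.
TRANSITIONS = {"however", "moreover", "furthermore", "additionally", "therefore", "consequently"}
FORMULAIC = ("in conclusion", "it is important to note", "plays a crucial role")
# Crude passive heuristic: a form of "be" followed by a regular -ed participle.
PASSIVE = re.compile(r"\b(?:is|are|was|were|be|been|being)\s+\w+ed\b", re.IGNORECASE)


def split_sentences(text: str) -> List[str]:
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def split_words(text: str) -> List[str]:
    return re.findall(r"[a-z']+", text.lower())


def stylometric_features(text: str) -> Dict[str, float]:
    sents = split_sentences(text)
    toks = split_words(text)
    if not sents or not toks:
        raise ValueError("text has no words")
    counts = Counter(toks)
    lower = text.lower()
    return {
        "sentence_length_sd": statistics.pstdev([len(split_words(s)) for s in sents]),
        "hapax_ratio": sum(1 for n in counts.values() if n == 1) / len(toks),
        "transition_rate": sum(counts[w] for w in TRANSITIONS) / len(toks),
        "passive_rate": len(PASSIVE.findall(text)) / len(sents),
        "formulaic_rate": sum(lower.count(p) for p in FORMULAIC) / len(sents),
        "word_length_sd": statistics.pstdev([len(w) for w in toks]),
    }
```

Here the hapax ratio is defined per token (words occurring once, divided by total words). Some work divides by vocabulary size instead.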
Signal 7: Weighted Ensemble
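Calibrated weights combine all signals into one score, giving the most weight to the stylometric signal (1.5x) and to Fast-DetectGPT (1.0x). Scores above 0.5 are labeled AI, and scores below 0.4 are labeled human.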
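A weighted mean is one natural way to realize this ensemble. The sketch assumes each signal has already been mapped to an AI-likelihood in [0, 1]. The stylometric (1.5) and Fast-DetectGPT (1.0) weights come from the text. The 0.5 weights for the other signals are hypothetical placeholders, and the signal keys are hypothetical names.

```python
from typing import Dict

WEIGHTS: Dict[str, float] = {
    "stylometric": 1.5,        # from the README
    "fast_detectgpt": 1.0,     # from the README
    "perplexity": 0.5,         # hypothetical placeholder
    "burstiness": 0.5,         # hypothetical placeholder
    "cross_model_ppl": 0.5,    # hypothetical placeholder
    "char_distribution": 0.5,  # hypothetical placeholder
}


def ensemble_score(signal_scores: Dict[str, float],
                   weights: Dict[str, float] = WEIGHTS) -> float:
    """Weighted mean of per-signal AI-likelihoods over the signals supplied."""
    used = {k: w for k, w in weights.items() if k in signal_scores}
    if not used:
        raise ValueError("no known signals supplied")
    return sum(signal_scores[k] * w for k, w in used.items()) / sum(used.values())


def ensemble_label(score: float) -> str:
    """Thresholds: > 0.5 AI, < 0.4 human, anything in between is uncertain."""
    if score > 0.5:
        return "ai"
    if score < 0.4:
        return "human"
    return "uncertain"
```

Normalizing by the weights actually used keeps the score in [0, 1] even when a signal is unavailable. The gap between 0.4 and 0.5 leaves an explicit "uncertain" band rather than forcing a binary call.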
Results
Baseline (untrained DiffusionGemma)
- 0/5 texts detected as AI by weighted ensemble
- Mean ensemble score: 0.350 (threshold: < 0.4 = Human)
Breaking Down Detection Signals
| Text Type | PPL (GPT-2) | Burstiness | Fast-DetectGPT | Stylometric | Ensemble |
|---|---|---|---|---|---|
| Remote work blog | 16-23 | 0.58-0.96 | 0.000 | 0.29-0.35 | 0.30-0.38 |
| Quantum computing | 14-20 | 0.57-0.70 | 0.000 | 0.23-0.33 | 0.30-0.41 |
| Email declining job | 7-9 | 0.48-0.91 | 0.001 | 0.27-0.33 | 0.44-0.56 |
| French Revolution | 16-18 | 0.53-0.74 | 0.000 | 0.25-0.25 | 0.29-0.50 |
| Headphones review | 14-22 | 0.37-1.25 | 0.000 | 0.22-0.25 | 0.33-0.47 |
Why DiffusionGemma Evades Detectors
- Different statistical pathway: block-autoregressive diffusion produces token distributions unlike those of standard AR models
- Bidirectional attention: the model considers the full context when denoising, which may produce more natural text
- Iterative refinement: entropy-bounded denoising introduces variation across denoising steps
- No left-to-right bias: avoids the formulaic transition patterns common in AR text
Research Background
This project synthesizes findings from 30+ papers (see research/ folder):
- Sadasivan et al. (2023): Theoretical ceiling; perfect detectors become impossible as LLMs improve
- Tarım & Onan (2025): Diffusion text naturally resists AR-trained detectors
- Cheng et al. (2025): Adversarial Paraphrasing; 87.88% TPR reduction via detector-guided feedback
- Ranganath & Ramesh (2026): StealthRL; 99.9% attack success with multi-detector GRPO
- Pedrotti et al. (2025): DPO style-shifting; few-shot fine-tuning fools detectors
- Narayanasamy et al. (2026): LD-Score; character distribution separates human and AI text
- Xu et al. (2026): HIP pipeline; base models look human to detectors
Full literature review: research/technical-diffusion-text-humanization-2026-06-29.md
Repository Structure
diffusiongemma-humanizer/
├── README.md                               # This file
├── research_report.md                      # Gemma + diffusion models + Modal costs
├── research_datasets_training.md           # Training data survey
├── commercial_ai_detectors_report.md       # Pangram, GPTZero, Originality.ai analysis
├── research/
│   ├── architecture-strategy.md            # Architecture decisions & cost breakdown
│   └── technical-diffusion-text-humanization-2026-06-29.md  # Full lit review (30+ papers)
├── modal_project/
│   ├── app.py                              # Complete 6-step training pipeline
│   ├── humanize_french.py                  # French text humanization (standalone)
│   └── upload_hf.py                        # HF upload utilities
├── scripts/
│   ├── run.py                              # Simple launcher
│   ├── launch.py                           # Launcher with UTF-8 logging
│   ├── run_pipeline.ps1                    # PowerShell launcher
│   └── run_pipeline.bat                    # Batch launcher
├── run.sh                                  # Bash launcher (primary)
├── run_french.py                           # French humanization launcher
├── lora/                                   # Fine-tuned LoRA weights
│   ├── lora_weights.pt                     # LoRA parameter state dict
│   └── lora_config.json                    # LoRA configuration
├── baseline_detector_results.json          # Pre-training evaluation
├── post_training_eval.json                 # Post-training evaluation
└── experiment_log.json                     # Full experiment config & results
License
Apache 2.0, matching the license of the base model google/diffusiongemma-26B-A4B-it.
Pipeline last run: 2026-06-30 | GPU: Modal A100 80GB | Framework: PyTorch 2.12 + Transformers 5.12