DiffusionGemma Humanizer: SOTA Text Humanization

Fine-tuning Google's DiffusionGemma 26B (MoE, 3.8B active, Apache 2.0) to humanize AI-generated text and evade multi-signal AI detectors.

Table of Contents

  1. Key Findings
  2. Architecture
  3. Installation
  4. Usage
  5. Training Pipeline
  6. Multi-Detector Scoring
  7. Results
  8. Research Background
  9. Repository Structure
  10. License

Key Findings

1. DiffusionGemma base model achieves ~0% AI detection

On the Fast-DetectGPT + heuristic ensemble (7 signals, including perplexity, burstiness, cross-model PPL, character distribution, and stylometric features), the base DiffusionGemma 26B generates text that is classified as 100% Human. This supports the hypothesis from Tarım & Onan (2025) that diffusion-generated text naturally resists detectors trained on autoregressive output.

2. Manual LoRA bypasses PEFT incompatibility

PEFT does not support Gemma4ClippableLinear (DiffusionGemma's custom linear wrapper). We implemented Manual LoRA injection via forward hooks that target the underlying Linear4bit modules, bypassing PEFT entirely.

3. VRAM optimization strategy

DiffusionGemma 26B in 4-bit uses 50.8 GB on A100 80GB. Training requires:

  • Last 2 layers only: injects LoRA into 30 modules (not 189 across all layers)
  • Gradient checkpointing: trades compute for memory, recomputing activations during backward
  • Loss only on masked positions: skips padding tokens for memory efficiency
  • bf16 LoRA params: halves activation memory vs float32

4. Multi-detector ensemble scoring

| Signal | Source | AI Pattern | Human Pattern |
|---|---|---|---|
| Perplexity (GPT-2) | GPTZero-style | < 18 (too predictable) | > 25 (natural variation) |
| Burstiness | GPTZero-style | < 0.15 (uniform) | > 0.3 (varied) |
| Fast-DetectGPT | Bao et al. (2023) | > 0.55 (negative curvature) | < 0.45 (positive curvature) |
| Cross-model PPL (GPT-Neo) | Binoculars-style | < 15 (both models agree) | > 25 (models disagree) |
| Character Distribution | LD-Score (Narayanasamy, 2026) | Global baseline | Domain-specialized |
| Stylometric (6 sub-signals) | Pangram-style | Formulaic, passive-heavy | Natural, varied |
| Weighted Ensemble | StealthRL-inspired | > 0.5 = AI | < 0.4 = Human |

Architecture

DiffusionGemma 26B

  • Total params: 25.2B | Active: 3.8B (MoE: 8/128 experts + 1 shared)
  • Generation: Block-autoregressive discrete diffusion
  • Canvas: 256 tokens, bidirectional attention
  • Sampler: Entropy-Bounded Denoising (1-48 steps, temperature 0.8 → 0.4)
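
The temperature range in the sampler line can be illustrated with a small schedule helper. This is only a sketch: the source does not say how the temperature moves between 0.8 and 0.4, so the linear decay and the name `temperature_schedule` are assumptions.

```python
def temperature_schedule(num_steps, t_max=0.8, t_min=0.4):
    """Return one temperature per denoising step, decreasing from t_max to t_min.

    Linear interpolation is an assumption; the real sampler may use another curve.
    """
    if num_steps < 1:
        raise ValueError("num_steps must be at least 1")
    if num_steps == 1:
        return [t_max]
    step = (t_max - t_min) / (num_steps - 1)
    return [t_max - i * step for i in range(num_steps)]


# 24 steps matches max_denoising_steps in the usage example.
schedule = temperature_schedule(24)
```

Early steps sample at a higher temperature and late steps at a lower one, which is one way to read "0.8 → 0.4".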

Manual LoRA Injection

Gemma4ClippableLinear
  └── linear: Linear4bit (torch.nn.Linear subclass)
       ├── forward: W @ x  (frozen, 4-bit, no grad)
       └── LoRA hook: x.detach() @ A @ B * scale  (trainable, bf16)
            ├── A: (in_features, rank=8), kaiming init
            └── B: (rank=8, out_features), zero init
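
The hook-based injection in the diagram above can be sketched as follows. This is a minimal illustration, not the project's actual code: `LoRAHook` is a hypothetical name, a plain `nn.Linear` stands in for the quantized `Linear4bit` module, and the demo uses float32 for portability where the pipeline uses bf16.

```python
import math

import torch
from torch import nn


class LoRAHook:
    """Adds a trainable low-rank update to the output of a frozen linear layer."""

    def __init__(self, linear, rank=8, alpha=16.0, dtype=torch.bfloat16):
        self.scale = alpha / rank
        # A: (in_features, rank) with Kaiming init; B: (rank, out_features) with
        # zeros, so the update is exactly zero before training starts.
        a = torch.empty(linear.in_features, rank)
        nn.init.kaiming_uniform_(a, a=math.sqrt(5))
        self.A = nn.Parameter(a.to(dtype))
        self.B = nn.Parameter(torch.zeros(rank, linear.out_features, dtype=dtype))
        self.handle = linear.register_forward_hook(self)

    def __call__(self, module, inputs, output):
        # Detach the input so gradients reach only A and B, not earlier layers.
        x = inputs[0].detach().to(self.A.dtype)
        delta = (x @ self.A @ self.B) * self.scale
        return output + delta.to(output.dtype)


base = nn.Linear(16, 32)          # stand-in for a frozen Linear4bit module
base.requires_grad_(False)
hook = LoRAHook(base, dtype=torch.float32)
optimizer = torch.optim.AdamW([hook.A, hook.B], lr=2e-4)
```

Because the hook returns a modified output, the wrapped module needs no changes, and calling `hook.handle.remove()` restores the original behavior.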

Training Loop

for each batch (prompt + target response):
    1. Mask 30-70% of target tokens randomly
    2. Forward: prompt → encoder → KV cache
       decoder: canvas → bidirectional attention → logits
       (gradient checkpointing: activations NOT stored)
    3. Compute loss ONLY on masked positions (memory efficient)
    4. Add entropy regularization (encourage human-like uncertainty)
    5. Backward: recompute activations via checkpoint
       gradients flow only through LoRA params (detached hooks)
    6. Update LoRA weights (AdamW, lr=2e-4)
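
The masking, masked-only loss, and entropy regularization steps of the loop can be sketched in plain Python. The source does not give the exact form of the entropy term, so the squared distance to the target and the `entropy_weight` value are assumptions, as are the function names.

```python
import math
import random


def sample_mask(num_tokens, lo=0.3, hi=0.7, rng=random):
    """Pick a mask ratio in [lo, hi] and return the sorted masked positions."""
    ratio = rng.uniform(lo, hi)
    k = max(1, round(ratio * num_tokens))
    return sorted(rng.sample(range(num_tokens), k))


def masked_loss(probs, targets, masked, entropy_target=2.5, entropy_weight=0.01):
    """Cross-entropy over masked positions plus an entropy penalty.

    probs[i] is the predicted distribution (a list summing to 1) at position i.
    Unmasked positions contribute nothing to the loss.
    """
    ce = 0.0
    ent = 0.0
    for i in masked:
        p = probs[i]
        ce += -math.log(p[targets[i]])
        ent += -sum(q * math.log(q) for q in p if q > 0)
    ce /= len(masked)
    ent /= len(masked)
    # Hypothetical regularizer: pull mean predictive entropy toward the target.
    return ce + entropy_weight * (ent - entropy_target) ** 2
```

In the real pipeline these quantities would be computed on logits tensors; the list-based version only shows which positions enter the loss.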

Installation

Prerequisites

pip install modal
modal setup
modal secret create hf-secrets HF_TOKEN=hf_your_token

Clone & Deploy

git clone https://huggingface.co/simonlesaumon/diffusiongemma-humanizer
cd diffusiongemma-humanizer
bash run.sh

Usage

Basic: Humanize AI Text

from transformers import DiffusionGemmaForBlockDiffusion, AutoTokenizer, BitsAndBytesConfig
import torch

# Load 4-bit model
bnb = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16,
                         bnb_4bit_use_double_quant=True, bnb_4bit_quant_type="nf4")
model = DiffusionGemmaForBlockDiffusion.from_pretrained(
    "google/diffusiongemma-26B-A4B-it",
    quantization_config=bnb, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("google/diffusiongemma-26B-A4B-it")

# Load fine-tuned LoRA weights.
# PEFT cannot wrap Gemma4ClippableLinear, so PeftModel is not used here;
# apply the manual LoRA loader to lora/lora_weights.pt and lora/lora_config.json.

# Humanize
ai_text = "Your AI-generated text here..."
messages = [
    {"role": "system", "content": "Rewrite to sound human-written."},
    {"role": "user", "content": ai_text},
]
inputs = tokenizer.apply_chat_template(messages, tokenize=True,
    add_generation_prompt=True, return_dict=True, return_tensors="pt").to(model.device)

ai_tokens = tokenizer(ai_text, max_length=256, truncation=True,
                      padding="max_length", return_tensors="pt")
output = model.generate(**inputs,
    decoder_input_ids=ai_tokens["input_ids"].to(model.device),
    max_new_tokens=512, max_denoising_steps=24, t_max=0.8, t_min=0.4)
humanized = tokenizer.decode(output.sequences[0][inputs["input_ids"].shape[-1]:],
                              skip_special_tokens=True)

Training Pipeline

6-Step Process (runs on Modal A100 80GB)

| Step | Description | Time |
|---|---|---|
| 1. Load Models | DiffusionGemma 4-bit + GPT-2 + GPT-Neo detectors | ~5 min |
| 2. Baseline Evaluation | 7-signal detector ensemble on 5 prompts | ~30 sec |
| 3. Build Dataset | 10K+ synthetic pairs annotated with detector scores | ~10 min |
| 4. LoRA + Training | Manual LoRA (last 2 layers, 30 modules) + 5-20 epochs | ~10 h |
| 5. Post-Training Eval | Compare ensemble scores before/after | ~30 sec |
| 6. Export to HF | LoRA weights (5 MB) + results + model card | ~10 sec |

Training Hyperparameters

| Param | Value | Rationale |
|---|---|---|
| LoRA rank | 8 | Balance expressiveness vs memory |
| LoRA alpha | 16 | Scaling factor alpha/r = 2 |
| Learning rate | 2e-4 | Standard for LoRA fine-tuning |
| Optimizer | AdamW (paged_adamw_8bit) | VRAM efficient |
| Epochs | 5-20 | Dataset-size dependent |
| Batch size | 1 | VRAM constraint |
| Gradient accumulation | 16 | Effective batch = 16 |
| Mask ratio | 30-70% random | Diffusion training objective |
| Entropy target | 2.5 | Human-like token uncertainty |
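
The derived values in the table (alpha/r = 2, effective batch = 16) follow directly from the base settings. A small config object makes that arithmetic explicit; `LoRATrainConfig` and its field names are hypothetical and are not taken from `modal_project/app.py`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoRATrainConfig:
    lora_rank: int = 8
    lora_alpha: int = 16
    learning_rate: float = 2e-4
    batch_size: int = 1
    grad_accum_steps: int = 16
    mask_ratio: tuple = (0.3, 0.7)
    entropy_target: float = 2.5

    @property
    def lora_scale(self) -> float:
        # Scaling applied to the low-rank update: alpha / r.
        return self.lora_alpha / self.lora_rank

    @property
    def effective_batch_size(self) -> int:
        # Samples contributing to each optimizer step.
        return self.batch_size * self.grad_accum_steps


cfg = LoRATrainConfig()
```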

Run the Pipeline

# Quick run (5 epochs, small dataset)
bash run.sh

# Full training (20 epochs, 10K+ dataset)
# Set num_epochs=20 in modal_project/app.py, then:
modal run modal_project/app.py --hf-token=hf_xxx

Multi-Detector Scoring

The scoring system implements techniques from multiple papers:

Signal 1: GPT-2 Perplexity (GPTZero-style)

Measures how "surprising" each word is to GPT-2 Medium. AI text tends to be more predictable (lower perplexity).
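
As a reminder of what the number means, perplexity is the exponential of the mean negative log-likelihood per token. The sketch below takes precomputed natural-log token probabilities; obtaining them from GPT-2 Medium is outside its scope, and the function name is illustrative.

```python
import math


def perplexity(token_logprobs):
    """exp of the mean negative log-likelihood (natural log) per token."""
    if not token_logprobs:
        raise ValueError("need at least one token")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))
```

A text whose every token has probability 0.25 under the model has perplexity 4; more predictable text scores lower.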

Signal 2: Burstiness (GPTZero-style)

Coefficient of variation of per-sentence perplexity. Human text varies more in complexity.
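
The coefficient of variation is the standard deviation divided by the mean. A minimal sketch over per-sentence perplexities is shown below; whether the project uses the population or sample standard deviation is not stated, so the population form here is an assumption.

```python
import statistics


def burstiness(sentence_ppls):
    """Coefficient of variation of per-sentence perplexity (std / mean)."""
    if len(sentence_ppls) < 2:
        return 0.0
    mean = statistics.fmean(sentence_ppls)
    if mean <= 0:
        return 0.0
    return statistics.pstdev(sentence_ppls) / mean
```

Uniform sentences give a value near 0, while a mix of simple and complex sentences pushes the value up.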

Signal 3: Fast-DetectGPT (Bao et al., 2023)

Probability curvature analysis: AI text tends to sit near local maxima of the scoring model's log-probability, which are regions of negative curvature.
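
A rough sketch of the conditional probability curvature statistic from Fast-DetectGPT follows, using the scoring model as its own sampling model. It works on toy per-position distributions given as lists. How the project maps this raw statistic to the 0-1 score used in its thresholds is not described, so that step is omitted.

```python
import math


def conditional_curvature(token_ids, dists):
    """Normalized gap between observed and expected token log-probability.

    token_ids[j] is the observed token at position j; dists[j] is the model's
    predicted distribution over the vocabulary there (all entries > 0).
    """
    log_lik = 0.0
    mean_sum = 0.0
    var_sum = 0.0
    for tok, p in zip(token_ids, dists):
        logs = [math.log(q) for q in p]
        mu = sum(q * l for q, l in zip(p, logs))        # expected log-prob
        var = sum(q * l * l for q, l in zip(p, logs)) - mu * mu
        log_lik += logs[tok]
        mean_sum += mu
        var_sum += var
    if var_sum <= 0:
        return 0.0
    return (log_lik - mean_sum) / math.sqrt(var_sum)
```

Larger values mean the observed tokens are more likely than the model expects on average, which is the pattern associated with machine-generated text.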

Signal 4: Cross-Model Perplexity (Binoculars-style)

Perplexity computed with GPT-Neo 125M is compared against perplexity from GPT-2 Medium. When the two models disagree, the text is more likely human.

Signal 5: Character Distribution (LD-Score, Narayanasamy 2026)

AI text approximates global character patterns; human text shows domain specialization.

Signal 6: Stylometric Ensemble (Pangram-style)

6 sub-signals: sentence length σ, hapax legomena ratio, transition marker rate, passive voice rate, formulaic phrase rate, word length σ.
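
Three of these sub-signals are easy to compute with the standard library. The sketch below makes its own simplifying assumptions: sentences are split on terminal punctuation, words are runs of ASCII letters and apostrophes, and the hapax ratio is taken over distinct words. The project's actual tokenization and normalization may differ.

```python
import re
import statistics
from collections import Counter

WORD_RE = re.compile(r"[A-Za-z']+")


def stylometric_features(text):
    """Sentence length std, hapax legomena ratio, and word length std."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    words = WORD_RE.findall(text.lower())
    if not words:
        raise ValueError("text contains no words")
    counts = Counter(words)
    sent_lengths = [len(WORD_RE.findall(s)) for s in sentences]
    return {
        "sentence_length_std": statistics.pstdev(sent_lengths),
        # Share of distinct words that occur exactly once.
        "hapax_ratio": sum(1 for c in counts.values() if c == 1) / len(counts),
        "word_length_std": statistics.pstdev([len(w) for w in words]),
    }
```

The remaining sub-signals (transition markers, passive voice, formulaic phrases) need word lists or a parser and are left out here.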

Signal 7: Weighted Ensemble

Calibrated weights combine all signals, with higher weight on the stylometric signal (1.5x) and Fast-DetectGPT (1.0x).
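
A weighted ensemble of this kind can be sketched as a weighted mean of per-signal scores followed by the thresholds from the Key Findings table (> 0.5 = AI, < 0.4 = Human). Only the 1.5 and 1.0 weights come from the text; the other weights, the "Uncertain" label for the middle band, and the function names are placeholders.

```python
# Only the stylometric and Fast-DetectGPT weights are documented;
# the remaining values are hypothetical placeholders.
WEIGHTS = {
    "stylometric": 1.5,
    "fast_detectgpt": 1.0,
    "perplexity": 0.5,
    "burstiness": 0.5,
    "cross_model_ppl": 0.5,
    "char_distribution": 0.5,
}


def ensemble_score(signals, weights=WEIGHTS):
    """Weighted mean of per-signal AI-likelihood scores in [0, 1]."""
    total = sum(weights[name] for name in signals)
    return sum(weights[name] * score for name, score in signals.items()) / total


def classify(score, ai_threshold=0.5, human_threshold=0.4):
    """Map an ensemble score to a label using the documented thresholds."""
    if score > ai_threshold:
        return "AI"
    if score < human_threshold:
        return "Human"
    return "Uncertain"
```

Normalizing by the sum of the weights actually present keeps the score in [0, 1] even when a signal is missing for a given text.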


Results

Baseline (untrained DiffusionGemma)

  • 0/5 texts detected as AI by weighted ensemble
  • Mean ensemble score: 0.350 (threshold: < 0.4 = Human)

Breaking Down Detection Signals

| Text Type | PPL | Burstiness | FDGPT | Stylometric | Ensemble |
|---|---|---|---|---|---|
| Remote work blog | 16-23 | 0.58-0.96 | 0.000 | 0.29-0.35 | 0.30-0.38 |
| Quantum computing | 14-20 | 0.57-0.70 | 0.000 | 0.23-0.33 | 0.30-0.41 |
| Email declining job | 7-9 | 0.48-0.91 | 0.001 | 0.27-0.33 | 0.44-0.56 |
| French Revolution | 16-18 | 0.53-0.74 | 0.000 | 0.25-0.25 | 0.29-0.50 |
| Headphones review | 14-22 | 0.37-1.25 | 0.000 | 0.22-0.25 | 0.33-0.47 |

Why DiffusionGemma Evades Detectors

  1. Different statistical pathway: block-autoregressive diffusion produces token distributions unlike standard AR models
  2. Bidirectional attention: considers full context when denoising, producing more natural text
  3. Iterative refinement: entropy-bounded denoising naturally introduces variation
  4. No left-to-right bias: avoids formulaic transition patterns common in AR text

Research Background

This project synthesizes findings from 30+ papers (see research/ folder):

  • Sadasivan et al. (2023): Theoretical ceiling; perfect detectors become impossible as LLMs improve
  • Tarım & Onan (2025): Diffusion text naturally resists AR-trained detectors
  • Cheng et al. (2025): Adversarial Paraphrasing; 87.88% TPR reduction via detector-guided feedback
  • Ranganath & Ramesh (2026): StealthRL; 99.9% attack success with multi-detector GRPO
  • Pedrotti et al. (2025): DPO style-shifting; few-shot fine-tuning fools detectors
  • Narayanasamy et al. (2026): LD-Score; character distribution separates human/AI text
  • Xu et al. (2026): HIP pipeline; base models look human to detectors

Full literature review: research/technical-diffusion-text-humanization-2026-06-29.md


Repository Structure

diffusiongemma-humanizer/
├── README.md                                    # This file
├── research_report.md                           # Gemma + diffusion models + Modal costs
├── research_datasets_training.md                # Training data survey
├── commercial_ai_detectors_report.md            # Pangram, GPTZero, Originality.ai analysis
├── research/
│   ├── architecture-strategy.md                 # Architecture decisions & cost breakdown
│   └── technical-diffusion-text-humanization-2026-06-29.md  # Full lit review (30+ papers)
├── modal_project/
│   ├── app.py                                   # Complete 6-step training pipeline
│   ├── humanize_french.py                       # French text humanization (standalone)
│   └── upload_hf.py                             # HF upload utilities
├── scripts/
│   ├── run.py                                   # Simple launcher
│   ├── launch.py                                # Launcher with UTF-8 logging
│   ├── run_pipeline.ps1                         # PowerShell launcher
│   └── run_pipeline.bat                         # Batch launcher
├── run.sh                                       # Bash launcher (primary)
├── run_french.py                                # French humanization launcher
├── lora/                                        # Fine-tuned LoRA weights
│   ├── lora_weights.pt                          # LoRA parameter state dict
│   └── lora_config.json                         # LoRA configuration
├── baseline_detector_results.json               # Pre-training evaluation
├── post_training_eval.json                      # Post-training evaluation
└── experiment_log.json                          # Full experiment config & results

License

Apache 2.0, matching the base model google/diffusiongemma-26B-A4B-it.


Pipeline last run: 2026-06-30 | GPU: Modal A100 80GB | Framework: PyTorch 2.12 + Transformers 5.12
