ViT-5: Vision Transformers for The Mid-2020s
Paper • 2602.08071
Predicts the SD checkpoint, model family, and tags from a generated image.
Backbone: fine-tuned ViT-5.
RunPod handler: cjyu81/genglassrunpod
| File | Description |
|---|---|
| `model.safetensors` | Model weights (HF-standard format) |
| `config.json` | Architecture config + training metadata |
| `preprocessor_config.json` | Image preprocessing parameters |
| `label_maps.json` | Index → label maps for all three heads |
| `training_log.json` | Full run history + seen image IDs |
| `best.pth` | Legacy PyTorch checkpoint (backward compat) |
Model families: Flux, Illustrious, NoobAI, Other, Pony, SD 1.5, SDXL, ZImageTurbo

| Run | Date | Skip Scrape | Fresh Train | New Images | Best Val Loss |
|---|---|---|---|---|---|
| 1 | 2026-04-04 | True | True | 831 | 2.3535 |
Important: Use RunPod Serverless → New Endpoint → Custom (Docker image), NOT "Custom Deployment" (HF model link).
{"input": {"image": "<base64-jpeg>"}}
import json
import torch
from huggingface_hub import hf_hub_download
from safetensors.torch import load_file
REPO_ID = 'Charlie81/genglasses'
# Download (and cache) the weights, config, and label maps from the Hub
state = load_file(hf_hub_download(REPO_ID, 'model.safetensors'))
with open(hf_hub_download(REPO_ID, 'config.json')) as f:
    cfg = json.load(f)
with open(hf_hub_download(REPO_ID, 'label_maps.json')) as f:
    maps = json.load(f)
# rebuild CivitClassifier, load_state_dict(state), then predict
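Once the model produces per-head scores, the index → label maps turn a predicted index back into a name. The sketch below shows that lookup for a single-label head. The layout of `label_maps.json` is not documented here, so the example map, the scores, and the helper `decode_head` are illustrative assumptions only. A multi-label tag head, if that is how tags are modeled, would need a threshold rather than an argmax and is not shown.

```python
def decode_head(scores, index_to_label):
    """Return (label, score) for the highest-scoring class of one head."""
    best = max(range(len(scores)), key=scores.__getitem__)
    # JSON object keys are always strings, so try str(best) first and
    # fall back to the integer key for maps built in Python.
    label = index_to_label.get(str(best), index_to_label.get(best))
    return label, scores[best]


# Hypothetical map and scores, for illustration only.
family_map = {"0": "Flux", "1": "Illustrious", "2": "NoobAI"}
family_scores = [0.1, 2.3, -0.5]
label, score = decode_head(family_scores, family_map)
```

If the scores come from a PyTorch tensor, calling `.tolist()` on it first yields a plain list that works with this helper.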