---
license: mit
datasets:
- zobeir/GoldNet
tags:
- image-classification
- pytorch
- vision-transformer
- counterfeit-detection
- gold
- fine-grained-recognition
language:
- en
---

# GoldNet Model Weights
Trained checkpoints for GoldFormer and the baseline models from the paper:

> **GoldFormer: A Texture-Aware Vision Transformer-Based Algorithm for Detecting Near-Identical Images**
>
> Z. Raisi, *Algorithms* (MDPI), 2026, 19(7), 530. DOI: 10.3390/a19070530. Open access (CC BY 4.0).

Code & dataset: github.com/zobeirraisi/GoldNet
## Task

Binary image classification (authentic vs. counterfeit gold items) from ordinary smartphone photographs. The two classes are near-identical to the eye: trained experts reached 89.80% accuracy on a blind subset.
## Available Checkpoints (`weights/`)

Results below are from 5-fold stratified cross-validation at a matched 224×224 resolution (the paper's primary setting).

| File | Model | Accuracy (%) | F1 |
|---|---|---|---|
| `GoldFormer_best.pth` | GoldFormer (CNN + Swin-T + TAAG) | 95.02 ± 0.75 | 0.9502 |
| `ViT_B16_best.pth` | ViT-B/16 | 94.31 ± 0.94 | 0.9431 |
| `Swin_T_best.pth` | Swin Transformer-Tiny (GoldFormer's backbone) | 93.65 ± 0.67 | 0.9365 |
| `ResNet101_best.pth` | ResNet-101 | 92.29 ± 1.01 | 0.9228 |
| `ResNet50_best.pth` | ResNet-50 | – | – |
| `ResNet18_best.pth` | ResNet-18 | – | – |
| `DenseNet121_best.pth` | DenseNet-121 | – | – |
| `EfficientNet_B3_best.pth` | EfficientNet-B3 | – | – |
| `EfficientNet_B0_best.pth` | EfficientNet-B0 | – | – |
| `MobileNet_V2_best.pth` | MobileNet-V2 | – | – |
GoldFormer is the best single model and outperforms a soft-voting ensemble (94.92%). It is statistically tied with the strongest individual baseline, ViT-B/16 (paired McNemar test, p = 0.228), and significantly outperforms its own Swin-T backbone (p = 0.014). It also uses about half the FLOPs of ViT-B/16 (8.6 vs. 16.9 GFLOPs) and fewer parameters (54.3M vs. 86.6M).
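Both comparisons in this paragraph rest on standard techniques. Soft voting averages the per-class probabilities of several models before taking the argmax. McNemar's test compares two classifiers on the same test images and uses only the images where exactly one of them is wrong. The sketch below implements both with the standard library. The function names are hypothetical. Using the exact (binomial) form of McNemar's test rather than the chi-squared approximation is an assumption, and the card does not say how per-fold predictions were pooled.

```python
from math import comb


def soft_vote(prob_lists):
    """Average per-class probabilities across models, then take the argmax.

    prob_lists: one entry per model, each a list of per-image probability vectors.
    """
    n_models = len(prob_lists)
    preds = []
    for per_image in zip(*prob_lists):
        n_classes = len(per_image[0])
        avg = [sum(p[c] for p in per_image) / n_models for c in range(n_classes)]
        preds.append(max(range(n_classes), key=avg.__getitem__))
    return preds


def mcnemar_exact(y_true, pred_a, pred_b):
    """Two-sided exact McNemar test on paired predictions.

    Only discordant pairs matter: b = A right / B wrong, c = A wrong / B right.
    Under H0 (equal error rates), b ~ Binomial(b + c, 0.5).
    """
    b = sum(1 for y, pa, pb in zip(y_true, pred_a, pred_b) if pa == y and pb != y)
    c = sum(1 for y, pa, pb in zip(y_true, pred_a, pred_b) if pa != y and pb == y)
    n = b + c
    if n == 0:
        return b, c, 1.0
    tail = sum(comb(n, i) for i in range(min(b, c) + 1)) / 2 ** n
    return b, c, min(1.0, 2 * tail)


# Toy labels and predictions, purely illustrative (not GoldNet results).
y_true = [0, 0, 1, 1, 0, 1, 0, 1]
pred_a = [0, 0, 1, 1, 0, 1, 1, 1]
pred_b = [0, 1, 0, 1, 1, 1, 1, 0]
print(mcnemar_exact(y_true, pred_a, pred_b))

probs_model_1 = [[0.9, 0.1], [0.4, 0.6]]
probs_model_2 = [[0.6, 0.4], [0.8, 0.2]]
print(soft_vote([probs_model_1, probs_model_2]))
```

In the toy example, model A is right on four images where model B is wrong, and never the reverse. So b = 4, c = 0, and the two-sided exact p-value is 2 × (1/2)^4 = 0.125, which is not significant with only four discordant pairs. The soft vote returns `[0, 0]`: averaging overrides the first model's vote for class 1 on the second image.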
All models were trained with 5-fold stratified cross-validation, AdamW, AMP (bfloat16), and freeze-then-unfreeze fine-tuning on the GoldNet dataset (2,127 images: 1,044 authentic, 1,083 counterfeit).
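The training recipe above can be sketched in PyTorch as a two-stage loop. First, only the new head is trained while the backbone is frozen. Then everything is unfrozen and fine-tuned end to end. Forward passes run under `torch.autocast(..., dtype=torch.bfloat16)`. Every concrete choice here is an assumption for illustration and is not taken from the paper: the tiny `backbone`/`head` modules standing in for the real networks, the random data, the step counts, the learning rates and weight decay, and rebuilding the AdamW optimizer at the unfreeze point. It runs on CPU.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Hypothetical stand-ins: a "pretrained" backbone and a freshly initialized 2-class head.
backbone = nn.Sequential(nn.Linear(16, 32), nn.ReLU())
head = nn.Linear(32, 2)
model = nn.Sequential(backbone, head)
loss_fn = nn.CrossEntropyLoss()

# Random toy batch in place of real images and authentic/counterfeit labels.
x = torch.randn(64, 16)
y = torch.randint(0, 2, (64,))


def set_backbone_trainable(trainable):
    for p in backbone.parameters():
        p.requires_grad = trainable


def make_optimizer(lr):
    # AdamW over the currently trainable parameters only.
    params = [p for p in model.parameters() if p.requires_grad]
    return torch.optim.AdamW(params, lr=lr, weight_decay=0.05)


def train_steps(optimizer, n_steps):
    model.train()
    for _ in range(n_steps):
        optimizer.zero_grad()
        # Mixed precision: autocast-eligible ops (e.g. Linear) run in bfloat16.
        with torch.autocast(device_type="cpu", dtype=torch.bfloat16):
            logits = model(x)
        loss = loss_fn(logits.float(), y)  # compute the loss in float32
        loss.backward()
        optimizer.step()
    return loss.item()


# Stage 1: backbone frozen, only the head is updated.
set_backbone_trainable(False)
before = [p.detach().clone() for p in backbone.parameters()]
stage1_loss = train_steps(make_optimizer(1e-3), n_steps=5)
frozen_ok = all(torch.equal(a, b) for a, b in zip(before, backbone.parameters()))

# Stage 2: unfreeze and fine-tune end to end with a smaller learning rate.
set_backbone_trainable(True)
stage2_loss = train_steps(make_optimizer(1e-4), n_steps=5)
backbone_moved = any(not torch.equal(a, b) for a, b in zip(before, backbone.parameters()))

print(f"stage 1 loss {stage1_loss:.4f} | stage 2 loss {stage2_loss:.4f}")
```

Unlike float16 AMP, bfloat16 keeps float32's exponent range, so this loop uses no `GradScaler`.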
## Usage

```python
import torch
from PIL import Image
from torchvision import transforms

from models import build_model  # models.py from the GitHub repo

# Download the weights first: bash fetch_weights.sh (script in the GitHub repo)

model = build_model("goldformer")
state = torch.load("weights/GoldFormer_best.pth", map_location="cpu", weights_only=True)
model.load_state_dict(state)  # strict by default: keys must exactly match the released checkpoint
model.eval()

# Resize to the 224x224 training resolution, then apply ImageNet mean/std normalization
transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406],
                         [0.229, 0.224, 0.225]),
])

img = Image.open("your_image.jpg").convert("RGB")
x = transform(img).unsqueeze(0)  # add a batch dimension: (1, 3, 224, 224)

with torch.no_grad():
    logits, gamma = model(x)  # gamma = TAAG gate activations, for interpretability
    prob_authentic = torch.softmax(logits, dim=1)[0, 0].item()  # class 0 = authentic

print(f"P(authentic) = {prob_authentic:.3f}")
```
**Note:** All checkpoints, including GoldFormer, use 224×224 input in the published configuration. The `models.py` class definitions (`TextureAwareAttentionGate`, `GoldFormer`) are in the GitHub repo.
## Citation

```bibtex
@article{raisi2026goldformer,
  title   = {GoldFormer: A Texture-Aware Vision Transformer-Based Algorithm
             for Detecting Near-Identical Images},
  author  = {Raisi, Zobeir},
  journal = {Algorithms},
  volume  = {19},
  number  = {7},
  pages   = {530},
  year    = {2026},
  doi     = {10.3390/a19070530}
}
```
## License

- Model weights: MIT License
- Dataset: CC BY 4.0