metadata
license: mit
datasets:
  - zobeir/GoldNet
tags:
  - image-classification
  - pytorch
  - vision-transformer
  - counterfeit-detection
  - gold
  - fine-grained-recognition
language:
  - en

GoldNet Model Weights

Trained checkpoints for GoldFormer and baseline models from the paper:

GoldFormer: A Texture-Aware Vision Transformer-Based Algorithm for Detecting Near-Identical Images
Z. Raisi, Algorithms (MDPI), 2026, 19(7), 530.
DOI: 10.3390/a19070530. Open access (CC BY 4.0).
Code & dataset: github.com/zobeirraisi/GoldNet

Task

Binary image classification (authentic vs. counterfeit gold items) from ordinary smartphone photographs. The two classes are near-identical to the eye; trained experts reached 89.80% accuracy on a blind subset.

Available Checkpoints (weights/)

Results below are 5-fold stratified cross-validation at matched 224×224 resolution (the paper's primary setting).

| File | Model | Accuracy (%) | F1 |
|---|---|---|---|
| GoldFormer_best.pth | GoldFormer (CNN + Swin-T + TAAG) | 95.02 ± 0.75 | 0.9502 |
| ViT_B16_best.pth | ViT-B/16 | 94.31 ± 0.94 | 0.9431 |
| Swin_T_best.pth | Swin Transformer-Tiny (GoldFormer's backbone) | 93.65 ± 0.67 | 0.9365 |
| ResNet101_best.pth | ResNet-101 | 92.29 ± 1.01 | 0.9228 |
| ResNet50_best.pth | ResNet-50 | - | - |
| ResNet18_best.pth | ResNet-18 | - | - |
| DenseNet121_best.pth | DenseNet-121 | - | - |
| EfficientNet_B3_best.pth | EfficientNet-B3 | - | - |
| EfficientNet_B0_best.pth | EfficientNet-B0 | - | - |
| MobileNet_V2_best.pth | MobileNet-V2 | - | - |

GoldFormer is the best single model and beats a soft-voting ensemble (94.92%). It is statistically tied with the strongest individual backbone, ViT-B/16 (paired McNemar p = 0.228), and significantly better than its own Swin-T backbone (p = 0.014). It uses about half of ViT-B/16's FLOPs (8.6 vs. 16.9 GFLOPs) and fewer parameters (54.3M vs. 86.6M).
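For readers unfamiliar with the paired McNemar test quoted above, here is a minimal sketch of the exact (binomial) variant. It looks only at the images where two models disagree. Which variant the paper used (exact or chi-square) is not stated here, so this is an assumption, and the counts in the usage line are hypothetical, not taken from the paper.

```python
from math import comb


def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact McNemar p-value from the discordant counts.

    b: images model A classified correctly and model B did not
    c: images model B classified correctly and model A did not
    """
    n = b + c
    if n == 0:
        return 1.0  # the models never disagree, so there is no evidence of a difference
    k = min(b, c)
    # Lower tail of Binomial(n, 0.5), doubled for a two-sided test and capped at 1
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)


# Hypothetical discordant counts, for illustration only
print(f"p = {mcnemar_exact(30, 45):.3f}")
```

Images that both models get right, or both get wrong, do not enter the statistic, which is why two models with close accuracies can still differ significantly (or not) depending on how their errors overlap.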

All models were trained on the GoldNet dataset (2,127 images: 1,044 authentic, 1,083 counterfeit) using 5-fold stratified cross-validation, AdamW, AMP (bfloat16), and freeze-then-unfreeze fine-tuning.
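To make "5-fold stratified cross-validation" concrete, the sketch below splits the stated class counts into five validation folds that each keep roughly the same authentic/counterfeit ratio. It is a standard-library illustration, not the repository's splitting code; the helper name `stratified_folds`, the seed, and the label encoding (0 = authentic, 1 = counterfeit) are assumptions. In practice a library splitter such as scikit-learn's `StratifiedKFold` is the usual choice.

```python
import random
from collections import defaultdict


def stratified_folds(labels, n_splits=5, seed=0):
    """Assign every sample index to one fold while keeping class ratios balanced."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(n_splits)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class round-robin so every fold gets a near-equal share of it
        for position, idx in enumerate(indices):
            folds[position % n_splits].append(idx)
    return folds


# Class counts from the dataset description: 1,044 authentic, 1,083 counterfeit
labels = [0] * 1044 + [1] * 1083
folds = stratified_folds(labels)

for k, val_idx in enumerate(folds):
    n_authentic = sum(1 for i in val_idx if labels[i] == 0)
    print(f"fold {k}: {len(val_idx)} validation images, {n_authentic} authentic")
```

Each fold serves once as the validation set while the other four are used for training, and the reported accuracy is aggregated over the five runs.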

Usage

import torch
from models import build_model   # models.py from the GitHub repo

# Download weights
# bash fetch_weights.sh   (from the GitHub repo)

model = build_model("goldformer")
state = torch.load("weights/GoldFormer_best.pth", map_location="cpu", weights_only=True)
model.load_state_dict(state)   # strict load: keys must match the released checkpoint exactly
model.eval()

from torchvision import transforms
from PIL import Image

transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406],
                         [0.229, 0.224, 0.225]),
])

img = Image.open("your_image.jpg").convert("RGB")
x = transform(img).unsqueeze(0)

with torch.no_grad():
    logits, gamma = model(x)   # gamma = TAAG gate activations, for interpretability
    prob_authentic = torch.softmax(logits, dim=1)[0, 0].item()
    print(f"P(authentic) = {prob_authentic:.3f}")

Note: All checkpoints, including GoldFormer, use 224×224 input in the published configuration. The models.py class definitions (TextureAwareAttentionGate + GoldFormer) are in the GitHub repo.
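The last step of the usage snippet turns the two logits into P(authentic) with a softmax. The sketch below writes that step out in plain Python and adds a decision rule. The helper names and the 0.5 threshold are assumptions for illustration; index 0 is treated as the authentic class, following the usage snippet.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    largest = max(logits)
    exps = [math.exp(v - largest) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_label(logits, threshold=0.5):
    """Map two-class logits to a label; index 0 is assumed to be 'authentic'."""
    p_authentic = softmax(logits)[0]
    label = "authentic" if p_authentic >= threshold else "counterfeit"
    return label, p_authentic


# Hypothetical logits, e.g. the values of logits[0].tolist() from the model output
label, p = predict_label([2.0, 0.0])
print(f"{label} (P(authentic) = {p:.3f})")
```

Raising the threshold above 0.5 trades more false counterfeit flags for fewer counterfeits passed as authentic, which may suit screening use cases better than the default.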

Citation

@article{raisi2026goldformer,
  title   = {GoldFormer: A Texture-Aware Vision Transformer-Based Algorithm
             for Detecting Near-Identical Images},
  author  = {Raisi, Zobeir},
  journal = {Algorithms},
  volume  = {19},
  number  = {7},
  pages   = {530},
  year    = {2026},
  doi     = {10.3390/a19070530}
}

License

Model weights: MIT License
Dataset: CC BY 4.0