---
license: mit
datasets:
  - zobeir/GoldNet
tags:
  - image-classification
  - pytorch
  - vision-transformer
  - counterfeit-detection
  - gold
  - fine-grained-recognition
language:
  - en
---

GoldNet Model Weights

Trained checkpoints for GoldFormer and baseline models from the paper:

GoldFormer: A Texture-Aware Vision Transformer-Based Algorithm for Detecting Near-Identical Images
Z. Raisi, Algorithms (MDPI), 2026, 19(7), 530.
DOI: 10.3390/a19070530. Open access (CC BY 4.0).
Code & dataset: github.com/zobeirraisi/GoldNet

Task

Binary image classification of gold items (authentic vs. counterfeit) from ordinary smartphone photographs. The two classes are near-identical to the naked eye; trained experts reached 89.80% accuracy on a blind subset.

Available Checkpoints (`weights/`)

Results below are from 5-fold stratified cross-validation at a matched 224×224 input resolution (the paper's primary setting).

| File | Model | Accuracy (%) | F1 |
| --- | --- | --- | --- |
| `GoldFormer_best.pth` | GoldFormer (CNN + Swin-T + TAAG) | 95.02 ± 0.75 | 0.9502 |
| `ViT_B16_best.pth` | ViT-B/16 | 94.31 ± 0.94 | 0.9431 |
| `Swin_T_best.pth` | Swin Transformer-Tiny (GoldFormer's backbone) | 93.65 ± 0.67 | 0.9365 |
| `ResNet101_best.pth` | ResNet-101 | 92.29 ± 1.01 | 0.9228 |
| `ResNet50_best.pth` | ResNet-50 | - | - |
| `ResNet18_best.pth` | ResNet-18 | - | - |
| `DenseNet121_best.pth` | DenseNet-121 | - | - |
| `EfficientNet_B3_best.pth` | EfficientNet-B3 | - | - |
| `EfficientNet_B0_best.pth` | EfficientNet-B0 | - | - |
| `MobileNet_V2_best.pth` | MobileNet-V2 | - | - |

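The stratified 5-fold protocol behind these numbers can be sketched without any dependencies. The snippet is an illustration only, not the repository's code: `stratified_folds` is a hypothetical helper, the seed is arbitrary, and the class counts come from the dataset description (1,044 authentic, 1,083 counterfeit). It deals each class round-robin into five folds so that every fold keeps roughly the same class ratio.

```python
import random
from collections import Counter, defaultdict


def stratified_folds(labels, k=5, seed=0):
    """Assign every sample index to one of k folds, class by class.

    Hypothetical helper for illustration; libraries such as scikit-learn
    offer equivalent functionality.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)                 # randomise order within the class
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)       # deal round-robin into the folds
    return folds


# Class counts taken from the dataset description.
labels = ["authentic"] * 1044 + ["counterfeit"] * 1083
folds = stratified_folds(labels)

for i, fold in enumerate(folds):
    counts = Counter(labels[j] for j in fold)
    print(f"fold {i}: {len(fold)} images, {dict(counts)}")
```
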
GoldFormer has the highest accuracy of any single model and also edges out a soft-voting ensemble (94.92%). Its lead over the strongest individual baseline, ViT-B/16, is not statistically significant (paired McNemar p = 0.228), but its gain over its own Swin-T backbone is (p = 0.014). It achieves this with about half of ViT-B/16's FLOPs (8.6 vs. 16.9 GFLOPs) and fewer parameters (54.3M vs. 86.6M).
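
For readers unfamiliar with the paired McNemar test cited above, a minimal sketch follows. It assumes the exact (binomial) two-sided variant; the paper may use a different variant, such as the chi-square approximation. The function names and the toy labels are hypothetical and are not taken from the repository.

```python
from math import comb


def discordant_counts(y_true, pred_a, pred_b):
    """Count samples where exactly one of the two models is correct."""
    b = sum(1 for t, a, p in zip(y_true, pred_a, pred_b) if a == t and p != t)
    c = sum(1 for t, a, p in zip(y_true, pred_a, pred_b) if a != t and p == t)
    return b, c  # b: only A correct, c: only B correct


def mcnemar_exact(b, c):
    """Two-sided exact McNemar p-value from the discordant counts.

    Under the null hypothesis both models err equally often, so the
    smaller count follows Binomial(b + c, 0.5).
    """
    n = b + c
    if n == 0:
        return 1.0
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Toy data (hypothetical): model A errs once, model B errs four times.
y_true = [0, 0, 0, 0, 1, 1, 1, 1]
pred_a = [0, 0, 0, 0, 1, 1, 1, 0]
pred_b = [0, 1, 1, 0, 1, 0, 1, 0]

b, c = discordant_counts(y_true, pred_a, pred_b)
print(f"b={b}, c={c}, p={mcnemar_exact(b, c):.3f}")
```

Only the samples on which the two models disagree enter the test, which is why two models with visibly different accuracies can still be statistically indistinguishable on a dataset of this size.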

All models were trained with 5-fold stratified cross-validation, AdamW, AMP (bfloat16), and freeze-then-unfreeze fine-tuning on the GoldNet dataset (2,127 images: 1,044 authentic and 1,083 counterfeit).
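
The freeze-then-unfreeze schedule with AdamW and bfloat16 autocast can be sketched roughly as below. This is not the repository's training script: the tiny two-layer network stands in for a pretrained backbone plus classification head, and the learning rates, weight decay, step counts, and the names `set_trainable` and `train_steps` are all assumptions chosen for illustration.

```python
import torch
from torch import nn


def set_trainable(module: nn.Module, trainable: bool) -> None:
    """Freeze or unfreeze every parameter of a module."""
    for p in module.parameters():
        p.requires_grad = trainable


def train_steps(model, x, y, lr, steps=3):
    """Run a few AdamW steps on the currently trainable parameters."""
    params = [p for p in model.parameters() if p.requires_grad]
    opt = torch.optim.AdamW(params, lr=lr, weight_decay=0.01)  # assumed values
    for _ in range(steps):
        opt.zero_grad()
        # Forward pass under bfloat16 autocast (AMP).
        with torch.autocast(device_type=x.device.type, dtype=torch.bfloat16):
            logits = model(x)
        # Compute the loss in float32 for numerical stability.
        loss = nn.functional.cross_entropy(logits.float(), y)
        loss.backward()
        opt.step()


torch.manual_seed(0)
backbone = nn.Sequential(nn.Linear(8, 16), nn.ReLU())  # stand-in for a pretrained backbone
head = nn.Linear(16, 2)                                # two classes: authentic / counterfeit
model = nn.Sequential(backbone, head)
x, y = torch.randn(32, 8), torch.randint(0, 2, (32,))  # random stand-in data

w0, h0 = backbone[0].weight.detach().clone(), head.weight.detach().clone()

# Phase 1: backbone frozen, only the head is trained.
set_trainable(backbone, False)
train_steps(model, x, y, lr=1e-3)
w1, h1 = backbone[0].weight.detach().clone(), head.weight.detach().clone()

# Phase 2: everything unfrozen, lower learning rate.
set_trainable(backbone, True)
train_steps(model, x, y, lr=1e-4)
w2 = backbone[0].weight.detach().clone()

print("backbone unchanged in phase 1:", torch.equal(w0, w1))
print("backbone updated in phase 2:", not torch.equal(w1, w2))
```

Building a fresh optimizer for each phase keeps frozen parameters out of AdamW entirely, so weight decay cannot move them while they are frozen.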

Usage

```python
import torch
from PIL import Image
from torchvision import transforms

from models import build_model   # models.py from the GitHub repo

# Download the weights first:
#   bash fetch_weights.sh   (from the GitHub repo)

model = build_model("goldformer")
state = torch.load("weights/GoldFormer_best.pth", map_location="cpu", weights_only=True)
model.load_state_dict(state)   # strict: keys must exactly match the released checkpoint
model.eval()

# ImageNet mean/std normalization at the published 224×224 resolution
transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406],
                         [0.229, 0.224, 0.225]),
])

img = Image.open("your_image.jpg").convert("RGB")
x = transform(img).unsqueeze(0)   # add a batch dimension: (1, 3, 224, 224)

with torch.no_grad():
    logits, gamma = model(x)   # gamma = TAAG gate activations, for interpretability
    prob_authentic = torch.softmax(logits, dim=1)[0, 0].item()   # class index 0 = authentic
    print(f"P(authentic) = {prob_authentic:.3f}")
```

Note: All checkpoints, including GoldFormer, use 224×224 input in the published configuration. The class definitions (`TextureAwareAttentionGate` and `GoldFormer`) are in `models.py` in the GitHub repo.

Citation

@article{raisi2026goldformer,
  title   = {GoldFormer: A Texture-Aware Vision Transformer-Based Algorithm
             for Detecting Near-Identical Images},
  author  = {Raisi, Zobeir},
  journal = {Algorithms},
  volume  = {19},
  number  = {7},
  pages   = {530},
  year    = {2026},
  doi     = {10.3390/a19070530}
}

License

Model weights: MIT License
Dataset: CC BY 4.0