 
2
  tags:
3
  - model_hub_mixin
4
  - pytorch_model_hub_mixin
5
+ - vision
6
+ - perceiver
7
+ - adaptive-computation
8
+ license: mit
9
+ datasets:
10
+ - timm/imagenet-12k-wds
11
  ---

# AdaPerceiver (Logit + Feature Distilled from ViT-H CLIP)

This repository hosts the **logit + feature distilled AdaPerceiver model**, introduced in
**“AdaPerceiver: Transformers with Adaptive Width, Depth, and Tokens”**.

📄 Paper: https://arxiv.org/abs/2511.18105
📦 Code: https://github.com/pjajal/AdaPerceiver
📚 Model Collection: https://huggingface.co/collections/pjajal/adaperceiver-v1

This model is distilled from the [ViT-H CLIP model](https://huggingface.co/timm/vit_huge_patch14_clip_224.laion2b_ft_in12k).

---

## Model Description

**AdaPerceiver** is a Perceiver-style transformer architecture designed for **runtime-adaptive computation**.
A single trained model can dynamically trade off **accuracy and compute** by adjusting:

- **Number of latent tokens** (token adaptivity)
- **Effective depth** via early exiting (depth adaptivity)
- **Embedding dimension** using Matryoshka (nested) feed-forward layers (width adaptivity)

This specific checkpoint corresponds to the **logit + feature distilled AdaPerceiver model**, trained on **ImageNet-12K** using a ViT-H teacher. It exposes both:
- classification **logits**, and
- **feature representations**,

making it suitable for **downstream dense prediction tasks** such as semantic segmentation and depth estimation; a toy feature-consuming head is sketched below.
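
The snippet below is a minimal sketch of one way a dense head might consume the feature output. It assumes a `(batch, num_tokens, dim)` feature layout and a square token grid; `tokens_to_grid`, the shapes, and the class count are illustrative assumptions, not part of the repository's API.

```python
import math

import torch
import torch.nn as nn

def tokens_to_grid(features: torch.Tensor) -> torch.Tensor:
    """Reshape (batch, num_tokens, dim) tokens into a (batch, dim, s, s) grid."""
    b, n, d = features.shape
    s = math.isqrt(n)
    assert s * s == n, "this toy reshape assumes a square number of tokens"
    return features.transpose(1, 2).reshape(b, d, s, s)

# Placeholder for out.features (hypothetical shape), e.g. num_tokens=256, dim=128.
feats = torch.randn(2, 256, 128)
head = nn.Conv2d(128, 21, kernel_size=1)  # e.g. 21 segmentation classes
print(head(tokens_to_grid(feats)).shape)  # torch.Size([2, 21, 16, 16])
```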

---

## Training Details

- **Training Data:** ImageNet-12K
- **Training Objective:** Logit distillation + feature distillation
- **Teacher Model:** [ViT-H/14 CLIP model](https://huggingface.co/timm/vit_huge_patch14_clip_224.laion2b_ft_in12k)
- **Architecture:** Adaptive Perceiver with block-masked attention and Matryoshka FFNs
- **Adaptivity Axes:** Tokens, Depth, Width

For full training details, see Appendix D of the paper; a generic sketch of the combined objective is shown below.
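
For intuition only, a combined logit-and-feature distillation objective commonly takes the form below. This is a generic sketch, not the paper's exact loss (see Appendix D); the temperature `T`, the weights `alpha` and `beta`, and the plain MSE feature term are all assumptions.

```python
import torch.nn.functional as F

def distill_loss(student_logits, teacher_logits, student_feats, teacher_feats,
                 T: float = 2.0, alpha: float = 1.0, beta: float = 1.0):
    """Generic logit + feature distillation objective (illustrative weightings)."""
    # Logit term: KL divergence between temperature-softened distributions.
    kl = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    # Feature term: regress the student's features onto the teacher's.
    feat = F.mse_loss(student_feats, teacher_feats)
    return alpha * kl + beta * feat
```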

---

## How to Use

This model can be loaded using the AdaPerceiver Hub-compatible class.

```python
import torch
from hub.networks.adaperceiver_distill import DistillAdaPerceiver

model = DistillAdaPerceiver.from_pretrained("pjajal/adaperceiver-v1")

out = model(
    torch.randn(1, 3, 224, 224),
    num_tokens=256,  # latent token count
    mat_dim=128,     # embedding width
    depth=12,        # early-exit depth
)

print(out.logits.shape, out.features.shape)
```
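
Because all three adaptivity axes are ordinary forward arguments, a single checkpoint can be evaluated at several operating points without reloading. The loop below sketches such a sweep; the specific `(num_tokens, mat_dim, depth)` settings are illustrative values, not recommended configurations.

```python
import torch
from hub.networks.adaperceiver_distill import DistillAdaPerceiver

model = DistillAdaPerceiver.from_pretrained("pjajal/adaperceiver-v1").eval()
x = torch.randn(1, 3, 224, 224)

# Cheap-to-expensive operating points (values are assumptions, not recommendations).
configs = [
    dict(num_tokens=64, mat_dim=64, depth=6),
    dict(num_tokens=128, mat_dim=128, depth=9),
    dict(num_tokens=256, mat_dim=128, depth=12),
]

with torch.no_grad():
    for cfg in configs:
        out = model(x, **cfg)
        print(cfg, tuple(out.logits.shape))
```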

## Reference

If you use these models, please cite the AdaPerceiver paper:

```bibtex
@article{jajal2025adaperceiver,
  title={AdaPerceiver: Transformers with Adaptive Width, Depth, and Tokens},
  author={Jajal, Purvish and Eliopoulos, Nick John and Chou, Benjamin Shiue-Hal and Thiruvathukal, George K and Lu, Yung-Hsiang and Davis, James C},
  journal={arXiv preprint arXiv:2511.18105},
  year={2025}
}
```