@@ -2,9 +2,85 @@
 tags:
 - model_hub_mixin
 - pytorch_model_hub_mixin
 ---
-This model has been pushed to the Hub using the [PytorchModelHubMixin](https://huggingface.co/docs/huggingface_hub/package_reference/mixins#huggingface_hub.PyTorchModelHubMixin) integration:
-- Code: https://github.com/pjjajal/adaperceiver-public
-- Paper: https://arxiv.org/abs/2511.18105
-- Docs: [More Information Needed]

 tags:
 - model_hub_mixin
 - pytorch_model_hub_mixin
+- vision
+- perceiver
+- adaptive-computation
+license: mit
+datasets:
+- timm/imagenet-12k-wds
 ---
+# AdaPerceiver (Logit + Feature Distilled from ViT-H CLIP)
+This repository hosts the **logit + feature distilled AdaPerceiver model**, introduced in
+**“AdaPerceiver: Transformers with Adaptive Width, Depth, and Tokens”**.
+📄 Paper: https://arxiv.org/abs/2511.18105
+📦 Code: https://github.com/pjajal/AdaPerceiver
+📚 Model Collection: https://huggingface.co/collections/pjajal/adaperceiver-v1
+This model is distilled from a [ViT-H/14 CLIP model](https://huggingface.co/timm/vit_huge_patch14_clip_224.laion2b_ft_in12k).
+---
+## Model Description
+**AdaPerceiver** is a Perceiver-style transformer architecture designed for **runtime-adaptive computation**.
+A single trained model can dynamically trade off **accuracy and compute** by adjusting:
+- **Number of latent tokens** (token adaptivity)
+- **Effective depth** via early exiting (depth adaptivity)
+- **Embedding dimension** using Matryoshka (nested) feed-forward layers (width adaptivity)
+This specific checkpoint corresponds to the **logit + feature distilled AdaPerceiver model**, trained on **ImageNet-12K** using a ViT-H teacher. It exposes both:
+- classification **logits**, and
+- **feature representations**
+making it suitable for **downstream dense prediction tasks** such as semantic segmentation and depth estimation.
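Width adaptivity with nested (Matryoshka-style) layers can be pictured as running only a leading slice of a layer's weights, so that a narrower sub-layer shares parameters with the full one. The sketch below is a minimal pure-Python illustration under that assumption. `nested_linear` and the toy sizes are hypothetical and are not taken from the AdaPerceiver code, which may slice its layers differently.

```python
import random


def nested_linear(weight, bias, x, width):
    """Apply only the top-left `width` x `width` block of a square linear layer."""
    return [
        sum(weight[i][j] * x[j] for j in range(width)) + bias[i]
        for i in range(width)
    ]


random.seed(0)
full_dim = 8  # hypothetical full embedding width
weight = [[random.gauss(0, 1) for _ in range(full_dim)] for _ in range(full_dim)]
bias = [0.0] * full_dim
x = [random.gauss(0, 1) for _ in range(full_dim)]

# The same parameters serve both widths; the narrow pass touches a quarter of the weights.
small = nested_linear(weight, bias, x, width=4)
full = nested_linear(weight, bias, x, width=full_dim)
print(len(small), len(full))
```

The narrow output depends only on the first four input coordinates and the leading 4 x 4 weight block, which is what lets one set of weights back several widths.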
+---
+## Training Details
+- **Training Data:** ImageNet-12K
+- **Training Objective:** Logit distillation + feature distillation
+- **Teacher Model:** [ViT-H/14 CLIP model](https://huggingface.co/timm/vit_huge_patch14_clip_224.laion2b_ft_in12k)
+- **Architecture:** Adaptive Perceiver with block-masked attention and Matryoshka FFNs
+- **Adaptivity Axes:** Tokens, Depth, Width
+For full training details, see Appendix D of the paper.
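As a rough picture of what a combined logit and feature distillation objective can look like, the sketch below adds a KL term on softened class probabilities to a mean-squared error on features. This is a generic formulation, not the paper's exact loss: the function names, the `temperature` and `feature_weight` parameters, and the choice of KL plus MSE are all assumptions made for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b)) / len(a)


def distillation_loss(student_logits, teacher_logits, student_feats, teacher_feats,
                      temperature=1.0, feature_weight=1.0):
    """Hypothetical objective: logit KL term plus weighted feature MSE term."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    logit_term = kl_divergence(p_teacher, p_student)
    feature_term = mse(student_feats, teacher_feats)
    return logit_term + feature_weight * feature_term


loss = distillation_loss([3.0, 2.0, 1.0], [1.0, 2.0, 3.0], [0.0, 0.0], [0.5, -0.5])
print(loss)
```

Both terms vanish when the student reproduces the teacher exactly, so the loss is zero only at a perfect match.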
+---
+## How to Use
+Load the model with the Hub-compatible `DistillAdaPerceiver` class from the AdaPerceiver code repository.
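+```python
+import torch
+
+# Assumes the AdaPerceiver code repository (linked above) is on the Python path.
+from hub.networks.adaperceiver_distill import DistillAdaPerceiver
+
+model = DistillAdaPerceiver.from_pretrained("pjajal/adaperceiver-v1")
+model.eval()  # inference mode: disables dropout and other training-only behavior
+
+with torch.no_grad():  # gradients are not needed for inference
+    out = model(
+        torch.randn(1, 3, 224, 224),  # dummy batch: one 224x224 RGB image
+        num_tokens=256,   # latent token count
+        mat_dim=128,      # embedding width
+        depth=12,         # early-exit depth
+    )
+print(out.logits.shape, out.features.shape)
+```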
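To compare accuracy and compute across settings, it can help to enumerate combinations of the three forward-call arguments shown above (`num_tokens`, `mat_dim`, `depth`). The helper below is a small sketch of that idea. `config_grid` is a hypothetical name, and the example values are placeholders, so check the paper or code for the settings this checkpoint actually supports.

```python
from itertools import product


def config_grid(token_counts, widths, depths):
    """Return one keyword-argument dict per (tokens, width, depth) combination."""
    return [
        {"num_tokens": t, "mat_dim": w, "depth": d}
        for t, w, d in product(token_counts, widths, depths)
    ]


# Placeholder values for illustration only.
configs = config_grid(token_counts=[64, 256], widths=[128], depths=[6, 12])
for cfg in configs:
    print(cfg)  # each dict could be passed to the model as: model(images, **cfg)
```

Keeping the grid as plain dictionaries means the same list can drive both a timing loop and an accuracy evaluation.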
+## Reference
+If you use this model, please cite the AdaPerceiver paper:
+```bibtex
+@article{jajal2025adaperceiver,
+  title={AdaPerceiver: Transformers with Adaptive Width, Depth, and Tokens},
+  author={Jajal, Purvish and Eliopoulos, Nick John and Chou, Benjamin Shiue-Hal and Thiruvathukal, George K and Lu, Yung-Hsiang and Davis, James C},
+  journal={arXiv preprint arXiv:2511.18105},
+  year={2025}
+}
+```