Abstract
WIZARD is a weight-space meta-learning framework that generates task-specific LoRA parameters for frozen VLA policies using language instructions and demonstration videos, enabling efficient task adaptation without fine-tuning.
Vision-Language-Action (VLA) models are emerging as a promising paradigm for robotic manipulation, enabling general-purpose policies trained from large corpora of demonstrations and action labels. However, adapting these models to new tasks still typically requires task-specific demonstrations, action annotations, and additional fine-tuning, making deployment costly and difficult to scale. We propose WIZARD, a weight-space meta-learning framework that sidesteps task-specific fine-tuning by generating task-specific LoRA parameters for a frozen VLA policy. Given only a language instruction and a short demonstration video, WIZARD predicts the corresponding adaptation weights in a single forward pass, without target-task action labels or test-time optimization. During meta-training, WIZARD learns to map task evidence directly to expert LoRA updates, capturing relationships between tasks in weight space. Experiments on LIBERO show that WIZARD improves performance by up to ~2x on unseen dataset collections and up to ~14x on unseen tasks. On a Franka Emika Panda, WIZARD consistently improves over a real-domain adapted baseline, showing that generated adapters provide task-level specialization beyond simulation.
Community
Paper Title
Robotic Policy Adaptation via Weight-Space Meta-Learning
Short Summary (TL;DR)
The authors present WIZARD, a framework that enables zero-shot robotic policy adaptation for large Vision-Language-Action (VLA) models without any test-time fine-tuning, online optimization, or action labels. Instead of tuning via gradients at deployment, a meta-network predicts task-specific LoRA parameters in a single forward pass from a language prompt and a short demonstration video. On the LIBERO benchmark, WIZARD improves success rates by up to 2x on unseen datasets and up to 14x on unseen tasks.
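To make the "single forward pass" idea concrete, here is a minimal sketch of a generator that turns a task embedding into LoRA factors for one frozen linear layer. It is not the authors' architecture: the `LoRAGenerator` class, the single linear projection, the toy dimensions, and the `alpha / rank` scaling (the usual LoRA convention) are all assumptions made for illustration, and the task embedding stands in for whatever encoding of the instruction and demonstration video the paper actually uses.

```python
import random

def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def rand_matrix(rows, cols, rng, scale=0.1):
    """Small random matrix; stands in for trained weights."""
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]

def reshape(flat, rows, cols):
    """Cut a flat list into `rows` rows of `cols` entries each."""
    return [flat[i * cols:(i + 1) * cols] for i in range(rows)]

class LoRAGenerator:
    """Hypothetical generator: one linear map from task embedding to LoRA factors."""

    def __init__(self, embed_dim, d_in, d_out, rank, seed=0):
        rng = random.Random(seed)
        self.d_in, self.d_out, self.rank = d_in, d_out, rank
        n_params = rank * d_in + d_out * rank
        self.proj = rand_matrix(n_params, embed_dim, rng)

    def __call__(self, task_embedding):
        # One forward pass: no gradients, no action labels for the target task.
        flat = matvec(self.proj, task_embedding)
        split = self.rank * self.d_in
        A = reshape(flat[:split], self.rank, self.d_in)   # rank x d_in
        B = reshape(flat[split:], self.d_out, self.rank)  # d_out x rank
        return A, B

def adapted_forward(W, A, B, x, alpha=1.0):
    """Frozen base layer plus low-rank update: W x + (alpha / rank) * B (A x)."""
    rank = len(A)
    base = matvec(W, x)
    delta = matvec(B, matvec(A, x))
    return [b + (alpha / rank) * d for b, d in zip(base, delta)]

rng = random.Random(1)
W = rand_matrix(3, 4, rng)                      # frozen policy weight (toy size)
generator = LoRAGenerator(embed_dim=5, d_in=4, d_out=3, rank=2)
task_embedding = [0.2, -0.1, 0.4, 0.0, 0.3]     # placeholder for language + video features
A, B = generator(task_embedding)
x = [1.0, 0.5, -0.5, 2.0]
y = adapted_forward(W, A, B, x)
```

The base weight `W` is never modified. Only the generated factors `A` and `B` change from task to task, which is what lets one frozen policy be specialized without fine-tuning.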
Suggested Community Comment
Title: Zero-Shot LoRA Parameter Generation for Large VLAs
This paper introduces a clever workaround to the expensive, action-labeled fine-tuning usually required to adapt Vision-Language-Action (VLA) models to new tasks.
Why it's interesting:
- No Test-Time Gradients: WIZARD bypasses deployment fine-tuning entirely. It maps multimodal task embeddings directly to specialized LoRA parameters in a single forward pass.
- Scale-Aware Architecture: To stabilize weight generation across heterogeneous VLA modules, it introduces instance-wise token normalization and explicitly predicts layer-wise statistics.
- Strong Zero-Shot Results: It hits an average success rate of 40% on LIBERO-Spatial (vs. 19% for standard multi-task VLAs) and successfully transfers to a physical 7-DoF Franka arm, nearly doubling real-world success rates from 0.22 to 0.41.
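The scale-aware point above can be illustrated with a small sketch. The comment does not give the exact formulation, so this is only one plausible reading: each layer's target weights are standardized on their own, the generator would predict the normalized values together with that layer's mean and standard deviation, and the two are recombined afterwards. The function names, the `eps` constant, and the toy weight values are all hypothetical.

```python
import math

def normalize_layer(weights, eps=1e-8):
    """Standardize one layer's flattened weights; return tokens plus the layer stats."""
    mean = sum(weights) / len(weights)
    std = math.sqrt(sum((w - mean) ** 2 for w in weights) / len(weights))
    tokens = [(w - mean) / (std + eps) for w in weights]
    return tokens, mean, std

def denormalize_layer(tokens, mean, std, eps=1e-8):
    """Invert normalize_layer using the (predicted) layer-wise statistics."""
    return [t * (std + eps) + mean for t in tokens]

# Two hypothetical layers whose LoRA weights differ by four orders of magnitude.
attention_lora = [0.0012, -0.0008, 0.0005, -0.0011]
mlp_lora = [12.0, -8.0, 5.0, -11.0]

attn_tokens, attn_mean, attn_std = normalize_layer(attention_lora)
mlp_tokens, mlp_mean, mlp_std = normalize_layer(mlp_lora)

# After normalization both layers live on the same scale, so a single
# generator head can regress them; the stats carry the scale separately.
restored = denormalize_layer(mlp_tokens, mlp_mean, mlp_std)
```

Separating shape from scale this way means the generator never has to emit raw values spanning several orders of magnitude, which is the stability benefit the bullet describes.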
It's a highly scalable approach to parameter generation that avoids full-policy weight synthesis while delivering serious data efficiency.
Definitely worth a read!
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- VLA-Pro: Cross-Task Procedural Memory Transfer for Vision-Language-Action Models (2026)
- Light-WAM: Efficient World Action Models with State-Fusion Action Decoding (2026)
- PriorVLA: Prior-Preserving Adaptation for Vision-Language-Action Models (2026)
- PointACT: Vision-Language-Action Models with Multi-Scale Point-Action Interaction (2026)
- SynthICL: Scalable In-context Imitation Learning with Synthetic Data (2026)
- ST-$\pi$: Structured SpatioTemporal VLA for Robotic Manipulation (2026)
- Being-H0.7: A Latent World-Action Model from Egocentric Videos (2026)