arxiv:2607.00666

Domain Arithmetic: One-Shot VLA Adaptation under Environmental Shifts

Published on Jul 1

Submitted by Taewook Kang on Jul 2

Seoul National University



Abstract

AI-generated summary (Qwen/Qwen2.5-Coder-32B-Instruct): Vision-Language-Action models can be efficiently adapted to new environments from a single demonstration through weight vector arithmetic that isolates domain-specific information via subspace alignment.

Vision-Language-Action (VLA) models often fail to perform previously learned tasks under environmental shifts, such as changes in camera pose or a switch to a different but similar robot (e.g., from Panda to UR5e). Adapting these models to the shifted environment (i.e., the target domain) typically requires training on multiple demonstrations per task, which are costly to collect. To reduce the burden of data curation and training, we propose Domain ARiThmetic (DART), an analogy-based method that adapts VLA models to environmental shifts by adding domain-specific information through weight vector arithmetic. Unlike prior approaches, DART requires collecting only a single demonstration, enabling efficient adaptation. To accurately isolate the domain-specific information to be added, DART performs subspace alignment between the singular components of weight vectors, filtering out noisy components. In both simulated and real-world experiments, DART outperforms existing VLA adaptation methods in one-shot scenarios across diverse visual and embodiment shifts. Code is available at https://github.com/snumprlab/dart.
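The abstract describes weight vector arithmetic only at a high level. Below is a minimal NumPy sketch of the general idea on a single weight matrix, under two loudly stated assumptions: (i) the domain vector is taken as the difference between a layer fine-tuned on one task in the target domain and the same layer fine-tuned on that task in the source domain (an analogy-style difference; DART's actual construction may differ); (ii) plain rank-k SVD truncation stands in for DART's subspace alignment, which it is not. The function names `domain_vector`, `low_rank_filter`, `adapt` and the scale `alpha` are hypothetical; the official repository is the reference for the real method.

```python
import numpy as np


def domain_vector(w_src_ft: np.ndarray, w_tgt_ft: np.ndarray) -> np.ndarray:
    """Hypothetical analogy-style domain vector for one weight matrix.

    Assumes both inputs are the same layer fine-tuned on the same task,
    once in the source domain and once in the target domain; their
    difference is treated as "domain shift plus noise".
    """
    return w_tgt_ft - w_src_ft


def low_rank_filter(delta: np.ndarray, k: int) -> np.ndarray:
    """Keep the top-k singular components of `delta` and drop the rest.

    Simplified stand-in for DART's subspace alignment: plain rank-k truncation.
    """
    u, s, vt = np.linalg.svd(delta, full_matrices=False)
    return (u[:, :k] * s[:k]) @ vt[:k, :]


def adapt(w_base, w_src_ft, w_tgt_ft, k=1, alpha=1.0):
    """Add the filtered domain vector to the base (multi-task) weights."""
    delta = low_rank_filter(domain_vector(w_src_ft, w_tgt_ft), k)
    return w_base + alpha * delta


# Toy check on random matrices (not real VLA weights).
rng = np.random.default_rng(0)
d_out, d_in = 16, 8
w_base = rng.normal(size=(d_out, d_in))
# Rank-1 "domain" change plus small fine-tuning noise.
true_shift = np.outer(rng.normal(size=d_out), rng.normal(size=d_in))
w_src_ft = w_base + 0.01 * rng.normal(size=(d_out, d_in))
w_tgt_ft = w_src_ft + true_shift + 0.01 * rng.normal(size=(d_out, d_in))

w_adapted = adapt(w_base, w_src_ft, w_tgt_ft, k=1)
w_raw = w_base + domain_vector(w_src_ft, w_tgt_ft)  # same arithmetic, no filtering
err_filtered = np.linalg.norm(w_adapted - (w_base + true_shift))
err_raw = np.linalg.norm(w_raw - (w_base + true_shift))
print(f"error with rank-1 filter: {err_filtered:.4f}, without: {err_raw:.4f}")
```

In this toy setup the injected shift dominates the spectrum, so truncation discards most of the noise. With real checkpoints, separating domain-specific from noisy components is exactly the problem the abstract assigns to DART's subspace alignment, and rank truncation is only a rough proxy for it.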


Community

twkang43 (paper submitter)

Accepted at ECCV 2026.
Domain Arithmetic (DART) adapts multi-task VLAs to environmental shifts (e.g., camera-pose changes, embodiment changes) using a single demo of a single task through subspace-aligned weight arithmetic.