DynaFLIP: Rethinking Robotics Perception via Tri-Modal-Dynamics Guided Representation
Paper: arXiv:2605.30350
How to use jlee-larr/dynaflip-base with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline
pipe = pipeline("zero-shot-image-classification", model="jlee-larr/dynaflip-base", trust_remote_code=True)
pipe(
    "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/hub/parrots.png",
    candidate_labels=["animals", "humans", "landscape"],
)

# Load model directly
from transformers import AutoModel
model = AutoModel.from_pretrained("jlee-larr/dynaflip-base", trust_remote_code=True, dtype="auto")

This model was proposed in DynaFLIP: Rethinking Robotics Perception via Tri-Modal-Dynamics Guided Representation.
The model is compatible with Transformers:
from transformers import AutoModel, AutoProcessor
from PIL import Image
import torch
REPO = "jlee-larr/dynaflip-base"
dynaflip = AutoModel.from_pretrained(REPO, trust_remote_code=True).eval()
processor = AutoProcessor.from_pretrained(REPO, trust_remote_code=True)
image = Image.open("example.png").convert("RGB")
inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    v = dynaflip.vision_outputs(inputs["pixel_values"])

# v.last_hidden_state -> (B, num_patches, 768) patch tokens
# v.pooler_output -> (B, 1536) CLS + mean(patches)
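
One common way to use pooled features like these is nearest-neighbor comparison between images. The sketch below is illustrative only: `pooled_similarity` is a hypothetical helper, not part of the DynaFLIP code, and whether cosine similarity over `pooler_output` is a good metric for this model is an assumption. Random tensors stand in for real embeddings so the snippet runs without downloading the checkpoint; only the width of 1536 is taken from the shape comment above.

```python
import torch
import torch.nn.functional as F


def pooled_similarity(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    """Cosine similarity between two batches of pooled embeddings.

    a: (N, D), b: (M, D) -> (N, M) matrix with values in [-1, 1].
    Hypothetical helper for illustration; not part of the model repo.
    """
    a = F.normalize(a, dim=-1)  # scale each row to unit length
    b = F.normalize(b, dim=-1)
    return a @ b.T


# Stand-in tensors with the documented pooler_output width (1536).
# With the real model, these would be v.pooler_output for two image batches.
torch.manual_seed(0)
query = torch.randn(2, 1536)
gallery = torch.randn(5, 1536)

sims = pooled_similarity(query, gallery)  # shape (2, 5)
best = sims.argmax(dim=1)                 # closest gallery index per query
```

With actual images, `query` and `gallery` would be replaced by `pooler_output` tensors computed under `torch.no_grad()` as in the snippet above.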
@article{lee2026dynaflip,
  title   = {DynaFLIP: Rethinking Robotics Perception via Tri-Modal-Dynamics Guided Representation},
  author  = {Lee, Jusuk and Lee, Seungjae and Shin, Jonghun and Jung, Hoseong and Kim, Sungha and Cho, Daesol and Kim, H. Jin and Huang, Jia-Bin and Huang, Furong},
  journal = {arXiv preprint arXiv:2605.30350},
  year    = {2026},
}