rskill-omdet-turbo-locator

OpenRAL rSkill: OmDet-Turbo (Swin-tiny) packaged as an Apache-2.0, on-demand open-vocabulary locator (mode: on_demand, ADR-0051). The reasoner invokes it through the read-only locate_in_view tool ("is object X in view right now?") when it needs a specific object that the continuous detector bank does not cover. It is a lightweight, real-time, in-process alternative to the 3B NVIDIA LocateAnything VLM for simple "find X" queries. It drives no actuators.

This package wraps hf://omlab/omdet-turbo-swin-tiny-hf with an rskill.yaml manifest. It does not copy model weights; they are the same Apache-2.0 checkpoint used by its continuous sibling, omdet-turbo-indoor.

What this skill does

Answers on-demand open-vocabulary localization queries from the reasoner: given a free-text object (e.g. "the red stapler"), it runs one detection pass on the current frame and reports whether that object is visible and where. It is not a continuous background producer: it does not stream into world state every frame, and it responds only when prompted (through the locate_in_view service or the detector_query topic). It emits no action chunks and drives no actuators.

| Field | Value |
|---|---|
| Actions | detect |
| Objects | open-vocabulary queried object (any free-text class the reasoner asks for) |
| Scenes | tabletop, kitchen, indoor, household, office |
| Embodiment | embodiment-agnostic (any RGB camera ≥ 640×480) |

How it works

OmDet-Turbo is a real-time, transformer-based open-vocabulary detector, loaded through the transformers library (AutoModelForZeroShotObjectDetection) and run in-process by the OmDetTurboDetector backend (DetectorTier.ZEROSHOT_HF). The same backend serves both detector modes; this rSkill selects mode: on_demand, so the detector node exposes the locate_in_view service and the detector_query retarget topic:

  • detect_with_query(frame, …, query): one-shot detection for a reasoner query without disturbing any persistent vocabulary (the locate_in_view path).
  • set_query(text): persistently retarget the query (the detector_query topic).
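
The difference between the two entry points can be illustrated with a toy stand-in. This is a minimal sketch, not the OmDetTurboDetector implementation: the class ToyLocator, its classes attribute, the simplified two-argument signature, and the return values are all hypothetical, and no model is run. It only shows that a one-shot query leaves the persistent vocabulary untouched while set_query replaces it.

```python
from dataclasses import dataclass, field


@dataclass
class ToyLocator:
    """Hypothetical stand-in for a detector backend; runs no model."""

    # Persistent vocabulary, e.g. the manifest's static default labels.
    classes: list[str] = field(default_factory=lambda: ["cup", "bottle"])

    def detect_with_query(self, frame, query: str) -> list[str]:
        # One-shot: the query's classes apply to this call only.
        return [query.strip()]

    def set_query(self, text: str) -> None:
        # Persistent retarget: the stored vocabulary is replaced.
        self.classes = [text.strip()]


locator = ToyLocator()
one_shot = locator.detect_with_query(frame=None, query="the red stapler")
after_one_shot = list(locator.classes)  # unchanged by the one-shot call
locator.set_query("keyboard")
after_retarget = list(locator.classes)  # replaced by the persistent retarget
```

Keeping the one-shot path free of side effects means a reasoner query cannot silently change what a persistent query is tracking.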

The free-text query is parsed into OmDet's multi-label class list by query_to_classes (comma / </c> separated; a single phrase is one class). The labels field in the manifest is only the static default used when no query is supplied.
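
The parsing rule described above can be sketched as follows. This is an illustrative re-implementation under stated assumptions, not the actual query_to_classes: the function name query_to_classes_sketch is hypothetical, and whitespace trimming and dropping of empty fragments are assumptions about behavior the text does not spell out.

```python
import re


def query_to_classes_sketch(query: str) -> list[str]:
    """Split a free-text query into class names (illustrative only)."""
    # Split on commas or on the literal "</c>" separator named in the text.
    parts = re.split(r",|</c>", query)
    # Assumption: surrounding whitespace and empty fragments are discarded.
    return [part.strip() for part in parts if part.strip()]


single = query_to_classes_sketch("the red stapler")
multi = query_to_classes_sketch("mug, keyboard</c>red stapler")
```

Here single holds one class for the whole phrase, while multi holds three classes, one per separated fragment.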

Observation → action contract

| Direction | Key | Shape | Notes |
|---|---|---|---|
| in | any RGB camera | (H, W, 3) BGR uint8 | latest frame cached per camera for the service; min 640×480 |
| in | query | text | object/description from the reasoner's locate_in_view call |
| out | ObjectsMetadata | list of ObjectDetection2D (label, confidence, bbox_xyxy) | no action chunk |
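
To make the output side of the contract concrete, here is a minimal sketch of how a consumer might answer "is object X in view?" from a list of detections. The Detection2D class is a hypothetical mirror of the three listed fields, not the real ObjectDetection2D type; the best_match helper, the 0.3 confidence threshold, and the pixel units for bbox_xyxy are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection2D:
    """Hypothetical mirror of the three fields listed in the contract."""

    label: str
    confidence: float
    # Assumed order and units: (x_min, y_min, x_max, y_max) in pixels.
    bbox_xyxy: tuple[float, float, float, float]


def best_match(detections, label, min_confidence=0.3):
    """Return the highest-confidence detection for `label`, or None."""
    candidates = [
        d for d in detections
        if d.label == label and d.confidence >= min_confidence
    ]
    return max(candidates, key=lambda d: d.confidence, default=None)


detections = [
    Detection2D("red stapler", 0.82, (120.0, 200.0, 260.0, 310.0)),
    Detection2D("red stapler", 0.41, (400.0, 90.0, 470.0, 150.0)),
    Detection2D("mug", 0.20, (10.0, 10.0, 60.0, 80.0)),
]
hit = best_match(detections, "red stapler")  # strongest stapler detection
miss = best_match(detections, "mug")         # below threshold, so None
```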

Upstream model and training

A thin wrapper around the upstream Apache-2.0 OmDet-Turbo checkpoint; weights live upstream and are not copied here.

| Field | Value |
|---|---|
| Source repo | omlab/omdet-turbo-swin-tiny-hf |
| Base model | OmDet-Turbo, Swin-tiny backbone |
| Paper | arxiv:2403.06892, Real-time Transformer-based Open-Vocabulary Detection with Efficient Fusion Head |
| License | apache-2.0 (commercial use permitted) |
| Parameters | ~115 M |
| Training data | upstream: Objects365 / GoldG and grounding data per the OmDet-Turbo release |

Supported robots

Embodiment-agnostic: the only requirement is an RGB camera stream. All in-tree embodiment tags are declared in rskill.yaml.

| Robot | Embodiment tag | Status | Notes |
|---|---|---|---|
| any with an RGB camera | franka_panda, so100_follower, aloha, … | ⚡ experimental | camera-only |

Sensors required

Mirrors rskill.yaml::sensors_required.

| Key | Modality | Min resolution | Format |
|---|---|---|---|
| any RGB camera | RGB | 640 × 480 | uint8 BGR frame |

Manifest summary

| Field | Value |
|---|---|
| name | OpenRAL/rskill-omdet-turbo-locator |
| version | 0.1.0 |
| license | apache-2.0 |
| role / kind | s1 / detector |
| runtime / quantization.dtype | pytorch / fp16 |
| detector.engine / detector.mode | zeroshot_hf / on_demand |
| weights_uri | hf://omlab/omdet-turbo-swin-tiny-hf |
| latency_budget.per_chunk_ms | 200 ms |
| commercial_use_allowed | yes (Apache-2.0 weights) |

Full schema: openral_core.schemas.RSkillManifest.
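
For orientation, the summary values can be written out as a plain Python mapping. This is a sketch only: the nesting is an assumption inferred from the dotted field names in the table, not the actual rskill.yaml layout or the RSkillManifest schema, and within_latency_budget is a hypothetical helper showing how the 200 ms budget might be compared against a measured call time.

```python
# Values copied from the summary table. The nesting is an assumption
# inferred from the dotted field names, not the real rskill.yaml layout.
manifest_summary = {
    "name": "OpenRAL/rskill-omdet-turbo-locator",
    "version": "0.1.0",
    "license": "apache-2.0",
    "role": "s1",
    "kind": "detector",
    "runtime": "pytorch",
    "quantization": {"dtype": "fp16"},
    "detector": {"engine": "zeroshot_hf", "mode": "on_demand"},
    "weights_uri": "hf://omlab/omdet-turbo-swin-tiny-hf",
    "latency_budget": {"per_chunk_ms": 200},
}


def within_latency_budget(elapsed_ms: float, summary: dict) -> bool:
    """Hypothetical helper: compare a measured call time to the budget."""
    return elapsed_ms <= summary["latency_budget"]["per_chunk_ms"]


fast_enough = within_latency_budget(150.0, manifest_summary)
too_slow = within_latency_budget(250.0, manifest_summary)
```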

Quick start

uv sync --group omdet   # torch + transformers for the in-process backend
from openral_core.schemas import RSkillManifest, DetectorMode

manifest = RSkillManifest.from_yaml("rskills/omdet-turbo-locator/rskill.yaml")
assert manifest.detector.mode is DetectorMode.ON_DEMAND

Reproduction

Packaging-only wrapper: there are no trained numbers to reproduce. Validate the wiring (manifest + backend query path) without a GPU:

just bootstrap && uv sync --all-packages
uv run pytest tests/unit/test_omdet_turbo_detector.py

Evaluation

No benchmarks shipped (packaging-only wrapper); see CLAUDE.md §6.4.

License

This rSkill package (rskill.yaml, README.md) is apache-2.0. The wrapped weights at hf://omlab/omdet-turbo-swin-tiny-hf are also apache-2.0, so the locator is fully commercial-safe (CLAUDE.md §1.9).

See also
