rskill-omdet-turbo-locator
OpenRAL rSkill β OmDet-Turbo (Swin-tiny) packaged as an Apache-2.0, on-demand open-vocabulary locator (
mode: on_demand, ADR-0051). The reasoner invokes it via the read-onlylocate_in_viewtool β "is object X in view right now?" β when it needs a specific object the continuous detector bank does not cover. A lightweight, real-time, in-process alternative to the 3B NVIDIA LocateAnything VLM for simple "find X" queries. No actuators.
This package wraps hf://omlab/omdet-turbo-swin-tiny-hf with a rskill.yaml
manifest. It does not copy model weights β they are the same Apache-2.0
checkpoint as its continuous sibling omdet-turbo-indoor.
What this skill does
Answers on-demand open-vocabulary localization queries from the reasoner: given
a free-text object (e.g. "the red stapler"), it runs one detection pass on the
current frame and reports whether that object is visible and where. It is not
a continuous background producer β it does not stream into world state every
frame; it responds when prompted (the locate_in_view service / the
detector_query topic). It emits no action chunks and drives no actuators.
| Field | Value |
|---|---|
| Actions | detect |
| Objects | open-vocabulary queried object (any free-text class the reasoner asks for) |
| Scenes | tabletop, kitchen, indoor, household, office |
| Embodiment | embodiment-agnostic (any RGB camera β₯ 640Γ480) |
How it works
OmDet-Turbo is a real-time transformers open-vocabulary detector
(AutoModelForZeroShotObjectDetection), run in-process by the
OmDetTurboDetector
backend (DetectorTier.ZEROSHOT_HF). The same backend serves both detector
modes; this rSkill selects mode: on_demand, so the detector node exposes the
locate_in_view service and the detector_query retarget topic:
detect_with_query(frame, β¦, query)β one-shot detection for a reasoner query without disturbing any persistent vocabulary (thelocate_in_viewpath).set_query(text)β persistently retarget the query (thedetector_querytopic).
The free-text query is parsed into OmDet's multi-label class list by
query_to_classes (comma / </c> separated; a single phrase is one class).
labels in the manifest is only the static default used when no query is
supplied.
Observation β action contract
| Direction | Key | Shape | Notes |
|---|---|---|---|
| in | any RGB camera | (H, W, 3) BGR uint8 |
latest frame cached per camera for the service; min 640Γ480 |
| in | query | text | object/description from the reasoner's locate_in_view call |
| out | ObjectsMetadata |
list of ObjectDetection2D |
(label, confidence, bbox_xyxy); no action chunk |
Upstream model and training
A thin wrapper around the upstream Apache-2.0 OmDet-Turbo checkpoint; weights live upstream and are not copied here.
| Field | Value |
|---|---|
| Source repo | omlab/omdet-turbo-swin-tiny-hf |
| Base model | OmDet-Turbo, Swin-tiny backbone |
| Paper | arxiv:2403.06892 β Real-time Transformer-based Open-Vocabulary Detection with Efficient Fusion Head |
| License | apache-2.0 (commercial use permitted) |
| Parameters | ~115 M |
| Training data | upstream: Objects365 / GoldG and grounding data per the OmDet-Turbo release |
Supported robots
Embodiment-agnostic β the only requirement is an RGB camera stream. All in-tree
embodiment tags are declared in rskill.yaml.
| Robot | Embodiment tag | Status | Notes |
|---|---|---|---|
| any with an RGB camera | franka_panda, so100_follower, aloha, β¦ |
β‘ experimental | camera-only |
Sensors required
Mirrors rskill.yaml::sensors_required.
| Key | Modality | Min resolution | Format |
|---|---|---|---|
| any RGB camera | RGB | 640 Γ 480 | uint8 BGR frame |
Manifest summary
| Field | Value |
|---|---|
name |
OpenRAL/rskill-omdet-turbo-locator |
version |
0.1.0 |
license |
apache-2.0 |
role / kind |
s1 / detector |
runtime / quantization.dtype |
pytorch / fp16 |
detector.engine / detector.mode |
zeroshot_hf / on_demand |
weights_uri |
hf://omlab/omdet-turbo-swin-tiny-hf |
latency_budget.per_chunk_ms |
200 ms |
commercial_use_allowed |
yes (Apache-2.0 weights) |
Full schema: openral_core.schemas.RSkillManifest.
Quick start
uv sync --group omdet # torch + transformers for the in-process backend
from openral_core.schemas import RSkillManifest, DetectorMode
manifest = RSkillManifest.from_yaml("rskills/omdet-turbo-locator/rskill.yaml")
assert manifest.detector.mode is DetectorMode.ON_DEMAND
Reproduction
Packaging-only wrapper β no trained numbers to reproduce. Validate the wiring (manifest + backend query path) without a GPU:
just bootstrap && uv sync --all-packages
uv run pytest tests/unit/test_omdet_turbo_detector.py
Evaluation
No benchmarks shipped β packaging-only wrapper; see CLAUDE.md Β§6.4.
License
This rSkill package (rskill.yaml, README.md) is apache-2.0. The wrapped
weights at hf://omlab/omdet-turbo-swin-tiny-hf are also apache-2.0, so the
locator is fully commercial-safe (CLAUDE.md Β§1.9).
See also
rskills/omdet-turbo-indoor/β the continuous background sibling (same weights,mode: continuous, fixed 266-class vocab).rskills/locateanything-3b-nf4/β higher-quality 3B open-vocab locator (NVIDIA non-commercial;VLM_SIDECARtier).docs/adr/0051-detector-invocation-mode.mdβ continuous vs on-demand detector mode.- CLAUDE.md Β§6.4 β rSkill packaging contract.
Model tree for OpenRAL/rskill-omdet-turbo-locator
Base model
omlab/omdet-turbo-swin-tiny-hf