Are the detection and point chat features working?

by vfol - opened Nov 13, 2025

vfol

Nov 13, 2025

I tried the provided code snippet and got the query and caption prefixes working, but the detect: and point: prefixes seem to be broken. When I try to detect objects, I only get text back, as if it were using the query prefix again.

NyxKrage

Owner Nov 15, 2025

Oh, yeah, I completely forgot to implement them. In the quest to get those working, though, I did end up finding some much better ways to architect the modeling code. I will get it pushed up in the next couple of days.

NyxKrage

Owner Nov 16, 2025

The code should now support pointing and bounding box detection, which can be used like this:

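import torch

# `model` and `inputs` come from the provided code snippet.
outputs = model.generate(
    **inputs,
    use_cache=True,
)
for i, batch in enumerate(outputs):
    print(f"image #{i}")
    if batch.shape[-1] == 4:  # detect: each row is one bounding box
        for bbox in batch:
            # Skip rows that contain a zero (presumably padding).
            if torch.all(bbox != 0):
                print({
                    "min_x": bbox[0].item(),
                    "min_y": bbox[1].item(),
                    "max_x": bbox[2].item(),
                    "max_y": bbox[3].item(),
                })
    elif batch.shape[-1] == 2:  # point: each row is one (x, y) point
        for point in batch:
            # Skip rows that contain a zero (presumably padding).
            if torch.all(point != 0):
                print({
                    "x": point[0].item(),
                    "y": point[1].item(),
                })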
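If you would rather collect the results than print them, something like the sketch below might help. `rows_to_records` is a hypothetical helper, not part of the model's code. It assumes each image's output is a list of rows with either 4 values (a box) or 2 values (a point), as the loop above suggests, and it takes plain lists so it can be fed `batch.tolist()`.

```python
from typing import Dict, List, Sequence


def rows_to_records(rows: Sequence[Sequence[float]]) -> List[Dict[str, float]]:
    """Turn one image's rows (e.g. `batch.tolist()`) into a list of dicts.

    Hypothetical helper: 4-value rows are treated as boxes, 2-value rows as
    points, mirroring the shape checks in the snippet above.
    """
    box_keys = ("min_x", "min_y", "max_x", "max_y")
    point_keys = ("x", "y")
    records: List[Dict[str, float]] = []
    for row in rows:
        if len(row) == 4:
            keys = box_keys
        elif len(row) == 2:
            keys = point_keys
        else:
            raise ValueError(f"expected 2 or 4 values per row, got {len(row)}")
        # Same filter as the snippet: keep a row only if every value is non-zero.
        if all(value != 0 for value in row):
            records.append(dict(zip(keys, row)))
    return records


# Plain-list example; with real output, pass `batch.tolist()` for each image.
example = [[0.1, 0.2, 0.5, 0.6], [0.0, 0.0, 0.0, 0.0]]
print(rows_to_records(example))
```

Note that this filter, like the `torch.all(... != 0)` check it mirrors, also drops any row where a single coordinate is exactly 0. Whether that can happen for a real detection depends on how the model pads its output, which this thread does not say.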

NyxKrage changed discussion status to closed Nov 16, 2025
