Are the detection and point chat features working?

#2
by vfol - opened

I tried the provided code snippet and got the query and caption prefixes working, but the detect: and point: prefixes seem to be broken. When I try to detect objects, I only get text back, as if it’s using the query prefix again.

Screenshot from 2025-11-13 23-01-31

Oh, yeah, I completely forgot to implement them. In the quest to get them working, though, I did end up finding some much better ways to architect the modeling code; I'll get it pushed up in the next couple of days.

The code should now support pointing and bounding-box detection, which can be used like this:

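import torch

outputs = model.generate(
    **inputs,
    use_cache=True,
)
# Each entry in outputs holds the results for one input image.
for i, batch in enumerate(outputs):
    print(f"image #{i}")
    if batch.shape[-1] == 4:  # detect: one bounding box per row
        for bbox in batch:
            # Only print rows where every coordinate is non-zero.
            if torch.all(bbox != 0):
                print({
                    "min_x": bbox[0].item(),
                    "min_y": bbox[1].item(),
                    "max_x": bbox[2].item(),
                    "max_y": bbox[3].item(),
                })
    elif batch.shape[-1] == 2:  # point: one (x, y) pair per row
        for point in batch:
            # Only print rows where both coordinates are non-zero.
            if torch.all(point != 0):
                print({
                    "x": point[0].item(),
                    "y": point[1].item(),
                })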
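
If you would rather collect the results than print them, something along these lines might work. This is only a rough sketch: `parse_regions` is a hypothetical helper, not part of the model code, and it takes plain nested lists (for example from `batch.tolist()`). It also assumes that rows made up entirely of zeros are padding, so unlike the loop above it keeps a box or point that merely has one coordinate equal to 0.

```python
from typing import Dict, List, Sequence


def parse_regions(rows: Sequence[Sequence[float]]) -> List[Dict[str, float]]:
    """Label one image's rows as boxes (width 4) or points (width 2).

    Hypothetical helper: rows that are entirely zero are ASSUMED to be
    padding and are skipped.
    """
    results = []
    for row in rows:
        if not any(value != 0 for value in row):
            continue  # all-zero row, assumed to be padding
        if len(row) == 4:
            keys = ("min_x", "min_y", "max_x", "max_y")
        elif len(row) == 2:
            keys = ("x", "y")
        else:
            raise ValueError(f"unexpected row width {len(row)}")
        results.append(dict(zip(keys, row)))
    return results


# Example with made-up numbers: one box touching the left edge, one padding row.
print(parse_regions([[0.0, 0.1, 0.5, 0.6], [0.0, 0.0, 0.0, 0.0]]))
```

With the loop above you would call it as `parse_regions(batch.tolist())` for each image.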
NyxKrage changed discussion status to closed
