A working example of how to use this model with vLLM

#13
by shigureui - opened

https://github.com/WuNein/LocateAnything-vLLM/

Instead of dealing with complex manual KV-cache management and the slow native Hugging Face generate() loop, this pipeline computes the mixed multimodal embeddings (text + image) locally, serializes them as Base64, and sends them to a high-throughput vLLM server through the OpenAI-compatible Completions API.
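For readers who want a feel for the client side, here is a minimal sketch of the "Base64 over the Completions API" step. It is not taken from the repository. It assumes the server was started with vLLM's prompt-embeddings support enabled and that the request carries the tensor in a `prompt_embeds` field; the helper names `build_payload` and `post_completion`, the model name, and the `http://localhost:8000` address are all hypothetical.

```python
import base64
import json
import urllib.request


def build_payload(serialized_embeds: bytes, model: str, max_tokens: int = 64) -> dict:
    """Wrap already-serialized embedding bytes in a Completions request body.

    Assumption: the server reads the Base64 string from `prompt_embeds`.
    """
    return {
        "model": model,
        "prompt": "",  # left empty; the embeddings are meant to carry the input
        "max_tokens": max_tokens,
        "prompt_embeds": base64.b64encode(serialized_embeds).decode("ascii"),
    }


def post_completion(payload: dict, base_url: str = "http://localhost:8000") -> dict:
    """POST the payload to the (assumed) /v1/completions endpoint and parse the JSON reply."""
    request = urllib.request.Request(
        f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

In practice the bytes passed to `build_payload` would probably come from writing the locally computed text + image embedding tensor into an in-memory buffer with `torch.save`; check the repository for the exact serialization it uses.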

Stars for the project are welcome. It was mostly written by hand, with some QA-style prompting.
