arxiv:2603.23483

SpecEyes: Accelerating Agentic Multimodal LLMs via Speculative Perception and Planning

Published on Mar 24
Submitted by Huang on Mar 25
#3 Paper of the day
Abstract

SpecEyes accelerates agentic multimodal large language models by using a lightweight speculative planner with cognitive gating and heterogeneous parallel processing to reduce latency and improve throughput.

AI-generated summary

Agentic multimodal large language models (MLLMs) (e.g., OpenAI o3 and Gemini Agentic Vision) achieve remarkable reasoning capabilities through iterative visual tool invocation. However, the cascaded perception, reasoning, and tool-calling loops introduce significant sequential overhead. This overhead, termed agentic depth, incurs prohibitive latency and severely limits system-level concurrency. To address this, we propose SpecEyes, an agentic-level speculative acceleration framework that breaks the sequential bottleneck. Our key insight is that a lightweight, tool-free MLLM can serve as a speculative planner that predicts the execution trajectory, enabling early termination of expensive tool chains without sacrificing accuracy. To regulate this speculative planning, we introduce a cognitive gating mechanism based on answer separability, which quantifies the model's confidence for self-verification without requiring oracle labels. Furthermore, we design a heterogeneous parallel funnel that exploits the stateless concurrency of the small model to mask the stateful serial execution of the large model, maximizing system throughput. Extensive experiments on V* Bench, HR-Bench, and POPE demonstrate that SpecEyes achieves a 1.1-3.35x speedup over the agentic baseline while preserving or even improving accuracy (by up to +6.7%), thereby boosting serving throughput under concurrent workloads.
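The speculative-planning-with-gating idea in the abstract can be sketched roughly as follows. All function names, the separability measure, and the threshold here are illustrative assumptions for intuition, not the paper's actual implementation.

```python
# Hypothetical sketch of SpecEyes-style speculative planning with
# cognitive gating. Names and the threshold tau are assumptions.

def answer_separability(logprobs):
    # Gap between the top-2 candidate-answer log-probabilities:
    # a large gap suggests the draft answer is clearly separable
    # from its competitors, i.e. the small model is confident.
    top2 = sorted(logprobs, reverse=True)[:2]
    return top2[0] - top2[1]

def agentic_answer(query, image, small_mllm, large_agent, tau=2.0):
    # 1) Cheap, tool-free draft answer from the small planner.
    draft_answer, logprobs = small_mllm(query, image)
    # 2) Cognitive gate: accept the draft only if the answer
    #    distribution is clearly separable (no oracle labels needed).
    if answer_separability(logprobs) >= tau:
        return draft_answer  # early exit; expensive tool chain skipped
    # 3) Otherwise fall back to the full agentic tool-use loop.
    return large_agent(query, image)
```

The latency win comes from step 2: whenever the gate accepts the draft, the whole perception/tool-calling loop of the large agent is bypassed.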

Community

Paper author and submitter:

SpecEyes is an agentic-level speculative acceleration framework that bypasses redundant tool-use loops in multimodal LLMs using a lightweight model and a cognitive gating mechanism, significantly improving speed and throughput without sacrificing accuracy.
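One way to picture the "heterogeneous parallel funnel": many stateless draft calls to the small model run concurrently, and only drafts rejected by the gate are funneled into the serialized large agent. The asyncio sketch below is a minimal illustration under assumed names, delays, and a stand-in gate, not the authors' serving stack.

```python
import asyncio

# Toy "parallel funnel": stateless small-model drafts run concurrently;
# only gate failures reach the serial large agent. All names, delays,
# and the confidence gate are illustrative assumptions.

async def small_draft(query):
    await asyncio.sleep(0.01)        # cheap, stateless call
    confident = len(query) % 2 == 0  # stand-in for the cognitive gate
    return query.upper(), confident

async def large_agent(query, lock):
    async with lock:                 # stateful, serialized execution
        await asyncio.sleep(0.05)
        return query.upper() + "!"

async def serve(queries):
    lock = asyncio.Lock()

    async def handle(q):
        draft, ok = await small_draft(q)
        return draft if ok else await large_agent(q, lock)

    # Drafts for all queries proceed in parallel, masking the
    # serial latency of the large model behind the gate.
    return await asyncio.gather(*(handle(q) for q in queries))

results = asyncio.run(serve(["ab", "cde"]))
```

Here `"ab"` passes the stand-in gate and returns its draft `"AB"` immediately, while `"cde"` is routed through the serialized large agent and returns `"CDE!"`.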

Code: https://github.com/MAC-AutoML/SpecEyes

Interesting breakdown of this paper on arXivLens: https://arxivlens.com/PaperView/Details/speceyes-accelerating-agentic-multimodal-llms-via-speculative-perception-and-planning-8571-ef6fcfed
Covers the executive summary, detailed methodology, and practical applications.



Get this paper in your agent:

hf papers read 2603.23483

Don't have the latest CLI?

curl -LsSf https://hf.co/cli/install.sh | bash

Models citing this paper: 0
Datasets citing this paper: 0
Spaces citing this paper: 0
Collections including this paper: 5

Cite arxiv.org/abs/2603.23483 in a model, dataset, or Space README.md to link it from this page.