arxiv:2602.01756

Mind-Brush: Integrating Agentic Cognitive Search and Reasoning into Image Generation

Published on Feb 2 · Submitted by Zilong Huang on Feb 3
Abstract

Mind-Brush presents a unified agentic framework for text-to-image generation that dynamically retrieves multimodal evidence and employs reasoning tools to improve understanding of implicit user intentions and complex knowledge reasoning.

AI-generated summary

While text-to-image generation has achieved unprecedented fidelity, the vast majority of existing models function fundamentally as static text-to-pixel decoders. Consequently, they often fail to grasp implicit user intentions. Although emerging unified understanding-generation models have improved intent comprehension, they still struggle with tasks involving complex knowledge reasoning within a single model. Moreover, constrained by static internal priors, these models remain unable to adapt to the evolving dynamics of the real world. To bridge these gaps, we introduce Mind-Brush, a unified agentic framework that transforms generation into a dynamic, knowledge-driven workflow. Simulating a human-like "think-research-create" paradigm, Mind-Brush actively retrieves multimodal evidence to ground out-of-distribution concepts and employs reasoning tools to resolve implicit visual constraints. To rigorously evaluate these capabilities, we propose Mind-Bench, a comprehensive benchmark comprising 500 distinct samples spanning real-time news, emerging concepts, and domains such as mathematical reasoning and Geo-Reasoning. Extensive experiments demonstrate that Mind-Brush significantly enhances the capabilities of unified models, realizing a zero-to-one capability leap for the Qwen-Image baseline on Mind-Bench, while achieving superior results on established benchmarks like WISE and RISE.
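The "think-research-create" workflow the abstract describes can be sketched as a simple agent loop: analyze the prompt for concepts the generator cannot ground, retrieve evidence for those concepts, then condition generation on the enriched prompt. The sketch below is purely illustrative; every function, data structure, and name in it is an assumption for exposition, not the paper's actual API.

```python
# Hypothetical sketch of a "think-research-create" agent loop.
# All functions here are illustrative stand-ins, not Mind-Brush's real interfaces.

def analyze_intent(prompt: str) -> dict:
    """Think: split the prompt into concepts the generator already knows
    and out-of-distribution ones that need external evidence."""
    known = {"cat", "sunset", "city"}  # stand-in for the model's internal priors
    words = set(prompt.lower().split())
    return {"known": words & known, "unknown": words - known}

def search_multimodal(concept: str) -> str:
    """Research: stand-in for retrieving text/image evidence about a concept."""
    return f"evidence({concept})"

def generate_image(enriched_prompt: str) -> str:
    """Create: stand-in for the image-generation backbone."""
    return f"image<{enriched_prompt}>"

def think_research_create(prompt: str) -> str:
    intent = analyze_intent(prompt)
    # Ground each unknown concept with retrieved evidence before generating.
    evidence = [search_multimodal(c) for c in sorted(intent["unknown"])]
    enriched = prompt + " | " + "; ".join(evidence)
    return generate_image(enriched)

print(think_research_create("cat riding a hoverboard"))
```

The point of the loop is that retrieval and reasoning happen *before* pixel generation, so the generator is conditioned on explicit evidence rather than relying on static internal priors.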

Community

From the paper author and submitter:
  • 🧠 Mind-Brush Framework: A novel agentic paradigm that unifies Intent Analysis, Multi-modal Search, and Knowledge Reasoning into a seamless "Think-Research-Create" workflow for image generation.
  • 📊 Mind-Bench: A specialized benchmark designed to evaluate generative models on dynamic external knowledge and complex logical deduction, exposing the reasoning gaps in current SOTA multimodal models.
  • 🏆 Superior Performance:
    • Elevates Qwen-Image baseline accuracy from 0.02 to 0.31 on Mind-Bench.
    • Outperforms existing baselines on WISE (+25.8% WiScore) and RISEBench (+27.3% Accuracy).


