Spend Search Where It Pays: Value-Guided Structured Sampling and Optimization for Generative Recommendation
Abstract
V-STAR addresses limitations in generative recommendation by combining value-guided decoding and tree-structured advantage reinforcement to improve exploration and reward signal quality.
Generative recommendation via autoregressive models has unified retrieval and ranking into a single conditional generation framework. However, fine-tuning these models with Reinforcement Learning (RL) often suffers from a fundamental probability-reward mismatch. Conventional likelihood-dominated decoding (e.g., beam search) exhibits a myopic bias toward locally probable prefixes, which causes two critical failures: (1) insufficient exploration, where high-reward items in low-probability branches are prematurely pruned and rarely sampled, and (2) advantage compression, where trajectories sharing high-probability prefixes receive highly correlated rewards with low within-group variance, yielding a weak comparative signal for RL. To address these challenges, we propose V-STAR, a Value-guided Sampling and Tree-structured Advantage Reinforcement framework. V-STAR forms a self-evolving loop via two synergistic components. First, Value-Guided Efficient Decoding (VED) identifies decisive nodes and selectively deepens high-potential prefixes, improving exploration efficiency without exhaustive tree search. Second, Sibling-GRPO exploits the induced tree topology to compute sibling-relative advantages, concentrating learning signals on decisive branching decisions. Extensive experiments on offline and online datasets demonstrate that V-STAR outperforms state-of-the-art baselines, delivering superior accuracy and candidate-set diversity under strict latency constraints.
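To make the two components concrete, below is a minimal, illustrative sketch of how value-guided selective expansion and sibling-relative (GRPO-style) advantages could fit together. The abstract does not specify the implementation; the `Node` structure, the `propose`/`value` callables, and the expansion `budget` are hypothetical placeholders, not the authors' API.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Node:
    prefix: List[int]                       # token ids generated so far
    logprob: float                          # cumulative log-likelihood of the prefix
    reward: float = 0.0                     # terminal reward (meaningful at leaves)
    children: List["Node"] = field(default_factory=list)


def expand_value_guided(node: Node,
                        propose: Callable[[List[int]], List[Tuple[int, float]]],
                        value: Callable[[List[int]], float],
                        budget: int) -> None:
    """Expand only the `budget` continuations with the highest estimated value.

    Unlike pure beam search, candidates are ranked by a value estimate of the
    extended prefix rather than by likelihood alone, so a low-probability but
    high-potential branch can survive pruning.
    """
    candidates = propose(node.prefix)        # [(token_id, logprob), ...]
    scored = sorted(candidates,
                    key=lambda c: value(node.prefix + [c[0]]),
                    reverse=True)[:budget]
    for token_id, lp in scored:
        node.children.append(Node(prefix=node.prefix + [token_id],
                                  logprob=node.logprob + lp))


def sibling_advantages(parent: Node) -> List[float]:
    """Group-relative advantages normalized within a sibling group.

    Because siblings share the same parent prefix, normalizing rewards within
    the group isolates the effect of the branching decision itself, which is
    the comparative signal the abstract calls sibling-relative advantages.
    """
    rewards = [c.reward for c in parent.children]
    mean = sum(rewards) / len(rewards)
    std = (sum((r - mean) ** 2 for r in rewards) / len(rewards)) ** 0.5
    return [(r - mean) / (std + 1e-8) for r in rewards]


# Toy usage: two sibling completions under the same prefix.
root = Node(prefix=[1, 2], logprob=-1.3)
root.children = [Node(prefix=[1, 2, 7], logprob=-2.0, reward=0.9),
                 Node(prefix=[1, 2, 4], logprob=-1.6, reward=0.1)]
print(sibling_advantages(root))              # higher-reward sibling gets a positive advantage
```

In the actual method, the value estimate would presumably come from a learned value model and the terminal rewards from the recommendation reward signal; this sketch only mirrors the structure described in the abstract.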
Community
V-STAR introduces value-guided decoding and tree-structured advantage reinforcement learning for generative recommendation, boosting exploration, diversity, and accuracy under latency constraints.
This is an automated message from the Librarian Bot. The following similar papers were recommended by the Semantic Scholar API:
- Towards Sample-Efficient and Stable Reinforcement Learning for LLM-based Recommendation (2026)
- HiGR: Efficient Generative Slate Recommendation via Hierarchical Planning and Multi-Objective Preference Alignment (2025)
- PROMISE: Process Reward Models Unlock Test-Time Scaling Laws in Generative Recommendations (2026)
- BEAR: Towards Beam-Search-Aware Optimization for Recommendation with Large Language Models (2026)
- TreePS-RAG: Tree-based Process Supervision for Reinforcement Learning in Agentic RAG (2026)
- ECHO: Entropy-Confidence Hybrid Optimization for Test-Time Reinforcement Learning (2026)
- ProRAG: Process-Supervised Reinforcement Learning for Retrieval-Augmented Generation (2026)