Papers
arxiv:2606.10156

τ-Rec: A Verifiable Benchmark for Agentic Recommender Systems

Published on Jun 8
· Submitted by
Bharath Sivaram Narasimhan
on Jun 11
Authors:

Abstract

A benchmark for agentic recommender systems is introduced that uses verifiable rewards and controlled dialogue constraints to evaluate conversational agent reliability, revealing significant performance gaps among leading models.

As recommender systems transition toward agentic, multi-turn conversational interfaces, evaluation paradigms have struggled to keep pace. Current benchmarks often rely on "LLM-as-a-judge" evaluations, which introduce subjectivity, high costs and inconsistency. We present τ-Rec, a benchmark for agentic recommender systems that replaces subjective evaluation with verifiable rewards and a reveal-tagged elicitation (RTE) mechanism that controls how task constraints surface during dialogue. By testing agents against structured catalog predicates and employing a pass^k reliability metric, τ-Rec provides a systematic test for consistent reasoning. Our evaluation of nine configurations across five model families -- GPT-5.4, Claude Sonnet 4.6, Gemini 2.5 Flash, DeepSeek V4 Flash, Qwen3-32B and GPT-5 mini -- reveals a steep reliability cliff, where even the best model achieves only ~57% at pass^1 and ~38% at pass^4, highlighting a critical gap in current conversational agent deployment. All code and data are publicly available at https://github.com/nbharaths/tau-rec.

Community

Paper author Paper submitter

τ-Rec: A verifiable benchmark for agentic recommender systems. Dataset: https://huggingface.co/datasets/nbharaths/tau-rec

Sign up or log in to comment

Get this paper in your agent:

hf papers read 2606.10156
Don't have the latest CLI?
curl -LsSf https://hf.co/cli/install.sh | bash

Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2606.10156 in a model README.md to link it from this page.

Datasets citing this paper 1

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2606.10156 in a Space README.md to link it from this page.

Collections including this paper 0

No Collection including this paper

Add this paper to a collection to link it from this page.