arxiv:2410.03198

PersoBench: Benchmarking Personalized Response Generation in Large Language Models

Published on Oct 4, 2024

Authors:

Abstract

LLMs demonstrate strong fluency and diversity in dialogue generation but fall short in producing personalized and coherent responses that align with conversation context and provided personas.

AI-generated summary

While large language models (LLMs) have exhibited impressive conversational capabilities, their proficiency in delivering personalized responses remains unclear. Although recent benchmarks automatically evaluate persona consistency in role-playing contexts using LLM-based judgment, the evaluation of personalization in response generation remains underexplored. To address this gap, we present an automated benchmarking pipeline, PersoBench, to evaluate the personalization ability of LLMs in persona-aware dialogue generation within a zero-shot setting. Our framework employs a structured pipeline comprising speaker-aware annotation, task-specific and context-driven prompt construction, response post-processing, and automated evaluation across multiple dimensions of generation quality. In particular, the pipeline performs text preprocessing and speaker labeling, constructs structured prompts with task instructions and LLM roles, validates response format, and evaluates valid outputs across fluency, personalization, diversity, and coherence. We assess the performance of four open-source and four closed-source LLMs using well-known datasets and a range of explicit metrics. Our findings reveal that while LLMs excel at generating fluent and diverse responses, they are far from satisfactory in delivering personalized and coherent responses, considering both the conversation context and the provided personas.

View arXiv page View PDF Add to collection

Community

Upload images, audio, and videos by dragging in the text input, pasting, or clicking here.

Tap or paste here to upload images

· Sign up or log in to comment

Upvote

Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2410.03198 in a model README.md to link it from this page.

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/2410.03198 in a dataset README.md to link it from this page.

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2410.03198 in a Space README.md to link it from this page.

Collections including this paper 0

No Collection including this paper

Add this paper to a collection to link it from this page.