RefineBench: Evaluating Refinement Capability of Language Models via Checklists
Paper • 2511.22173
POWSM: A Phonetic Open Whisper-Style Speech Foundation Model
Beyond Understanding: Evaluating the Pragmatic Gap in LLMs' Cultural Processing of Figurative Language