Benchmark data in "Beyond Ideal Instruction: A Comprehensive Framework for Evaluating LLMs in Realistic Interactions".
Xuan Yang
TorresYang
AI & ML interests
LLM reasoning, agent
Recent Activity
authored a paper about 22 hours ago
SafeAuto: Knowledge-Enhanced Safe Autonomous Driving with Multimodal Foundation Models
authored a paper about 22 hours ago
Beyond Ideal Instruction: A Comprehensive Framework for Evaluating LLMs in Realistic Interactions
updated a collection 8 days ago
RUT-Bench