Ava Lee

asheriv91

AI & ML interests

Research on LLM agents and evaluation. Interested in robust deployment.

Organizations

None yet

Recent Activity

liked a model 1 day ago

FlandreS/ppo-Pyramids

Reinforcement Learning • Updated 1 day ago • 16 • 1

upvoted a paper 2 days ago

SimuWoB: Simulating Real-World Mobile Apps for Fast and Faithful GUI Agent Benchmarking

Paper • 2605.25160 • Published 6 days ago • 3

liked a model 6 days ago

tencent/Hy-MT2-1.8B

Translation • 2B • Updated 3 days ago • 15.8k • 1.09k

upvoted a paper 7 days ago

Perception or Prejudice: Can MLLMs Go Beyond First Impressions of Personality?

Paper • 2605.22109 • Published 9 days ago • 169

upvoted a paper 10 days ago

CiteVQA: Benchmarking Evidence Attribution for Trustworthy Document Intelligence

Paper • 2605.12882 • Published 17 days ago • 269

liked a model about 2 months ago

tencent/HY-Embodied-0.5

Image-Text-to-Text • 4B • Updated Apr 14 • 868 • 908

upvoted 3 papers about 2 months ago

SQuTR: A Robustness Benchmark for Spoken Query to Text Retrieval under Acoustic Noise

Paper • 2602.12783 • Published Feb 13 • 246

GBQA: A Game Benchmark for Evaluating LLMs as Quality Assurance Engineers

Paper • 2604.02648 • Published Apr 3 • 47

ClawKeeper: Comprehensive Safety Protection for OpenClaw Agents Through Skills, Plugins, and Watchers

Paper • 2603.24414 • Published Mar 25 • 183

upvoted 2 papers 2 months ago

ShotStream: Streaming Multi-Shot Video Generation for Interactive Storytelling

Paper • 2603.25746 • Published Mar 26 • 155

SocialOmni: Benchmarking Audio-Visual Social Interactivity in Omni Models

Paper • 2603.16859 • Published Mar 17 • 248