Sherlock
eyuansu71
AI & ML interests
None yet
Recent Activity
upvoted a paper 14 days ago
Beyond Multiple Choice: Verifiable OpenQA for Robust Vision-Language RFT
upvoted a paper about 1 month ago
Do Vision-Language Models Measure Up? Benchmarking Visual Measurement Reading with MeasureBench
upvoted a paper 3 months ago
SWE-Bench Pro: Can AI Agents Solve Long-Horizon Software Engineering Tasks?