Eureka Lab

non-profit
https://phybench-official.github.io/phybench-demo/
phybench-official

AI & ML interests

Evaluation Benchmark for Frontier LLMs

Team members: SHI QIU, Yi Hu, Yunbo Sun, 郭绍阳, Vision Tang, Richard Wei, Zeyu Cai, Zhuo-Yang Song, Yixuan Yin, Zhang Haoxu, Tianyu Luo, Haoling Chang, Qi Liu, Chenyang Wang

StarThomas1002 
updated a dataset 7 months ago

Eureka-Lab/PHYBench

Viewer • Updated May 16 • 1k • 521 • 60
StarThomas1002 
in Eureka-Lab/PHYBench 7 months ago

New update of PHYBench

#4 opened 7 months ago by
StarThomas1002

Answers for the remaining 400 examples

#3 opened 7 months ago by
ssydasheng
StarThomas1002 
authored 2 papers 8 months ago

MJ-VIDEO: Fine-Grained Benchmarking and Rewarding Video Preferences in Video Generation

Paper • 2502.01719 • Published Feb 3

SuperGPQA: Scaling LLM Evaluation across 285 Graduate Disciplines

Paper • 2502.14739 • Published Feb 20 • 104
CaCtusLuo 
authored a paper 8 months ago

PHYBench: Holistic Evaluation of Physical Perception and Reasoning in Large Language Models

Paper • 2504.16074 • Published Apr 22 • 36
StarThomas1002 
authored a paper 8 months ago

PHYBench: Holistic Evaluation of Physical Perception and Reasoning in Large Language Models

Paper • 2504.16074 • Published Apr 22 • 36
StarThomas1002 
published a dataset 8 months ago

Eureka-Lab/PHYBench

Viewer • Updated May 16 • 1k • 521 • 60
StarThomas1002 
authored a paper about 1 year ago

MMIE: Massive Multimodal Interleaved Comprehension Benchmark for Large Vision-Language Models

Paper • 2410.10139 • Published Oct 14, 2024 • 51