Eureka Lab

non-profit

https://phybench-official.github.io/phybench-demo/

phybench-official

AI & ML interests

Evaluation Benchmark for Frontier LLMs

StarThomas1002

updated a dataset 7 months ago

Eureka-Lab/PHYBench

Updated May 16 • 1k • 521 • 60

StarThomas1002

in Eureka-Lab/PHYBench 7 months ago

New update of PHYBench

#4 opened 7 months ago by

Answers for the remaining 400 examples

#3 opened 7 months ago by

StarThomas1002

authored 2 papers 8 months ago

MJ-VIDEO: Fine-Grained Benchmarking and Rewarding Video Preferences in Video Generation

Paper • 2502.01719 • Published Feb 3

SuperGPQA: Scaling LLM Evaluation across 285 Graduate Disciplines

Paper • 2502.14739 • Published Feb 20 • 104

CaCtusLuo

authored a paper 8 months ago

PHYBench: Holistic Evaluation of Physical Perception and Reasoning in Large Language Models

Paper • 2504.16074 • Published Apr 22 • 36

StarThomas1002

authored a paper 8 months ago

PHYBench: Holistic Evaluation of Physical Perception and Reasoning in Large Language Models

Paper • 2504.16074 • Published Apr 22 • 36

StarThomas1002

published a dataset 8 months ago

Eureka-Lab/PHYBench

Updated May 16 • 1k • 521 • 60

StarThomas1002

authored a paper about 1 year ago

MMIE: Massive Multimodal Interleaved Comprehension Benchmark for Large Vision-Language Models

Paper • 2410.10139 • Published Oct 14, 2024 • 51