Hugging Face
Jinki Jeong
PRO
Anserwise
19 followers · 32 following
AI & ML interests
None yet
Recent Activity
Liked a dataset (about 8 hours ago): ginigen-ai/Metacognition-Bench
Liked a Space (about 8 hours ago): ginigen-ai/Metacognition-Leaderboard-Space
Reacted to ginigen-ai's post with 🔥 (about 8 hours ago):
The RoboCasa Kitchen Leaderboard

What does it take for a robot to handle kitchen chores the way a person does? It has to see (Vision), understand instructions (Language), and actually act (Action), and VLA (Vision-Language-Action) models are emerging as the answer. They're the bridge between large multimodal models and real-world embodied control.

RoboCasa Kitchen is a leading robot-learning benchmark in which a single-arm robot (Franka Panda) performs 24 atomic manipulation tasks (picking up cups and bowls, opening drawers and doors, turning faucets, pressing buttons, and more) inside a photorealistic simulated kitchen. Because the layout and object placement are randomized every episode, it tests genuine generalization rather than memorized motions. The score, success rate (SR), is the average fraction of the 24 tasks completed as instructed, measured over multiple seeds so that results don't come down to luck.

The catch: this benchmark has no official leaderboard, and protocols (number of demonstrations, evaluation setup) differ from paper to paper, leaving scores scattered. Lining the numbers up naively quickly turns into an apples-to-oranges comparison.

This leaderboard fixes that by collecting published scores with their sources and comparing only what is genuinely comparable. It's split into three tables:
- Kitchen 24-task (matched): head-to-head under identical conditions (per the RLDX-1 Technical Report). This is the core ranking you can actually trust.
- Other protocols: self-reported under different setups (e.g., fewer demos). Not directly comparable, so kept separate.
- GR1-Tabletop: a different, humanoid-based variant suite, separated to avoid confusion.

Any researcher can submit their own model's score directly, and submissions are reviewed before they appear on the board. Every number links to its source paper, so you can verify it yourself.

https://huggingface.co/spaces/ginigen-ai/robocasa-kitchen-leaderboard
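To make the SR definition above concrete, here is a minimal sketch of one way to compute it. It assumes per-episode outcomes are available as booleans grouped by task and seed; the function name `success_rate`, the data layout, and the toy task names are hypothetical and are not RoboCasa's or the leaderboard's actual evaluation code.

```python
from __future__ import annotations

from statistics import mean


def success_rate(results: dict[str, dict[int, list[bool]]]) -> float:
    """Return SR in [0, 1].

    Hypothetical layout: results maps task name -> seed -> list of
    per-episode outcomes (True = task completed as instructed).
    """
    per_task = []
    for seeds in results.values():
        # Success fraction for each seed, then averaged across seeds.
        seed_rates = [sum(episodes) / len(episodes) for episodes in seeds.values()]
        per_task.append(mean(seed_rates))
    # Unweighted mean over tasks, so every task counts equally.
    return mean(per_task)


# Toy data with two made-up tasks (the real benchmark has 24).
toy = {
    "PickCup": {0: [True, False], 1: [True, True]},
    "OpenDrawer": {0: [False, False], 1: [True, False]},
}
print(f"SR = {success_rate(toy):.3f}")  # → SR = 0.500
```

Averaging per task (rather than pooling all episodes) keeps a task with more episodes from dominating the score; whether every source paper aggregates in exactly this order is an assumption.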
Organizations
None yet
Models (5)
Anserwise/AWAXIS-Hybrid-28B
Text Generation • 28B • Updated 22 days ago • 92 • 2

Anserwise/AWAXIS-KR-31B
Image-Text-to-Text • 52B • Updated 22 days ago • 120 • 4

Anserwise/AWAXIS-Think-31B
Text Generation • 31B • Updated 25 days ago • 188 • 2

Anserwise/AWAXIS-Think-28B
Text Generation • 28B • Updated Apr 24 • 71 • 17

Anserwise/AWAXIS-Think-27b
Text Generation • 27B • Updated Apr 23 • 5 • 1
Datasets (0)
None public yet