Too Good to be Bad: On the Failure of LLMs to Role-Play Villains Paper • 2511.04962 • Published about 1 month ago • 52
The Dawn of GUI Agent: A Preliminary Case Study with Claude 3.5 Computer Use Paper • 2411.10323 • Published Nov 15, 2024 • 34
Aguvis: Unified Pure Vision Agents for Autonomous GUI Interaction Paper • 2412.04454 • Published Dec 5, 2024 • 72
Ponder & Press: Advancing Visual GUI Agent towards General Computer Control Paper • 2412.01268 • Published Dec 2, 2024 • 1
UI Agent Collection A collection of algorithmic agents for user interfaces/interactions, program synthesis, and robotics • 435 items • Updated 3 days ago • 65
SmolLM2 Collection State-of-the-art compact LLMs for on-device applications: 1.7B, 360M, 135M • 16 items • Updated May 5 • 294
Models Used in HackerNoon Publishing System Collection HackerNoon.com’s content management system empowers a small team to manage tens of thousands of writers, advertisers, & millions of readers 🙏 🤖 🙏🤖 • 16 items • Updated Jan 23 • 20
Article Train custom AI models with the trainer API and adapt them to 🤗 Jun 29, 2024 • 32
Imp: Highly Capable Large Multimodal Models for Mobile Devices Paper • 2405.12107 • Published May 20, 2024 • 29
OpenCulture Collection A multilingual dataset of public domain books and newspapers. • 27 items • Updated Nov 6, 2024 • 131
MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training Paper • 2403.09611 • Published Mar 14, 2024 • 129
Design2Code: How Far Are We From Automating Front-End Engineering? Paper • 2403.03163 • Published Mar 5, 2024 • 98
Finetuned Multimodal Language Models Are High-Quality Image-Text Data Filters Paper • 2403.02677 • Published Mar 5, 2024 • 18