Huggingface Projects

company

https://huggingface.co/

huggingface

Recent Activity

sergiopaniego

posted an update 1 day ago

ICYMI, great blog by @kashif and @stas on Ulysses Sequence Parallelism: train with million-token contexts

on 4×H100s: 12x longer sequences, 3.7x throughput

learn how to integrate it with Accelerate, Transformers, and TRL ⤵️
https://huggingface.co/blog/ulysses-sp

clefourrier

authored a paper 1 day ago

Strategic Navigation or Stochastic Search? How Agents and Humans Reason Over Document Collections

Paper • 2603.12180 • Published 2 days ago • 45

AdinaY

submitted a paper to Daily Papers 1 day ago

Training Language Models via Neural Cellular Automata

Paper • 2603.10055 • Published 5 days ago • 3

sergiopaniego

updated a dataset 1 day ago

huggingface-projects/Deep-RL-Course-Certification

Viewer • Updated 1 day ago • 1.68k • 176 • 18

hysts

updated a Space 1 day ago

Gemma 3n E4B It

⚡

142

Chat with AI using text, images, audio, and video

sergiopaniego

posted an update 2 days ago

116

We just released a big blog surveying 16 OSS frameworks for async RL training of LLMs!

We're building a new async GRPO trainer for TRL and, as a first step, we needed to understand how the ecosystem solves this problem today.

The problem: in synchronous RL training, generation dominates wall-clock time. 32K-token rollouts on a 32B model take hours while training GPUs sit completely idle. With reasoning models and agentic RL making rollouts longer and more variable, this only gets worse.

The ecosystem converged on the same fix: separate inference and training onto different GPU pools, connect them through a rollout buffer, and sync weights asynchronously.

We compared 16 frameworks across 7 axes: orchestration, buffer design, weight sync, staleness management, partial rollouts, LoRA, and MoE support.

This survey is step one. The async GRPO trainer for TRL is next!

https://huggingface.co/blog/async-rl-training-landscape
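
The pattern described in the post above (generation decoupled from training, a rollout buffer between them, asynchronous weight sync, and staleness management) can be sketched with plain Python threads. This is a toy illustration under stated assumptions, not the TRL trainer or any of the surveyed frameworks: `run_async_rl`, `max_staleness`, and the dictionary standing in for model weights are all hypothetical names, and no real model is involved.

```python
import queue
import threading


def run_async_rl(num_steps=5, batch_size=2, max_staleness=1, buffer_size=4):
    """Toy async RL loop: one thread generates rollouts, the caller trains."""
    buffer = queue.Queue(maxsize=buffer_size)  # rollout buffer between the two pools
    lock = threading.Lock()
    state = {"version": 0}  # hypothetical stand-in for the policy weights
    stop = threading.Event()
    stats = {"trained": 0, "dropped": 0}

    def generate():
        # "Inference pool": keeps sampling without waiting for the trainer,
        # tagging each rollout with the weight version it was sampled from.
        while not stop.is_set():
            with lock:
                version = state["version"]
            rollout = {"policy_version": version, "tokens": [version] * 4}
            try:
                buffer.put(rollout, timeout=0.01)
            except queue.Full:
                continue  # buffer is full, retry after re-checking the stop flag

    def train():
        # "Training pool": consumes rollouts and drops those that are too stale.
        for _ in range(num_steps):
            batch = []
            while len(batch) < batch_size:
                rollout = buffer.get()
                lag = state["version"] - rollout["policy_version"]
                if lag > max_staleness:
                    stats["dropped"] += 1
                    continue
                batch.append(rollout)
            stats["trained"] += len(batch)
            with lock:
                state["version"] += 1  # "weight sync": publish the new version
        stop.set()

    worker = threading.Thread(target=generate, daemon=True)
    worker.start()
    train()
    worker.join(timeout=1.0)
    return state["version"], stats


if __name__ == "__main__":
    final_version, final_stats = run_async_rl()
    print(final_version, final_stats)
```

The generator never blocks on an optimizer step, which is the point of the design; the staleness check is one simple policy among the several options the survey compares.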

sergiopaniego

posted an update 3 days ago

183

Nemotron 3 Super by @nvidia is here! NVIDIA's hybrid Mamba2/Transformer models are now natively supported in transformers (no trust_remote_code needed)

Fine-tune them with TRL in just a few lines of code. Notebook + script included to get started right away. goooo!

- Notebook: https://colab.research.google.com/github/huggingface/trl/blob/main/examples/notebooks/sft_nemotron_3.ipynb
- Script: https://github.com/huggingface/trl/blob/main/examples/scripts/sft_nemotron_3.py
- Collection with all the models: https://huggingface.co/collections/nvidia/nvidia-nemotron-v3

pcuenq

updated a dataset 3 days ago

huggingface-projects/drlc-leaderboard-data

Viewer • Updated 3 days ago • 49.1k • 820 • 2

AdinaY

submitted a paper to Daily Papers 5 days ago

Planning in 8 Tokens: A Compact Discrete Tokenizer for Latent World Model

Paper • 2603.05438 • Published 9 days ago • 35

hysts

updated 7 Spaces 7 days ago

Gemma 3 12b It

🔥

163

Chat with a multimodal AI using images, video, and text

Gemma 2 2B JPN IT

😻

Chatbot

Gemma 2 9B IT

😻

101

Chatbot

Gemma 2 2B IT

😻

Chatbot

Llama 3.2 3B Instruct

😻

123

Chatbot

Llama 2 13b Chat

🦙

490

Chat with the Llama‑2 13B language model

Llama 2 7B Chat

🏆

482

Chat with an AI using the Llama‑2 7B model

sergiopaniego

posted an update 11 days ago

470

did you know you can train agentic models with RL by deploying the environments on HF Spaces? 🤗

with TRL + OpenEnv, your training script connects to remote environments hosted as Spaces

want to train faster? → just add more Spaces (TRL handles the parallelization natively)

we used this to train a model to solve the trolley problem in CARLA. 2 HF Spaces running a full driving simulator, each on a T4 GPU

full write-up with code and results → https://huggingface.co/blog/sergiopaniego/bringing-carla-to-openenv-trl

sergiopaniego

posted an update 12 days ago

356

Qwen3.5 dense (smol 🤏) models just dropped

- natively multimodal
- 0.8B · 2B · 4B · 9B (+ base variants)
- 262K context extensible to 1M
- built-in thinking

fine-tune them with TRL out of the box → SFT, GRPO, DPO and more!

examples: https://huggingface.co/docs/trl/example_overview
collection: https://huggingface.co/collections/Qwen/qwen35