Hiring 💼

Quentin Gallouédec PRO

qgallouedec

AI & ML interests

None yet

Recent Activity

liked a model about 6 hours ago

openbmb/MiniCPM4.1-8B

updated a dataset about 13 hours ago

hf-doc-build/doc-build-dev

updated a dataset about 13 hours ago

trl-lib/documentation-images

upvoted a paper 1 day ago

Stabilizing Reinforcement Learning with LLMs: Formulation and Practices

Paper • 2512.01374 • Published 5 days ago • 75

upvoted a paper 12 days ago

Go-Explore: a New Approach for Hard-Exploration Problems

Paper • 1901.10995 • Published Jan 30, 2019 • 1

upvoted a paper 14 days ago

KTO: Model Alignment as Prospect Theoretic Optimization

Paper • 2402.01306 • Published Feb 2, 2024 • 20

upvoted an article 15 days ago

20x Faster TRL Fine-tuning with RapidFire AI

Article • 15 days ago • 20

upvoted a paper 24 days ago

Knowledge Distillation of Large Language Models

Paper • 2306.08543 • Published Jun 14, 2023 • 21

upvoted a paper 29 days ago

Model Cards for Model Reporting

Paper • 1810.03993 • Published Oct 5, 2018 • 6

upvoted a paper 30 days ago

MiniMax-M1: Scaling Test-Time Compute Efficiently with Lightning Attention

Paper • 2506.13585 • Published Jun 16 • 272

upvoted a collection about 1 month ago

Agent Data Protocol

2 items • Updated Oct 29 • 10

upvoted 3 changelogs about 2 months ago

Custom Domains for Spaces

Changelog • Sep 17 • 82

Repositories total file size is now displayed

Changelog • Sep 18 • 172

GGUF Metadata Editor

Changelog • Oct 7 • 77

upvoted a paper 2 months ago

ARE: Scaling Up Agent Environments and Evaluations

Paper • 2509.17158 • Published Sep 21 • 35

upvoted an article 3 months ago

Parameter-Efficient Fine-Tuning using 🤗 PEFT

Article • Feb 10, 2023 • 109

upvoted 2 papers 3 months ago

Why Language Models Hallucinate

Paper • 2509.04664 • Published Sep 4 • 193

SLiC-HF: Sequence Likelihood Calibration with Human Feedback

Paper • 2305.10425 • Published May 17, 2023 • 6

upvoted 2 papers 4 months ago

Part I: Tricks or Traps? A Deep Dive into RL for LLM Reasoning

Paper • 2508.08221 • Published Aug 11 • 49

On the Generalization of SFT: A Reinforcement Learning Perspective with Reward Rectification

Paper • 2508.05629 • Published Aug 7 • 180

upvoted a collection 4 months ago

Testing datasets

5 items • Updated Aug 18 • 1

upvoted 2 papers 4 months ago

panda-gym: Open-source goal-conditioned environments for robotic learning

Paper • 2106.13687 • Published Jun 25, 2021 • 3

Cell-Free Latent Go-Explore

Paper • 2208.14928 • Published Aug 31, 2022 • 1