
Quentin Tardif

ntnq


AI & ML interests

None yet

Recent Activity


liked a Space 16 days ago

HuggingFaceFW/finephrase



upvoted an article 13 days ago

Article

Keep the Tokens Flowing: Lessons from 16 Open-Source RL Libraries

15 days ago • 74

upvoted an article 14 days ago

Article

Ulysses Sequence Parallelism: Training with Million-Token Contexts

16 days ago • 23

upvoted a paper 20 days ago

Qwen3-Coder-Next Technical Report

Paper • 2603.00729 • Published 24 days ago • 60

upvoted a collection 27 days ago

Qwen3.5

21 items • Updated 15 days ago • 1.28k

upvoted a collection about 1 month ago

Tiny Aya

Bridging Scale and Multilingual Depth • 10 items • Updated Feb 17 • 64

upvoted 2 articles about 1 month ago

Article

Qwen3.5: Nobody Agrees on Attention Anymore

Feb 17 • 15

Article

Compute and Competition in AI: Different FlOPs for Different Folks

Feb 12 • 14

upvoted 2 collections about 2 months ago

Open Coding Agents

13 items • Updated 19 days ago • 51

Qwen3-ASR

4 items • Updated Jan 29 • 55

upvoted a paper about 2 months ago

Ministral 3

Paper • 2601.08584 • Published Jan 13 • 58

upvoted a collection about 2 months ago

Trinity-Large

7 items • Updated about 9 hours ago • 42

upvoted an article about 2 months ago

Article

🪄 Interpreto: A Unified Toolkit for Interpretability of Transformer Models

Jan 20 • 37

upvoted a paper about 2 months ago

LongCat-Flash-Thinking-2601 Technical Report

Paper • 2601.16725 • Published Jan 23 • 178

upvoted a paper 2 months ago

Scaling Laws for Code: Every Programming Language Matters

Paper • 2512.13472 • Published Dec 15, 2025 • 15

upvoted 2 articles 3 months ago

Article

Saving Memory Using Padding-Free Transformer Layers during Finetuning

Jun 11, 2024 • 21

Article

Nemotron 3 Nano - A new Standard for Efficient, Open, and Intelligent Agentic Models

Dec 15, 2025 • 110

upvoted 2 articles 4 months ago

Article

Tensor Parallelism (TP) in Transformers: 5 Minutes to Understand

Dec 4, 2025 • 68

Article

Continuous batching from first principles

Nov 25, 2025 • 350

upvoted a collection 4 months ago

Olmo 3

Artifacts for the Olmo 3 release. • 7 items • Updated 22 days ago • 167

upvoted a paper 4 months ago

Fantastic Pretraining Optimizers and Where to Find Them

Paper • 2509.02046 • Published Sep 2, 2025 • 14