Gabriel Mongaras PRO

gmongaras

https://gmongaras.me/

AI & ML interests

None yet

Recent Activity

liked a Space 8 days ago

microsoft/TRELLIS.2

upvoted a paper 21 days ago

InfiniteVL: Synergizing Linear and Sparse Attention for Highly-Efficient, Unlimited-Input Vision-Language Models

Paper • 2512.08829 • Published 22 days ago • 18

upvoted a paper 29 days ago

DeepSeek-V3.2: Pushing the Frontier of Open Large Language Models

Paper • 2512.02556 • Published 30 days ago • 242

upvoted 2 papers about 2 months ago

Lumine: An Open Recipe for Building Generalist Agents in 3D Open Worlds

Paper • 2511.08892 • Published Nov 12, 2025 • 201

Brain-IT: Image Reconstruction from fMRI via Brain-Interaction Transformer

Paper • 2510.25976 • Published Oct 29, 2025 • 14

upvoted an article 2 months ago

Article

Why Did MiniMax M2 End Up as a Full Attention Model?

Oct 30, 2025

upvoted 2 papers 3 months ago

Less is More: Recursive Reasoning with Tiny Networks

Paper • 2510.04871 • Published Oct 6, 2025 • 500

Fast-dLLM v2: Efficient Block-Diffusion LLM

Paper • 2509.26328 • Published Sep 30, 2025 • 55

upvoted an article 3 months ago

Article

There is no such thing as a tokenizer-free lunch

Sep 25, 2025

upvoted 2 papers 6 months ago

A Systematic Analysis of Hybrid Linear Attention

Paper • 2507.06457 • Published Jul 8, 2025 • 25

Fast and Simplex: 2-Simplicial Attention in Triton

Paper • 2507.02754 • Published Jul 3, 2025 • 25

upvoted 2 papers 10 months ago

RWKV-7 "Goose" with Expressive Dynamic State Evolution

Paper • 2503.14456 • Published Mar 18, 2025 • 153

Transformers without Normalization

Paper • 2503.10622 • Published Mar 13, 2025 • 170

upvoted a paper 11 months ago

Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention

Paper • 2502.11089 • Published Feb 16, 2025 • 166

upvoted a paper 12 months ago

MiniMax-01: Scaling Foundation Models with Lightning Attention

Paper • 2501.08313 • Published Jan 14, 2025 • 300

upvoted 4 papers over 1 year ago

Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality

Paper • 2405.21060 • Published May 31, 2024 • 68