Papers
Think in Strokes, Not Pixels: Process-Driven Image Generation via Interleaved Reasoning
Synthetic Sandbox for Training Machine Learning Engineering Agents
-
facebook/metaclip-2-worldwide-huge-quickgelu
Zero-Shot Image Classification • 2B • Updated • 18k • 18 -
facebook/metaclip-2-worldwide-huge-378
Zero-Shot Image Classification • 2B • Updated • 304 • 7 -
facebook/metaclip-2-worldwide-giant
Zero-Shot Image Classification • 4B • Updated • 1.26k • 8 -
facebook/metaclip-2-worldwide-giant-378
Zero-Shot Image Classification • 4B • Updated • 312 • 13
MobileLLM-R1, a series of sub-billion parameter reasoning models
DINOv3: foundation models producing excellent dense features, outperforming SotA w/o fine-tuning - https://arxiv.org/abs/2508.10104
-
facebook/dinov3-vit7b16-pretrain-lvd1689m
Image Feature Extraction • 7B • Updated • 9.29k • 225 -
facebook/dinov3-vits16-pretrain-lvd1689m
Image Feature Extraction • 21.6M • Updated • 355k • 85 -
facebook/dinov3-convnext-small-pretrain-lvd1689m
Image Feature Extraction • 49.5M • Updated • 15.9k • 26 -
facebook/dinov3-vitb16-pretrain-lvd1689m
Image Feature Extraction • 85.7M • Updated • 1.48M • 116
Scaling CLIP data with transparent training distribution from an end-to-end pipeline.
-
facebook/metaclip-h14-fullcc2.5b
Zero-Shot Image Classification • 1.0B • Updated • 10.8k • 49 -
facebook/metaclip-l14-fullcc2.5b
Zero-Shot Image Classification • Updated • 393 • 7 -
facebook/metaclip-b16-fullcc2.5b
Zero-Shot Image Classification • Updated • 1.3k • 11 -
facebook/metaclip-b32-fullcc2.5b
Zero-Shot Image Classification • Updated • 208 • 9
-
facebook/webssl-dino300m-full2b-224
Image Feature Extraction • 0.3B • Updated • 1.25k • 12 -
facebook/webssl-dino1b-full2b-224
Image Feature Extraction • 1B • Updated • 103 • 3 -
facebook/webssl-dino2b-full2b-224
Image Feature Extraction • 2B • Updated • 95 -
facebook/webssl-dino3b-full2b-224
Image Feature Extraction • 3B • Updated • 90
A first-of-its-kind behavioral foundation model to control a virtual physics-based humanoid agent for a wide range of whole-body tasks.
Models and datasets for Sparsh: Self-supervised touch representations for vision-based tactile sensing
MelodyFlow: High Fidelity Text-Guided Music Generation and Editing via Single-Stage Flow Matching
Masked Audio Generation using a Single Non-Autoregressive Transformer
SeamlessM4T is designed to provide high quality translation, allowing people from different linguistic communities to communicate effortlessly.
First release checkpoints for XLS-R, a large-scale model for cross-lingual speech representation learning based on wav2vec 2.0.
A collection of open-source artefacts (datasets + checkpoints) from the first VoxPopuli release.
-
facebook/voxpopuli
Viewer • Updated • 1.26M • 23.6k • 148 -
VoxPopuli: A Large-Scale Multilingual Speech Corpus for Representation Learning, Semi-Supervised Learning and Interpretation
Paper • 2101.00390 • Published • 1 -
facebook/wav2vec2-base-100k-voxpopuli
Automatic Speech Recognition • Updated • 39 • 4 -
facebook/wav2vec2-base-10k-voxpopuli-ft-cs
Automatic Speech Recognition • Updated • 22
A collection of checkpoints from the HuBERT release, a speech encoder that learns powerful representations from unlabelled audio data.
-
HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units
Paper • 2106.07447 • Published • 4 -
facebook/hubert-base-ls960
Feature Extraction • Updated • 323k • 74 -
facebook/hubert-large-ll60k
Feature Extraction • Updated • 42.8k • 35 -
facebook/hubert-large-ls960-ft
Automatic Speech Recognition • Updated • 148k • 76
DINOv2: foundation models producing robust visual features suitable for image-level and pixel-level visual tasks - https://arxiv.org/abs/2304.07193
-
facebook/dinov2-small
Image Feature Extraction • 22.1M • Updated • 2.53M • 63 -
facebook/dinov2-base
Image Feature Extraction • 86.6M • Updated • 1.73M • 176 -
facebook/dinov2-large
Image Feature Extraction • 0.3B • Updated • 1.43M • 105 -
facebook/dinov2-giant
Image Feature Extraction • 1B • Updated • 256k • 60
Meta LLM Compiler is a state-of-the-art LLM that builds upon Code Llama with improved performance for code optimization and compiler reasoning.
Foundation models for human tasks. Code: https://github.com/facebookresearch/sapiens
Latent Geometry for Fully Neural Real-time Novel View Synthesis
Collection for Code World Model, an agentic coding model from FAIR.
A frontier video understanding model developed by FAIR, Meta, which extends the pretraining objectives of https://ai.meta.com/blog/v-jepa-yann
-
facebook/vjepa2-vitl-fpc64-256
Video Classification • 0.3B • Updated • 100k • 192 -
facebook/vjepa2-vith-fpc64-256
Video Classification • 0.7B • Updated • 1.56k • 15 -
facebook/vjepa2-vitg-fpc64-256
Video Classification • 1B • Updated • 580k • 41 -
facebook/vjepa2-vitg-fpc64-384
Video Classification • 1B • Updated • 14.9k • 40
A collection of small (sub-1B) multilingual dense retrievers that generalize well across a number of tasks and languages.
Optimizing Sub-billion Parameter Language Models for On-Device Use Cases (ICML 2024) https://arxiv.org/abs/2402.14905
-
MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases
Paper • 2402.14905 • Published • 134 -
facebook/MobileLLM-125M
Text Generation • Updated • 1.58k • 131 -
facebook/MobileLLM-350M
Text Generation • Updated • 312 • 36 -
facebook/MobileLLM-600M
Text Generation • Updated • 472 • 29
Models continually pretrained using LayerSkip - https://arxiv.org/abs/2404.16710
-
LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding
Paper • 2404.16710 • Published • 80 -
facebook/layerskip-llama2-7B
Text Generation • 7B • Updated • 102 • 16 -
facebook/layerskip-llama2-13B
Text Generation • 13B • Updated • 16 • 5 -
facebook/layerskip-llama2-70B
Text Generation • 69B • Updated • 97 • 5
A significant step towards removing language barriers through expressive, fast and high-quality AI translation.
-
Seamless: Multilingual Expressive and Streaming Speech Translation
Paper • 2312.05187 • Published • 14 -
facebook/seamless-m4t-v2-large
Automatic Speech Recognition • 2B • Updated • 71.2k • 972 -
Seamless M4T v2
📞 516 • Translate speech and text between languages
-
facebook/seamless-expressive
Text-to-Speech • Updated • 188
A collection for the first release of Wav2Vec 2.0, a speech encoder that learns powerful representations from unlabelled audio data.
-
facebook/wav2vec2-large-960h-lv60-self
Automatic Speech Recognition • Updated • 70.3k • 161 -
facebook/wav2vec2-large-960h
Automatic Speech Recognition • Updated • 39k • 35 -
facebook/wav2vec2-base-960h
Automatic Speech Recognition • 94.4M • Updated • 1.21M • 396 -
facebook/wav2vec2-base-100h
Automatic Speech Recognition • Updated • 12.2k • 7
A collection of multilingual Wav2Vec 2.0 checkpoints pre-trained on 53 languages and fine-tuned for CTC speech recognition.
-
facebook/wav2vec2-large-xlsr-53
Updated • 296k • 157 -
facebook/wav2vec2-xlsr-53-espeak-cv-ft
Automatic Speech Recognition • Updated • 315k • 49 -
facebook/wav2vec2-large-xlsr-53-dutch
Automatic Speech Recognition • Updated • 389 • 3 -
facebook/wav2vec2-large-xlsr-53-french
Automatic Speech Recognition • Updated • 665 • 13
A collection of "robust" Wav2Vec 2.0 checkpoints pre-trained on datasets from multiple domains.
-
facebook/wav2vec2-large-robust
Updated • 4.27k • 39 -
facebook/wav2vec2-large-robust-ft-libri-960h
Automatic Speech Recognition • 0.3B • Updated • 77.7k • 16 -
facebook/wav2vec2-large-robust-ft-swbd-300h
Automatic Speech Recognition • Updated • 4.79k • 20 -
Robust wav2vec 2.0: Analyzing Domain Shift in Self-Supervised Pre-Training
Paper • 2104.01027 • Published • 2
A collection of checkpoints from the second VoxPopuli release.
-
VoxPopuli: A Large-Scale Multilingual Speech Corpus for Representation Learning, Semi-Supervised Learning and Interpretation
Paper • 2101.00390 • Published • 1 -
facebook/wav2vec2-base-bg-voxpopuli-v2
Automatic Speech Recognition • Updated • 23 • 2 -
facebook/wav2vec2-base-cs-voxpopuli-v2
Automatic Speech Recognition • Updated • 14 • 1 -
facebook/wav2vec2-base-da-voxpopuli-v2
Automatic Speech Recognition • Updated • 9
Text-to-speech models from fairseq s^2
A collection of stereo music generation models as part of the v2 MusicGen release.
Repository for Meta Chameleon, a mixed-modal early-fusion foundation model from FAIR.
OPT (Open Pretrained Transformer) is a series of open-source large causal language models that perform similarly to GPT-3.