Model Summary

NLLB-CLIP combines the text encoder from the NLLB model with the image encoder from LAION CLIP. Pairing the two extends the model's capabilities to the 201 languages of Flores-200. NLLB-CLIP achieves state-of-the-art results on the Crossmodal-3600 dataset, driven by strong performance on low-resource languages. You can find more details about the model in the paper.
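As a rough illustration of how a dual-encoder model of this kind is typically used for retrieval, the sketch below ranks a few captions against one image by cosine similarity. It does not load NLLB-CLIP: the embedding vectors are hand-written, hypothetical stand-ins for the outputs of the image and text encoders, and the scale of 100 applied before the softmax is an assumed value typical of CLIP-style models rather than something stated in this card.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so that dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def similarity_scores(image_emb, text_embs):
    """Cosine similarity between one image embedding and each text embedding."""
    img = l2_normalize(image_emb)
    return [
        sum(a * b for a, b in zip(img, l2_normalize(text)))
        for text in text_embs
    ]


def softmax(scores, scale=1.0):
    """Turn scaled similarity scores into a probability distribution."""
    top = max(scores)
    exps = [math.exp(scale * (s - top)) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical stand-ins for encoder outputs (not produced by NLLB-CLIP).
image_embedding = [0.9, 0.1, 0.2]
caption_embeddings = {
    "a photo of a cat": [0.8, 0.2, 0.1],
    "una foto de un perro": [0.1, 0.9, 0.3],
    "ein Foto von einem Auto": [0.2, 0.1, 0.9],
}

captions = list(caption_embeddings)
scores = similarity_scores(
    image_embedding, [caption_embeddings[c] for c in captions]
)
probs = softmax(scores, scale=100.0)  # assumed CLIP-style logit scale
best_caption = captions[probs.index(max(probs))]
print(best_caption)
```

Because both encoders map into a shared embedding space, captions in different languages can be scored against the same image in exactly the same way; only the text encoder needs to understand the language.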

Acknowledgements

I thank ML Collective for providing Google Cloud compute resources to train the OpenCLIP-compatible version of NLLB-CLIP.

Paper

NLLB-CLIP -- train performant multilingual image retrieval model on a budget (arXiv:2309.01859, published Sep 4, 2023)