Demystifying Multimodal Learning: The Hidden Inefficiency in Vision Language Modelling • Mar 4