Text performance compared to GLM-4.5 Air
Hello,
Thank you for releasing this model. How does the performance on text-only tasks compare to GLM-4.5? Ideally text performance would be the same or better. Open source badly needs general-purpose models that excel at both multimodal and text-only tasks at the same time.
It's probably worse than 4.5 Air. The size is the same and this model adds vision capability, so there has to be some sacrifice on text-only tasks.
The last layer of GLM-4.5 Air is the MTP (Multi-Token Prediction) / speculative decoding layer; it predicts what the model will output next to accelerate inference when the prediction is accurate enough.
This was introduced in DeepSeek V3 - https://arxiv.org/pdf/2412.19437v1
and NVIDIA covers it briefly in https://docs.nvidia.com/megatron-core/developer-guide/latest/api-guide/multi_token_prediction.html
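
For anyone curious what that draft-and-verify idea looks like, here's a rough Python sketch of a greedy speculative decoding step. Everything in it is hypothetical: `draft_next_tokens` and `target_next_token` are made-up stand-ins for the cheap predictor (e.g. an MTP head) and the full model, not GLM or DeepSeek APIs, and real implementations verify all draft tokens in a single batched forward pass of the target rather than one call per token as done here for clarity.

```python
from typing import Callable, List

def speculative_step(
    context: List[int],
    draft_next_tokens: Callable[[List[int], int], List[int]],  # hypothetical cheap draft / MTP head
    target_next_token: Callable[[List[int]], int],             # hypothetical full model
    k: int = 4,
) -> List[int]:
    """Propose k tokens with the cheap draft, keep the prefix the full model agrees with."""
    proposal = draft_next_tokens(context, k)                   # k cheap guesses
    accepted: List[int] = []
    for tok in proposal:
        expected = target_next_token(context + accepted)       # what the full model would emit here
        if tok != expected:
            accepted.append(expected)                          # first mismatch: take the target's token and stop
            break
        accepted.append(tok)                                   # match: draft token accepted "for free"
    else:
        # all k guesses matched; the target's own next token comes along as a bonus
        accepted.append(target_next_token(context + accepted))
    return accepted

# Toy usage with fake models: the draft always guesses token 1,
# the target emits 1 for short contexts and 2 afterwards.
if __name__ == "__main__":
    draft = lambda ctx, k: [1] * k
    target = lambda ctx: 1 if len(ctx) < 3 else 2
    print(speculative_step([0], draft, target))  # -> [1, 1, 2]: two accepted guesses plus one correction
```

The payoff is that when the draft is usually right, several tokens come out of one (batched) target pass instead of one token per pass, which is why an accurate MTP head speeds up inference without changing the output distribution under greedy decoding.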

