Text performance compared to GLM-4.5 Air

#1
by Dampfinchen - opened

Hello,

Thank you for releasing this model. How's the performance on text-only tasks compared to GLM-4.5 Air? Ideally, text performance would be the same or better. Open source badly needs general-purpose models that excel at both multimodal and text-only tasks at the same time.


It's probably worse than 4.5 Air. The size is the same, and this model has vision capability, so there has to be some sacrifice in text-only tasks.

The size is not the same - 4.5V has an extra vision tower on top of the language model.


Strangely, 4.5V has 46 layers in its language model, compared to the 47 layers in 4.5-Air.

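For anyone who wants to check the layer counts themselves, here's a minimal sketch that reads each repo's `config.json` from the Hub. The repo IDs and the nested `text_config` key are my assumptions about the config layout, so verify them against the actual files.

```python
# Minimal sketch: compare language-model layer counts via config.json.
# Repo IDs and the "text_config" nesting are assumptions, not verified.
import json
from huggingface_hub import hf_hub_download

def num_layers(repo_id: str) -> int:
    path = hf_hub_download(repo_id, "config.json")
    with open(path) as f:
        cfg = json.load(f)
    # Multimodal configs often nest the LM settings under "text_config".
    cfg = cfg.get("text_config", cfg)
    return cfg["num_hidden_layers"]

for repo in ("zai-org/GLM-4.5V", "zai-org/GLM-4.5-Air"):
    print(repo, num_layers(repo))
```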

The last layer of GLM-4.5 Air is the MTP (Multi-Token Prediction) layer, i.e. a speculative decoding layer: it predicts what the model will output next, accelerating inference when the prediction is accurate enough.

This was introduced in DeepSeek-V3 (https://arxiv.org/pdf/2412.19437v1), and NVIDIA covers it briefly in https://docs.nvidia.com/megatron-core/developer-guide/latest/api-guide/multi_token_prediction.html
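To make the idea concrete, here's a toy greedy sketch of the verify-and-accept loop behind speculative decoding. This is not GLM's or DeepSeek's actual implementation; `draft` (standing in for the MTP head) and `target` (the full model) are hypothetical callables that return a greedy next-token id.

```python
# Toy greedy speculative decoding: the cheap draft proposes k tokens,
# the full model verifies them, and the longest agreeing prefix is kept.
from typing import Callable, List

def speculative_step(prompt: List[int],
                     draft: Callable[[List[int]], int],
                     target: Callable[[List[int]], int],
                     k: int = 4) -> List[int]:
    # 1) Draft k tokens autoregressively with the cheap predictor.
    ctx = list(prompt)
    proposed: List[int] = []
    for _ in range(k):
        tok = draft(ctx)
        proposed.append(tok)
        ctx.append(tok)

    # 2) Verify with the full model. A real implementation scores all k
    #    positions in ONE forward pass; calling target per position here
    #    is just for clarity.
    ctx = list(prompt)
    accepted: List[int] = []
    for tok in proposed:
        expected = target(ctx)
        if expected != tok:
            accepted.append(expected)  # keep the target's token, stop here
            break
        accepted.append(tok)
        ctx.append(tok)
    return accepted
```

If every draft token is accepted, you get k tokens for roughly the price of one full-model pass, which is where the speedup comes from.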
