Adjust context size

#1
by retowyss - opened

Reduce max_position_embeddings from 5M to 128k, as advertised on the model card.
Discovered this when starting vllm with --max-model-len auto: it went OOM on 2x Pro 6k.
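
For anyone wondering why the larger value leads to OOM, here is a rough back-of-the-envelope sketch of KV-cache size as a function of context length. The layer count, KV-head count, and head dimension below are hypothetical placeholders, not this model's actual configuration, and the formula assumes plain full attention with an fp16/bf16 cache (no sliding window, no quantization). It also assumes the server sizes its maximum length from max_position_embeddings.

```python
def kv_cache_bytes(seq_len, num_layers, num_kv_heads, head_dim, dtype_bytes=2):
    """Bytes needed to cache keys and values for one sequence of seq_len tokens."""
    # Factor 2: one tensor for keys and one for values in every layer.
    return 2 * num_layers * num_kv_heads * head_dim * dtype_bytes * seq_len


# Hypothetical dimensions for illustration only (not taken from this model).
DIMS = dict(num_layers=32, num_kv_heads=8, head_dim=128)

GIB = 1024 ** 3
# "128k" and "5M" are interpreted here as 131072 and 5,000,000 tokens (an assumption).
for label, seq_len in [("128k", 128 * 1024), ("5M", 5_000_000)]:
    print(f"{label}: {kv_cache_bytes(seq_len, **DIMS) / GIB:.1f} GiB")
```

With these assumed dimensions, a single full-length sequence needs about 16 GiB of cache at 128k tokens and roughly 610 GiB at 5M tokens, before counting the model weights.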

Cohere Labs org

@retowyss Thank you for flagging this. We updated the export to correctly reflect the model's context length.

walterbm-cohere changed pull request status to closed
