To run the model with vLLM, use the following Python snippet:

```python
# From the TransMLA GitHub repo: https://github.com/MuLabPKU/TransMLA
import transmla.vllm_registry.deepseek

from vllm import LLM, SamplingParams

llm = LLM(model="fxmeng/TransMLA-llama3-8b-32k", trust_remote_code=True)
```
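
A minimal generation call using vLLM's standard offline API might then look like the sketch below; the prompt text and sampling values are illustrative assumptions, not part of the original example.

```python
import transmla.vllm_registry.deepseek  # registers the TransMLA model variant with vLLM

from vllm import LLM, SamplingParams

llm = LLM(model="fxmeng/TransMLA-llama3-8b-32k", trust_remote_code=True)

# Illustrative sampling settings (assumed values).
sampling_params = SamplingParams(temperature=0.7, top_p=0.95, max_tokens=256)

# Hypothetical prompt; replace with your own input.
outputs = llm.generate(["Explain Multi-head Latent Attention in one paragraph."], sampling_params)
for output in outputs:
    print(output.outputs[0].text)
```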
Model size: 8B params · Tensor type: F16 (Safetensors)