To run the model with vLLM, use the following Python snippet:

```python
# From the TransMLA GitHub repo: https://github.com/MuLabPKU/TransMLA
import transmla.vllm_registry.deepseek

from vllm import LLM, SamplingParams

llm = LLM(model="fxmeng/TransMLA-llama3-8b-32k", trust_remote_code=True)
```
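
A minimal generation call using vLLM's standard offline API might then look like the sketch below; the prompt text and sampling values are illustrative assumptions, not part of the original example.

```python
import transmla.vllm_registry.deepseek  # registers the TransMLA model variant with vLLM

from vllm import LLM, SamplingParams

llm = LLM(model="fxmeng/TransMLA-llama3-8b-32k", trust_remote_code=True)

# Illustrative sampling settings (assumed values).
sampling_params = SamplingParams(temperature=0.7, top_p=0.95, max_tokens=256)

# Hypothetical prompt; replace with your own input.
outputs = llm.generate(["Explain Multi-head Latent Attention in one paragraph."], sampling_params)
for output in outputs:
    print(output.outputs[0].text)
```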
Model size: 8B params · Tensor type: F16 (Safetensors)