Update README.md
README.md

@@ -124,14 +124,14 @@ print("*" * 30)

#### Environment Preparation

-We
+We have submitted our [PR](https://github.com/sgl-project/sglang/pull/10917) to the official SGLang repository, and it will be merged in a later release. Until then, prepare the environment in two steps. First, install the community release of SGLang and the required packages:
```shell
-
+pip install sglang==0.5.2 sgl-kernel==0.3.9.post2 vllm==0.10.2 torch==2.8.0 torchvision==0.23.0 torchao
```

-Then you should install our sglang
+Then you should install our sglang wheel package:
```shell
-pip install
+pip install https://raw.githubusercontent.com/inclusionAI/Ring-V2/main/hybrid_linear/whls/sglang-0.5.2-py3-none-any.whl --no-deps --force-reinstall
```

#### Run Inference
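Because the project wheel is force-reinstalled over the community build with `--no-deps`, it is worth verifying that the pinned versions are the ones Python actually imports. A minimal sanity check (illustrative, not part of the README):

```python
# Confirm the pinned builds survived the --no-deps --force-reinstall step.
import sglang
import torch

print(sglang.__version__)  # expected: 0.5.2
print(torch.__version__)   # expected: 2.8.0 (often with a +cuXXX local tag)
```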
@@ -153,7 +153,7 @@ python -m sglang.launch_server \
```shell
curl -s http://localhost:${PORT}/v1/chat/completions \
  -H "Content-Type: application/json" \
-  -d '{"model": "auto", "messages": [{"role": "user", "content": "
+  -d '{"model": "auto", "temperature": 0.6, "messages": [{"role": "user", "content": "Give me a short introduction to large language models."}]}'
```

More usage can be found [here](https://docs.sglang.ai/basic_usage/send_request.html).
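The same request can also be issued from Python against the server's OpenAI-compatible endpoint. A sketch, assuming the `requests` package and the same `PORT` the server was launched with (30000 is only a common SGLang default):

```python
# Python equivalent of the curl request above.
import os
import requests

port = os.environ.get("PORT", "30000")  # match the port passed to sglang.launch_server
resp = requests.post(
    f"http://localhost:{port}/v1/chat/completions",
    json={
        "model": "auto",
        "temperature": 0.6,
        "messages": [
            {"role": "user", "content": "Give me a short introduction to large language models."}
        ],
    },
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```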
@@ -169,7 +169,7 @@ pip install torch==2.7.0 torchvision==0.22.0

Then you should install our vLLM wheel package:
```shell
-pip install https://
+pip install https://media.githubusercontent.com/media/inclusionAI/Ring-V2/refs/heads/main/hybrid_linear/whls/vllm-0.8.5%2Bcuda12_8_gcc10_2_1-cp310-cp310-linux_x86_64.whl --no-deps --force-reinstall
```

#### Offline Inference
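As with the SGLang wheel, a quick version print confirms the custom build is the one installed; the expected value is read off the wheel filename above (an illustrative check, not from the README):

```python
# The project wheel carries a local version tag, so a plain version print
# distinguishes it from a stock PyPI vLLM install.
import vllm

print(vllm.__version__)  # expected: 0.8.5+cuda12_8_gcc10_2_1 (per the wheel filename)
```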
@@ -180,12 +180,11 @@ from vllm import LLM, SamplingParams

tokenizer = AutoTokenizer.from_pretrained("inclusionAI/Ring-mini-linear-2.0")

-sampling_params = SamplingParams(temperature=0.6, top_p=1.0, max_tokens=
+sampling_params = SamplingParams(temperature=0.6, top_p=1.0, max_tokens=8192)

-llm = LLM(model="inclusionAI/Ring-mini-linear-2.0", dtype='bfloat16', enable_prefix_caching=False
+llm = LLM(model="inclusionAI/Ring-mini-linear-2.0", dtype='bfloat16', enable_prefix_caching=False)
prompt = "Give me a short introduction to large language models."
messages = [
-    {"role": "system", "content": "You are Ling, an assistant created by inclusionAI"},
    {"role": "user", "content": prompt}
]

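For readability, here is the offline-inference snippet from this hunk assembled into one runnable script. The chat-template call and the final print are standard vLLM usage added for completeness (the `llm.generate` line comes from the next hunk's header), so treat this as a sketch rather than verbatim README content:

```python
# Offline inference, assembled from the diff fragments above.
from transformers import AutoTokenizer
from vllm import LLM, SamplingParams

tokenizer = AutoTokenizer.from_pretrained("inclusionAI/Ring-mini-linear-2.0")

sampling_params = SamplingParams(temperature=0.6, top_p=1.0, max_tokens=8192)
llm = LLM(model="inclusionAI/Ring-mini-linear-2.0", dtype='bfloat16', enable_prefix_caching=False)

prompt = "Give me a short introduction to large language models."
messages = [{"role": "user", "content": prompt}]
# Standard HF tokenizer call to render the chat template into a prompt string.
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

outputs = llm.generate([text], sampling_params)
print(outputs[0].outputs[0].text)
```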
@@ -200,14 +199,8 @@ outputs = llm.generate([text], sampling_params)
#### Online Inference
```shell
vllm serve inclusionAI/Ring-mini-linear-2.0 \
-    --tensor-parallel-size
-    --pipeline-parallel-size 1 \
+    --tensor-parallel-size 1 \
    --gpu-memory-utilization 0.90 \
-    --max-num-seqs 512 \
    --no-enable-prefix-caching
```

-
-For more information, please see our [GitHub](https://github.com/inclusionAI/Ring-V2/blob/main/hybrid_linear/README.md).
-
-## Citation
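Once the server is up it speaks the OpenAI-compatible API, so the standard client works against it. A usage sketch, assuming the `openai` Python package and vLLM's default port 8000:

```python
# Query the "Online Inference" server via its OpenAI-compatible API.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
resp = client.chat.completions.create(
    model="inclusionAI/Ring-mini-linear-2.0",
    temperature=0.6,
    messages=[
        {"role": "user", "content": "Give me a short introduction to large language models."}
    ],
)
print(resp.choices[0].message.content)
```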