Commit c2616ba (verified) · Mingke977 · Parent: facd22e

Add files using upload-large-folder tool

Files changed (1): README.md (+404, −45)
README.md CHANGED

Previous content (removed):

---
base_model: []
library_name: transformers
tags:
- mergekit
- merge
---

# c362_step50_ta05

This is a merge of pre-trained language models created using [mergekit](https://github.com/cg123/mergekit).

## Merge Details

### Merge Method

This model was merged using the [Linear DARE](https://arxiv.org/abs/2311.03099) merge method, with /root/myCodeLab/host/downloads/models/40Bra as the base.

### Models Merged

The following model was included in the merge:

* /root/myCodeLab/host/verl/ckpts/40bra_k8s_single_domain/40bra_k8s_16node_sd_c362_20260327_205644_unknown/global_step_50/actor/huggingface

### Configuration

The following YAML configuration was used to produce this model:

```yaml
base_model: /root/myCodeLab/host/downloads/models/40Bra
dtype: float32
merge_method: dare_linear
modules:
  default:
    slices:
    - sources:
      - layer_range: [0, 40]
        model: /root/myCodeLab/host/downloads/models/40Bra
      - layer_range: [0, 40]
        model: /root/myCodeLab/host/verl/ckpts/40bra_k8s_single_domain/40bra_k8s_16node_sd_c362_20260327_205644_unknown/global_step_50/actor/huggingface
        parameters:
          density: 1.0
          weight:
          - filter: .mlp.gate.
            value: 0.0
          - value: 0.5
    - sources:
      - layer_range: [40, 41]
        model: /root/myCodeLab/host/downloads/models/40Bra
out_dtype: bfloat16
```
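For intuition, the `dare_linear` method from the DARE paper linked above works per parameter tensor: take each fine-tuned model's delta from the base, randomly drop a fraction (1 − density) of its entries, rescale the survivors by 1/density so the expected delta is preserved, and add the weighted deltas onto the base. The sketch below is an illustration of that idea, not mergekit's actual implementation, and it uses one scalar weight per model rather than the per-parameter filter weights in the config. Note that with `density: 1.0`, as used here, nothing is dropped and the merge reduces to a plain weighted average of deltas.

```python
import numpy as np


def dare_linear_merge(base, tuned_list, weights, density, rng=None):
    """Merge fine-tuned tensors into a base tensor via DARE-linear (sketch).

    Each tuned model's delta (tuned - base) is randomly sparsified:
    entries are kept with probability `density` and rescaled by
    1/density so the expected delta is unchanged, then the sparsified
    deltas are combined as a weighted sum on top of the base.
    """
    if rng is None:
        rng = np.random.default_rng(0)
    merged = base.copy()
    for tuned, w in zip(tuned_list, weights):
        delta = tuned - base
        keep = rng.random(delta.shape) < density      # Bernoulli keep mask
        delta = np.where(keep, delta / density, 0.0)  # rescale survivors
        merged += w * delta
    return merged
```

With `density=1.0` every mask entry is kept, so the result is exactly `base + Σ wᵢ · (tunedᵢ − base)`, matching the 0.5 blend in the config above.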
Current content (added):

---
language:
- zh
- en
pipeline_tag: text-generation
library_name: transformers
---

<div align="center">
  <picture>
    <img src="figures/joyai-logo.png" width="30%" alt="JoyAI-LLM Flash">
  </picture>
</div>
<hr>

<div align="center" style="line-height: 1;">
  <a href="https://huggingface.co/jdopensource" target="_blank"><img alt="Hugging Face" src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-JD-ffc107?color=ffc107&logoColor=white"/></a>
  <a href="https://huggingface.co/jdopensource/JoyAI-LLM-Flash/blob/main/LICENSE"><img alt="License" src="https://img.shields.io/badge/License-Modified_MIT-f5de53?&color=f5de53"/></a>
</div>
## 1. Model Introduction

JoyAI-LLM-Flash is a state-of-the-art medium-sized instruct language model with 3 billion activated parameters and 48 billion total parameters. It was pretrained on 20 trillion text tokens using the Muon optimizer, followed by large-scale supervised fine-tuning (SFT), direct preference optimization (DPO), and reinforcement learning (RL) across diverse environments. JoyAI-LLM-Flash achieves strong performance on frontier knowledge, reasoning, and coding tasks, as well as agentic capabilities.

### Key Features

- **Fiber Bundle RL**: introduces fiber bundle theory into reinforcement learning through a novel optimization framework, FiberPO. The method is designed for the challenges of large-scale, heterogeneous agent training, improving stability and robustness under complex data distributions.
- **Training-Inference Collaboration**: applies the Muon optimizer with dense MTP and develops novel optimization techniques that resolve instabilities at scale, delivering 1.3× to 1.7× the throughput of the non-MTP version.
- **Agentic Intelligence**: designed for tool use, reasoning, and autonomous problem-solving.
## 2. Model Summary

|                                             |                          |
| :-----------------------------------------: | :----------------------: |
| **Architecture**                            | Mixture-of-Experts (MoE) |
| **Total Parameters**                        | 48B                      |
| **Activated Parameters**                    | 3B                       |
| **Number of Layers** (dense layer included) | 40                       |
| **Number of Dense Layers**                  | 1                        |
| **Attention Hidden Dimension**              | 2048                     |
| **MoE Hidden Dimension** (per expert)       | 768                      |
| **Number of Attention Heads**               | 32                       |
| **Number of Experts**                       | 256                      |
| **Selected Experts per Token**              | 8                        |
| **Number of Shared Experts**                | 1                        |
| **Vocabulary Size**                         | 129K                     |
| **Context Length**                          | 128K                     |
| **Attention Mechanism**                     | MLA                      |
| **Activation Function**                     | SwiGLU                   |
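The routing numbers in the summary (256 experts, 8 selected per token, 1 always-on shared expert) correspond to a standard top-k MoE gating step. The sketch below illustrates that selection logic under the assumption of softmax gating over router logits; it is not the model's actual router, which may differ in normalization and load-balancing details.

```python
import numpy as np

NUM_EXPERTS = 256  # routed experts, from the summary table
TOP_K = 8          # experts selected per token
# The 1 shared expert is always active, so each token runs TOP_K + 1 experts.


def route_token(router_logits: np.ndarray, top_k: int = TOP_K):
    """Pick the top-k experts for one token; return (indices, mixing weights)."""
    top_idx = np.argsort(router_logits)[-top_k:][::-1]  # best k, descending
    # Softmax over the selected logits only (numerically stabilized).
    gate = np.exp(router_logits[top_idx] - router_logits[top_idx].max())
    weights = gate / gate.sum()
    return top_idx, weights


rng = np.random.default_rng(0)
idx, w = route_token(rng.standard_normal(NUM_EXPERTS))
```

Each token's output is then the weighted sum of the 8 selected experts' outputs plus the shared expert, which is how 48B total parameters yield only 3B activated per token.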
## 3. Evaluation Results

<table>
<thead>
<tr>
<th align="center">Benchmark</th>
<th align="center"><sup>JoyAI-LLM Flash</sup></th>
<th align="center"><sup>Qwen3-30B-A3B-Instruct-2507</sup></th>
<th align="center"><sup>GLM-4.7-Flash<br>(Non-thinking)</sup></th>
</tr>
</thead>
<tbody>
<tr>
<td align="center" colspan="4"><strong>Knowledge &amp; Alignment</strong></td>
</tr>
<tr>
<td align="center">MMLU</td>
<td align="center"><strong>89.50</strong></td>
<td align="center">86.87</td>
<td align="center">80.53</td>
</tr>
<tr>
<td align="center">MMLU-Pro</td>
<td align="center"><strong>81.02</strong></td>
<td align="center">73.88</td>
<td align="center">63.62</td>
</tr>
<tr>
<td align="center">CMMLU</td>
<td align="center"><strong>87.03</strong></td>
<td align="center">85.88</td>
<td align="center">75.85</td>
</tr>
<tr>
<td align="center">GPQA-Diamond</td>
<td align="center"><strong>74.43</strong></td>
<td align="center">68.69</td>
<td align="center">39.90</td>
</tr>
<tr>
<td align="center">SuperGPQA</td>
<td align="center"><strong>55.00</strong></td>
<td align="center">52.00</td>
<td align="center">32.00</td>
</tr>
<tr>
<td align="center">LiveBench</td>
<td align="center"><strong>72.90</strong></td>
<td align="center">59.70</td>
<td align="center">43.10</td>
</tr>
<tr>
<td align="center">IFEval</td>
<td align="center"><strong>86.69</strong></td>
<td align="center">83.18</td>
<td align="center">82.44</td>
</tr>
<tr>
<td align="center">AlignBench</td>
<td align="center"><strong>8.24</strong></td>
<td align="center">8.07</td>
<td align="center">6.85</td>
</tr>
<tr>
<td align="center">HellaSwag</td>
<td align="center"><strong>91.79</strong></td>
<td align="center">89.90</td>
<td align="center">60.84</td>
</tr>
<tr>
<td align="center" colspan="4"><strong>Coding</strong></td>
</tr>
<tr>
<td align="center">HumanEval</td>
<td align="center"><strong>96.34</strong></td>
<td align="center">95.12</td>
<td align="center">74.39</td>
</tr>
<tr>
<td align="center">LiveCodeBench</td>
<td align="center"><strong>65.60</strong></td>
<td align="center">39.71</td>
<td align="center">27.43</td>
</tr>
<tr>
<td align="center">SciCode</td>
<td align="center"><strong>3.08/22.92</strong></td>
<td align="center"><strong>3.08/22.92</strong></td>
<td align="center">3.08/15.11</td>
</tr>
<tr>
<td align="center" colspan="4"><strong>Mathematics</strong></td>
</tr>
<tr>
<td align="center">GSM8K</td>
<td align="center"><strong>95.83</strong></td>
<td align="center">79.83</td>
<td align="center">81.88</td>
</tr>
<tr>
<td align="center">AIME2025</td>
<td align="center"><strong>65.83</strong></td>
<td align="center">62.08</td>
<td align="center">24.17</td>
</tr>
<tr>
<td align="center">MATH 500</td>
<td align="center"><strong>97.10</strong></td>
<td align="center">89.80</td>
<td align="center">90.90</td>
</tr>
<tr>
<td align="center" colspan="4"><strong>Agentic</strong></td>
</tr>
<tr>
<td align="center">SWE-bench Verified</td>
<td align="center"><strong>60.60</strong></td>
<td align="center">24.44</td>
<td align="center">51.60</td>
</tr>
<tr>
<td align="center">Tau2-Retail</td>
<td align="center"><strong>67.55</strong></td>
<td align="center">53.51</td>
<td align="center">62.28</td>
</tr>
<tr>
<td align="center">Tau2-Airline</td>
<td align="center"><strong>54.00</strong></td>
<td align="center">32.00</td>
<td align="center">52.00</td>
</tr>
<tr>
<td align="center">Tau2-Telecom</td>
<td align="center">79.83</td>
<td align="center">4.39</td>
<td align="center"><strong>88.60</strong></td>
</tr>
<tr>
<td align="center" colspan="4"><strong>Long Context</strong></td>
</tr>
<tr>
<td align="center">RULER</td>
<td align="center"><strong>95.60</strong></td>
<td align="center">89.66</td>
<td align="center">56.12</td>
</tr>
</tbody>
</table>
## 4. Deployment

> [!NOTE]
> You can access the JoyAI-LLM Flash API at https://docs.jdcloud.com/cn/jdaip/chat; an OpenAI/Anthropic-compatible API is provided.

Currently, JoyAI-LLM-Flash-Block-INT8 is recommended to run on the following inference engines:

* SGLang

Deployment examples can be found in the [Model Deployment Guide](docs/deploy_guidance.md).
## 5. Model Usage

The demos below show how to call our official API.

For third-party APIs deployed with vLLM or SGLang, please note:

> [!NOTE]
> Recommended sampling parameters: `temperature=0.6`, `top_p=1.0`

### Chat Completion

This is a simple chat completion script that shows how to call the JoyAI-Flash API.
```python
from openai import OpenAI

client = OpenAI(base_url="http://IP:PORT/v1", api_key="EMPTY")


def simple_chat(client: OpenAI):
    messages = [
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "which one is bigger, 9.11 or 9.9? think carefully.",
                }
            ],
        },
    ]
    model_name = client.models.list().data[0].id
    response = client.chat.completions.create(
        model=model_name,
        messages=messages,
        stream=False,
        # recommended sampling parameters (see the note above)
        temperature=0.6,
        top_p=1.0,
        max_tokens=4096,
    )
    print(f"response: {response.choices[0].message.content}")


if __name__ == "__main__":
    simple_chat(client)
```
### Tool Call Completion

This is a simple tool call completion script that shows how to call the JoyAI-Flash API.

```python
import json

from openai import OpenAI

client = OpenAI(base_url="http://IP:PORT/v1", api_key="EMPTY")


def my_calculator(expression: str) -> str:
    return str(eval(expression))


def rewrite(text: str) -> str:
    return str(text)


def simple_tool_call(client: OpenAI):
    messages = [
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "use my functions to compute the results for the equations: 6+1",
                },
            ],
        },
    ]
    tools = [
        {
            "type": "function",
            "function": {
                "name": "my_calculator",
                "description": "A calculator that can evaluate a mathematical equation and compute its results.",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "expression": {
                            "type": "string",
                            "description": "The mathematical expression to evaluate.",
                        },
                    },
                    "required": ["expression"],
                },
            },
        },
        {
            "type": "function",
            "function": {
                "name": "rewrite",
                "description": "Rewrite a given text for improved clarity",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "text": {
                            "type": "string",
                            "description": "The input text to rewrite",
                        }
                    },
                },
            },
        },
    ]
    model_name = client.models.list().data[0].id
    response = client.chat.completions.create(
        model=model_name,
        messages=messages,
        temperature=1.0,
        max_tokens=1024,
        tools=tools,
        tool_choice="auto",
    )
    tool_calls = response.choices[0].message.tool_calls

    # Execute each requested tool locally, then feed the results back.
    results = []
    for tool_call in tool_calls:
        function_name = tool_call.function.name
        function_args = tool_call.function.arguments
        if function_name == "my_calculator":
            result = my_calculator(**json.loads(function_args))
            results.append(result)
    messages.append({"role": "assistant", "tool_calls": tool_calls})
    for tool_call, result in zip(tool_calls, results):
        messages.append(
            {
                "role": "tool",
                "tool_call_id": tool_call.id,
                "name": tool_call.function.name,
                "content": result,
            }
        )
    response = client.chat.completions.create(
        model=model_name,
        messages=messages,
        temperature=1.0,
        max_tokens=1024,
    )
    print(response.choices[0].message.content)


if __name__ == "__main__":
    simple_tool_call(client)
```
---

## 6. License

Both the code repository and the model weights are released under the [Modified MIT License](LICENSE).