Instructions for using sds-ai/Yee-R1-mini-GGUF with libraries, inference providers, notebooks, and local apps.
- Libraries
- Transformers
How to use sds-ai/Yee-R1-mini-GGUF with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="sds-ai/Yee-R1-mini-GGUF")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)
```

```python
# Load model directly
from transformers import AutoModel

model = AutoModel.from_pretrained("sds-ai/Yee-R1-mini-GGUF", dtype="auto")
```
- llama-cpp-python
How to use sds-ai/Yee-R1-mini-GGUF with llama-cpp-python:
```python
# !pip install llama-cpp-python
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="sds-ai/Yee-R1-mini-GGUF",
    filename="Yee-R1-mini-BF16.gguf",
)

llm.create_chat_completion(
    messages=[
        {"role": "user", "content": "What is the capital of France?"}
    ]
)
```
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- llama.cpp
How to use sds-ai/Yee-R1-mini-GGUF with llama.cpp:
Install from brew
```shell
brew install llama.cpp

# Start a local OpenAI-compatible server with a web UI:
llama-server -hf sds-ai/Yee-R1-mini-GGUF:Q4_K_M

# Run inference directly in the terminal:
llama-cli -hf sds-ai/Yee-R1-mini-GGUF:Q4_K_M
```
Install from WinGet (Windows)
```shell
winget install llama.cpp

# Start a local OpenAI-compatible server with a web UI:
llama-server -hf sds-ai/Yee-R1-mini-GGUF:Q4_K_M

# Run inference directly in the terminal:
llama-cli -hf sds-ai/Yee-R1-mini-GGUF:Q4_K_M
```
Use pre-built binary
```shell
# Download a pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases

# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf sds-ai/Yee-R1-mini-GGUF:Q4_K_M

# Run inference directly in the terminal:
./llama-cli -hf sds-ai/Yee-R1-mini-GGUF:Q4_K_M
```
Build from source code
```shell
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli

# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf sds-ai/Yee-R1-mini-GGUF:Q4_K_M

# Run inference directly in the terminal:
./build/bin/llama-cli -hf sds-ai/Yee-R1-mini-GGUF:Q4_K_M
```
Use Docker
docker model run hf.co/sds-ai/Yee-R1-mini-GGUF:Q4_K_M
- LM Studio
- Jan
- vLLM
How to use sds-ai/Yee-R1-mini-GGUF with vLLM:
Install from pip and serve model
```shell
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "sds-ai/Yee-R1-mini-GGUF"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "sds-ai/Yee-R1-mini-GGUF",
    "messages": [
      { "role": "user", "content": "What is the capital of France?" }
    ]
  }'
```
Use Docker
docker model run hf.co/sds-ai/Yee-R1-mini-GGUF:Q4_K_M
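The server exposes an OpenAI-compatible API, so the curl call above can also be made from Python. A minimal sketch using only the standard library, assuming a vLLM server listening on localhost:8000:

```python
import json
import urllib.request


def build_chat_request(model: str, prompt: str) -> dict:
    """Build an OpenAI-compatible chat completion payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }


def chat(base_url: str, model: str, prompt: str) -> str:
    """POST to /v1/chat/completions and return the assistant's reply."""
    payload = json.dumps(build_chat_request(model, prompt)).encode("utf-8")
    req = urllib.request.Request(
        base_url + "/v1/chat/completions",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    # The assistant's reply lives in the first choice's message content.
    return body["choices"][0]["message"]["content"]


# Requires a running server:
# print(chat("http://localhost:8000", "sds-ai/Yee-R1-mini-GGUF", "What is the capital of France?"))
```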
- SGLang
How to use sds-ai/Yee-R1-mini-GGUF with SGLang:
Install from pip and serve model
```shell
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "sds-ai/Yee-R1-mini-GGUF" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "sds-ai/Yee-R1-mini-GGUF",
    "messages": [
      { "role": "user", "content": "What is the capital of France?" }
    ]
  }'
```
Use Docker images
```shell
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "sds-ai/Yee-R1-mini-GGUF" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "sds-ai/Yee-R1-mini-GGUF",
    "messages": [
      { "role": "user", "content": "What is the capital of France?" }
    ]
  }'
```
- Ollama
How to use sds-ai/Yee-R1-mini-GGUF with Ollama:
ollama run hf.co/sds-ai/Yee-R1-mini-GGUF:Q4_K_M
- Unsloth Studio new
How to use sds-ai/Yee-R1-mini-GGUF with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
```shell
curl -fsSL https://unsloth.ai/install.sh | sh

# Run Unsloth Studio:
unsloth studio -H 0.0.0.0 -p 8888

# Then open http://localhost:8888 in your browser
# Search for sds-ai/Yee-R1-mini-GGUF to start chatting
```
Install Unsloth Studio (Windows)
```powershell
irm https://unsloth.ai/install.ps1 | iex

# Run Unsloth Studio:
unsloth studio -H 0.0.0.0 -p 8888

# Then open http://localhost:8888 in your browser
# Search for sds-ai/Yee-R1-mini-GGUF to start chatting
```
Using HuggingFace Spaces for Unsloth
```shell
# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for sds-ai/Yee-R1-mini-GGUF to start chatting
```
- Pi new
How to use sds-ai/Yee-R1-mini-GGUF with Pi:
Start the llama.cpp server
```shell
# Install llama.cpp:
brew install llama.cpp

# Start a local OpenAI-compatible server:
llama-server -hf sds-ai/Yee-R1-mini-GGUF:Q4_K_M
```
Configure the model in Pi
```shell
# Install Pi:
npm install -g @mariozechner/pi-coding-agent
```
Add to ~/.pi/agent/models.json:
```json
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        { "id": "sds-ai/Yee-R1-mini-GGUF:Q4_K_M" }
      ]
    }
  }
}
```
Run Pi
```shell
# Start Pi in your project directory:
pi
```
- Hermes Agent new
How to use sds-ai/Yee-R1-mini-GGUF with Hermes Agent:
Start the llama.cpp server
```shell
# Install llama.cpp:
brew install llama.cpp

# Start a local OpenAI-compatible server:
llama-server -hf sds-ai/Yee-R1-mini-GGUF:Q4_K_M
```
Configure Hermes
```shell
# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup

# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default sds-ai/Yee-R1-mini-GGUF:Q4_K_M
```
Run Hermes
hermes
- Docker Model Runner
How to use sds-ai/Yee-R1-mini-GGUF with Docker Model Runner:
docker model run hf.co/sds-ai/Yee-R1-mini-GGUF:Q4_K_M
- Lemonade
How to use sds-ai/Yee-R1-mini-GGUF with Lemonade:
Pull the model
```shell
# Download Lemonade from https://lemonade-server.ai/
lemonade pull sds-ai/Yee-R1-mini-GGUF:Q4_K_M
```
Run and chat with the model
lemonade run user.Yee-R1-mini-GGUF-Q4_K_M
List all available models
lemonade list
Yee (小熠): AI Data Security Expert
Developed by Guangzhou Yishu Information Technology Co., Ltd. (广州熠数信息技术有限公司), Yee is an intelligent data-security assistant built on large language model technology. This repository hosts the GGUF model files for Yee-R1-mini.
Yee (小熠) is an AI expert system focused on the data security domain. It builds on the Qwen3-1.7B large language model architecture and adds professional capabilities for data classification and grading, security auditing, and protection detection. It delivers lightweight, intelligent data security solutions for industry, government, telecom operators, and other sectors, helping users achieve data security that is compliant, visible, controllable, and defensible.
With an AI data-security expert model as its core technical foundation, Yee provides full-stack data security auditing and end-to-end data loss prevention, deployed across cloud, network, and endpoint scenarios to help enterprises meet the security challenges of the digital economy.
🔍 Key Features
Built on Qwen3-1.7B
- Uses Qwen3, the latest generation of Alibaba's Tongyi Qianwen (Qwen) large language models, with strong reasoning, logical judgment, and instruction-following capabilities.
- Supports flexible switching between Thinking Mode and Non-Thinking Mode to suit different application scenarios.
Dual-mode reasoning
- Enables Thinking Mode for complex logical tasks such as code analysis, mathematical computation, and strategy planning.
- Uses Non-Thinking Mode for everyday conversation and fast-response scenarios to improve efficiency.
Agent capabilities
- Integrates the Qwen-Agent framework and supports calling external tools (database interfaces, log analyzers, APIs, etc.) for automated task execution.
High compatibility
- Supports mainstream deployment options: local execution, Docker containers, Kubernetes clusters, and SaaS APIs.
- Compatible with inference frameworks such as HuggingFace Transformers, vLLM, SGLang, and Ollama.
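Tool calling as described above follows a standard pattern: the model emits a structured call naming a tool and its arguments, and a dispatcher runs the matching function and feeds the result back to the model. An illustrative sketch (the tool name and registry below are hypothetical, not part of Yee or Qwen-Agent):

```python
import json

# Hypothetical tool registry: tool name -> callable taking a dict of arguments.
TOOLS = {
    "scan_for_sensitive_fields": lambda args: {
        "sensitive": [f for f in args["fields"] if f in {"phone", "id_number", "email"}]
    },
}


def dispatch(tool_call_json: str) -> str:
    """Run the tool named in a model-emitted tool call and return its JSON result."""
    call = json.loads(tool_call_json)
    result = TOOLS[call["name"]](call["arguments"])
    return json.dumps(result)


# A model-emitted tool call might look like this:
out = dispatch('{"name": "scan_for_sensitive_fields", "arguments": {"fields": ["name", "phone", "city"]}}')
```

The JSON result string would then be appended to the conversation as a tool message so the model can compose its final answer.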
📊 Benchmark Results
Below are Yee's scores across multiple security domains in CS-Eval, produced by an evaluation framework that simulates real-world business scenarios:
| Overall | System & Software Security Fundamentals | Access Control & Identity Management | Encryption & Key Management | Infrastructure Security | AI & Network Security | Vulnerability Management & Penetration Testing | Threat Detection & Prevention | Data Security & Privacy Protection | Supply Chain Security | Security Architecture Design | Business Continuity & Incident Response | Chinese Tasks | English Tasks |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 77.48 | 78.00 | 79.31 | 71.90 | 78.37 | 84.65 | 75.24 | 78.41 | 73.02 | 86.71 | 80.49 | 71.33 | 77.58 | 76.03 |
📦 Quick Start

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the tokenizer and model
tokenizer = AutoTokenizer.from_pretrained("sds-ai/Yee-R1-mini")
model = AutoModelForCausalLM.from_pretrained(
    "sds-ai/Yee-R1-mini",
    torch_dtype="auto",
    device_map="auto"
)

# Input prompt
prompt = "请帮我检查这份数据是否包含敏感字段?"  # "Please check whether this data contains sensitive fields."

# Apply the chat template and select the reasoning mode
messages = [{"role": "user", "content": prompt}]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True  # switch to Thinking Mode
)

# Encode the input
inputs = tokenizer([text], return_tensors="pt").to(model.device)

# Generate a response
response_ids = model.generate(**inputs, max_new_tokens=32768)
response = tokenizer.decode(response_ids[0][len(inputs.input_ids[0]):], skip_special_tokens=True)
print("Yee:\n", response)
```
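In Thinking Mode, Qwen3-style models wrap their reasoning in `<think>…</think>` tags ahead of the final answer, so the decoded string above contains both. A minimal sketch for separating the two (the tag names follow the Qwen3 convention; adjust if this model's chat template differs):

```python
import re


def split_thinking(response: str):
    """Split a decoded response into (thinking, answer).

    Assumes Qwen3-style <think>...</think> tags; if no tag pair is present,
    the whole response is treated as the answer.
    """
    match = re.search(r"<think>(.*?)</think>", response, flags=re.DOTALL)
    if match is None:
        return "", response.strip()
    thinking = match.group(1).strip()
    answer = response[match.end():].strip()
    return thinking, answer


thinking, answer = split_thinking("<think>The user asks about fields.</think>\nNo sensitive fields found.")
```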
🛠️ Deployment
You can deploy Yee in any of the following ways:
Serve with SGLang
```shell
python -m sglang.launch_server --model-path sds-ai/Yee-R1-mini --reasoning-parser qwen3
```
Serve with vLLM
```shell
vllm serve sds-ai/Yee-R1-mini --enable-reasoning --reasoning-parser deepseek_r1
```
Use Ollama / LMStudio / llama.cpp / KTransformers
Qwen3 is widely supported by mainstream local LLM tools; see their official documentation for details.
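With a reasoning parser enabled, the OpenAI-compatible response separates the chain of thought from the final answer; in vLLM this is surfaced as a `reasoning_content` field alongside `content` on the message. A minimal sketch of reading both from a response payload (the sample payload below is illustrative, not captured from the model):

```python
def extract_reasoning(response: dict):
    """Return (reasoning, answer) from an OpenAI-compatible chat response
    produced with a reasoning parser enabled."""
    message = response["choices"][0]["message"]
    return message.get("reasoning_content"), message.get("content")


# Illustrative response payload:
sample = {
    "choices": [
        {"message": {"reasoning_content": "User asks for capital.", "content": "Paris."}}
    ]
}
reasoning, answer = extract_reasoning(sample)
```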
📚 Best Practices
For best performance, use the following recommended settings:

| Mode | Temperature | TopP | TopK | MinP | Presence Penalty |
|---|---|---|---|---|---|
| Thinking Mode (enable_thinking=True) | 0.6 | 0.95 | 20 | 0 | 1.5 (reduces repetition) |
| Non-Thinking Mode (enable_thinking=False) | 0.7 | 0.8 | 20 | 0 | not recommended |

- Set the output length to 32,768 tokens; for complex tasks it can be raised to 38,912 tokens.
- In multi-turn conversations, keep only each turn's final output in the history; including thinking content can degrade context understanding.
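The last recommendation can be enforced programmatically before each new turn; a minimal sketch, assuming assistant turns carry Qwen3-style `<think>…</think>` blocks:

```python
import re


def prune_history(messages):
    """Return a copy of the chat history where assistant turns keep only
    their final output, with <think>...</think> blocks removed."""
    pruned = []
    for msg in messages:
        if msg["role"] == "assistant":
            content = re.sub(r"<think>.*?</think>\s*", "", msg["content"], flags=re.DOTALL)
            pruned.append({"role": "assistant", "content": content.strip()})
        else:
            pruned.append(dict(msg))
    return pruned


history = [
    {"role": "user", "content": "Check this dataset for sensitive fields."},
    {"role": "assistant", "content": "<think>The schema has a phone column...</think>The phone column is sensitive."},
]
history = prune_history(history)
```

Run this on the accumulated history before appending the next user message and calling the model again.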
📞 Contact Us
To learn more about Yee, visit the official Yishu Information website.
🌟 Acknowledgements
Thanks to Alibaba's Tongyi Lab for open-sourcing the Qwen3 models, which provide a solid foundation for Yee's language understanding and generation capabilities.
Downloads last month: 97
Available quantizations: 2-bit, 3-bit, 4-bit, 5-bit, 6-bit, 8-bit, 16-bit

Model tree for sds-ai/Yee-R1-mini-GGUF
- Base model: Qwen/Qwen3-1.7B-Base