Instructions for using sds-ai/Yee-R1-mini-GGUF with libraries, inference providers, notebooks, and local apps.
- Libraries
- Transformers
How to use sds-ai/Yee-R1-mini-GGUF with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="sds-ai/Yee-R1-mini-GGUF")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)
```

```python
# Load model directly
from transformers import AutoModel

model = AutoModel.from_pretrained("sds-ai/Yee-R1-mini-GGUF", dtype="auto")
```
- llama-cpp-python
How to use sds-ai/Yee-R1-mini-GGUF with llama-cpp-python:
```python
# !pip install llama-cpp-python
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="sds-ai/Yee-R1-mini-GGUF",
    filename="Yee-R1-mini-BF16.gguf",
)

llm.create_chat_completion(
    messages=[
        {"role": "user", "content": "What is the capital of France?"}
    ]
)
```
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- llama.cpp
How to use sds-ai/Yee-R1-mini-GGUF with llama.cpp:
Install from brew
```shell
brew install llama.cpp

# Start a local OpenAI-compatible server with a web UI:
llama-server -hf sds-ai/Yee-R1-mini-GGUF:Q4_K_M

# Run inference directly in the terminal:
llama-cli -hf sds-ai/Yee-R1-mini-GGUF:Q4_K_M
```
Install from WinGet (Windows)
```shell
winget install llama.cpp

# Start a local OpenAI-compatible server with a web UI:
llama-server -hf sds-ai/Yee-R1-mini-GGUF:Q4_K_M

# Run inference directly in the terminal:
llama-cli -hf sds-ai/Yee-R1-mini-GGUF:Q4_K_M
```
Use pre-built binary
```shell
# Download a pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases

# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf sds-ai/Yee-R1-mini-GGUF:Q4_K_M

# Run inference directly in the terminal:
./llama-cli -hf sds-ai/Yee-R1-mini-GGUF:Q4_K_M
```
Build from source code
```shell
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli

# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf sds-ai/Yee-R1-mini-GGUF:Q4_K_M

# Run inference directly in the terminal:
./build/bin/llama-cli -hf sds-ai/Yee-R1-mini-GGUF:Q4_K_M
```
Use Docker
docker model run hf.co/sds-ai/Yee-R1-mini-GGUF:Q4_K_M
- LM Studio
- Jan
- vLLM
How to use sds-ai/Yee-R1-mini-GGUF with vLLM:
Install from pip and serve model
```shell
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "sds-ai/Yee-R1-mini-GGUF"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "sds-ai/Yee-R1-mini-GGUF",
    "messages": [
      { "role": "user", "content": "What is the capital of France?" }
    ]
  }'
```
Use Docker
docker model run hf.co/sds-ai/Yee-R1-mini-GGUF:Q4_K_M
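The server exposes an OpenAI-compatible API, so the curl call above can also be made from Python. A minimal sketch using only the standard library, assuming a vLLM server listening on localhost:8000:

```python
import json
import urllib.request


def build_chat_request(model: str, prompt: str) -> dict:
    """Build an OpenAI-compatible chat completion payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }


def chat(base_url: str, model: str, prompt: str) -> str:
    """POST to /v1/chat/completions and return the assistant's reply."""
    payload = json.dumps(build_chat_request(model, prompt)).encode("utf-8")
    req = urllib.request.Request(
        base_url + "/v1/chat/completions",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    # The assistant's reply lives in the first choice's message content.
    return body["choices"][0]["message"]["content"]


# Requires a running server:
# print(chat("http://localhost:8000", "sds-ai/Yee-R1-mini-GGUF", "What is the capital of France?"))
```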
- SGLang
How to use sds-ai/Yee-R1-mini-GGUF with SGLang:
Install from pip and serve model
```shell
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "sds-ai/Yee-R1-mini-GGUF" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "sds-ai/Yee-R1-mini-GGUF",
    "messages": [
      { "role": "user", "content": "What is the capital of France?" }
    ]
  }'
```
Use Docker images
```shell
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "sds-ai/Yee-R1-mini-GGUF" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "sds-ai/Yee-R1-mini-GGUF",
    "messages": [
      { "role": "user", "content": "What is the capital of France?" }
    ]
  }'
```
- Ollama
How to use sds-ai/Yee-R1-mini-GGUF with Ollama:
ollama run hf.co/sds-ai/Yee-R1-mini-GGUF:Q4_K_M
- Unsloth Studio new
How to use sds-ai/Yee-R1-mini-GGUF with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
```shell
curl -fsSL https://unsloth.ai/install.sh | sh

# Run Unsloth Studio:
unsloth studio -H 0.0.0.0 -p 8888

# Then open http://localhost:8888 in your browser
# Search for sds-ai/Yee-R1-mini-GGUF to start chatting
```
Install Unsloth Studio (Windows)
```powershell
irm https://unsloth.ai/install.ps1 | iex

# Run Unsloth Studio:
unsloth studio -H 0.0.0.0 -p 8888

# Then open http://localhost:8888 in your browser
# Search for sds-ai/Yee-R1-mini-GGUF to start chatting
```
Using HuggingFace Spaces for Unsloth
```shell
# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for sds-ai/Yee-R1-mini-GGUF to start chatting
```
- Pi new
How to use sds-ai/Yee-R1-mini-GGUF with Pi:
Start the llama.cpp server
```shell
# Install llama.cpp:
brew install llama.cpp

# Start a local OpenAI-compatible server:
llama-server -hf sds-ai/Yee-R1-mini-GGUF:Q4_K_M
```
Configure the model in Pi
```shell
# Install Pi:
npm install -g @mariozechner/pi-coding-agent
```
Add to ~/.pi/agent/models.json:
```json
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        { "id": "sds-ai/Yee-R1-mini-GGUF:Q4_K_M" }
      ]
    }
  }
}
```
Run Pi
```shell
# Start Pi in your project directory:
pi
```
- Hermes Agent new
How to use sds-ai/Yee-R1-mini-GGUF with Hermes Agent:
Start the llama.cpp server
```shell
# Install llama.cpp:
brew install llama.cpp

# Start a local OpenAI-compatible server:
llama-server -hf sds-ai/Yee-R1-mini-GGUF:Q4_K_M
```
Configure Hermes
```shell
# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup

# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default sds-ai/Yee-R1-mini-GGUF:Q4_K_M
```
Run Hermes
hermes
- Docker Model Runner
How to use sds-ai/Yee-R1-mini-GGUF with Docker Model Runner:
docker model run hf.co/sds-ai/Yee-R1-mini-GGUF:Q4_K_M
- Lemonade
How to use sds-ai/Yee-R1-mini-GGUF with Lemonade:
Pull the model
```shell
# Download Lemonade from https://lemonade-server.ai/
lemonade pull sds-ai/Yee-R1-mini-GGUF:Q4_K_M
```
Run and chat with the model
lemonade run user.Yee-R1-mini-GGUF-Q4_K_M
List all available models
lemonade list
Yee (小熠): AI Data Security Expert
Developed by Guangzhou Yishu Information Technology Co., Ltd. (广州熠数信息技术有限公司), Yee is an intelligent data-security assistant built on large language model technology. This repository hosts the GGUF model files for Yee-R1-mini.
Yee (小熠) is an AI expert system focused on the data security domain. It builds on the Qwen3-1.7B large language model architecture and adds professional capabilities for data classification and grading, security auditing, and protection detection. It delivers lightweight, intelligent data security solutions for industry, government, telecom operators, and other sectors, helping users achieve data security that is compliant, visible, controllable, and defensible.
With an AI data-security expert model as its core technical foundation, Yee provides full-stack data security auditing and end-to-end data loss prevention, deployed across cloud, network, and endpoint scenarios to help enterprises meet the security challenges of the digital economy.
🔍 Key Features
Built on Qwen3-1.7B
- Uses Qwen3, the latest generation of Alibaba's Tongyi Qianwen (Qwen) large language models, with strong reasoning, logical judgment, and instruction-following capabilities.
- Supports flexible switching between Thinking Mode and Non-Thinking Mode to suit different application scenarios.
Dual-mode reasoning
- Enables Thinking Mode for complex logical tasks such as code analysis, mathematical computation, and strategy planning.
- Uses Non-Thinking Mode for everyday conversation and fast-response scenarios to improve efficiency.
Agent capabilities
- Integrates the Qwen-Agent framework and supports calling external tools (database interfaces, log analyzers, APIs, etc.) for automated task execution.
High compatibility
- Supports mainstream deployment options: local execution, Docker containers, Kubernetes clusters, and SaaS APIs.
- Compatible with inference frameworks such as HuggingFace Transformers, vLLM, SGLang, and Ollama.
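Tool calling as described above follows a standard pattern: the model emits a structured call naming a tool and its arguments, and a dispatcher runs the matching function and feeds the result back to the model. An illustrative sketch (the tool name and registry below are hypothetical, not part of Yee or Qwen-Agent):

```python
import json

# Hypothetical tool registry: tool name -> callable taking a dict of arguments.
TOOLS = {
    "scan_for_sensitive_fields": lambda args: {
        "sensitive": [f for f in args["fields"] if f in {"phone", "id_number", "email"}]
    },
}


def dispatch(tool_call_json: str) -> str:
    """Run the tool named in a model-emitted tool call and return its JSON result."""
    call = json.loads(tool_call_json)
    result = TOOLS[call["name"]](call["arguments"])
    return json.dumps(result)


# A model-emitted tool call might look like this:
out = dispatch('{"name": "scan_for_sensitive_fields", "arguments": {"fields": ["name", "phone", "city"]}}')
```

The JSON result string would then be appended to the conversation as a tool message so the model can compose its final answer.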
📊 Benchmark Results
Below are Yee's scores across multiple security domains in CS-Eval, produced by an evaluation framework that simulates real-world business scenarios:
| Overall | System & Software Security Fundamentals | Access Control & Identity Management | Encryption & Key Management | Infrastructure Security | AI & Network Security | Vulnerability Management & Penetration Testing | Threat Detection & Prevention | Data Security & Privacy Protection | Supply Chain Security | Security Architecture Design | Business Continuity & Incident Response | Chinese Tasks | English Tasks |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 77.48 | 78.00 | 79.31 | 71.90 | 78.37 | 84.65 | 75.24 | 78.41 | 73.02 | 86.71 | 80.49 | 71.33 | 77.58 | 76.03 |
📦 Quick Start

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the tokenizer and model
tokenizer = AutoTokenizer.from_pretrained("sds-ai/Yee-R1-mini")
model = AutoModelForCausalLM.from_pretrained(
    "sds-ai/Yee-R1-mini",
    torch_dtype="auto",
    device_map="auto"
)

# Input prompt
prompt = "请帮我检查这份数据是否包含敏感字段?"  # "Please check whether this data contains sensitive fields."

# Apply the chat template and select the reasoning mode
messages = [{"role": "user", "content": prompt}]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True  # switch to Thinking Mode
)

# Encode the input
inputs = tokenizer([text], return_tensors="pt").to(model.device)

# Generate a response
response_ids = model.generate(**inputs, max_new_tokens=32768)
response = tokenizer.decode(response_ids[0][len(inputs.input_ids[0]):], skip_special_tokens=True)
print("Yee:\n", response)
```
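In Thinking Mode, Qwen3-style models wrap their reasoning in `<think>…</think>` tags ahead of the final answer, so the decoded string above contains both. A minimal sketch for separating the two (the tag names follow the Qwen3 convention; adjust if this model's chat template differs):

```python
import re


def split_thinking(response: str):
    """Split a decoded response into (thinking, answer).

    Assumes Qwen3-style <think>...</think> tags; if no tag pair is present,
    the whole response is treated as the answer.
    """
    match = re.search(r"<think>(.*?)</think>", response, flags=re.DOTALL)
    if match is None:
        return "", response.strip()
    thinking = match.group(1).strip()
    answer = response[match.end():].strip()
    return thinking, answer


thinking, answer = split_thinking("<think>The user asks about fields.</think>\nNo sensitive fields found.")
```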
🛠️ Deployment
You can deploy Yee in any of the following ways:
Serve with SGLang
```shell
python -m sglang.launch_server --model-path sds-ai/Yee-R1-mini --reasoning-parser qwen3
```
Serve with vLLM
```shell
vllm serve sds-ai/Yee-R1-mini --enable-reasoning --reasoning-parser deepseek_r1
```
Use Ollama / LMStudio / llama.cpp / KTransformers
Qwen3 is widely supported by mainstream local LLM tools; see their official documentation for details.
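With a reasoning parser enabled, the OpenAI-compatible response separates the chain of thought from the final answer; in vLLM this is surfaced as a `reasoning_content` field alongside `content` on the message. A minimal sketch of reading both from a response payload (the sample payload below is illustrative, not captured from the model):

```python
def extract_reasoning(response: dict):
    """Return (reasoning, answer) from an OpenAI-compatible chat response
    produced with a reasoning parser enabled."""
    message = response["choices"][0]["message"]
    return message.get("reasoning_content"), message.get("content")


# Illustrative response payload:
sample = {
    "choices": [
        {"message": {"reasoning_content": "User asks for capital.", "content": "Paris."}}
    ]
}
reasoning, answer = extract_reasoning(sample)
```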
📚 Best Practices
For best performance, use the following recommended settings:

| Mode | Temperature | TopP | TopK | MinP | Presence Penalty |
|---|---|---|---|---|---|
| Thinking Mode (enable_thinking=True) | 0.6 | 0.95 | 20 | 0 | 1.5 (reduces repetition) |
| Non-Thinking Mode (enable_thinking=False) | 0.7 | 0.8 | 20 | 0 | not recommended |

- Set the output length to 32,768 tokens; for complex tasks it can be raised to 38,912 tokens.
- In multi-turn conversations, keep only each turn's final output in the history; including thinking content can degrade context understanding.
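The last recommendation can be enforced programmatically before each new turn; a minimal sketch, assuming assistant turns carry Qwen3-style `<think>…</think>` blocks:

```python
import re


def prune_history(messages):
    """Return a copy of the chat history where assistant turns keep only
    their final output, with <think>...</think> blocks removed."""
    pruned = []
    for msg in messages:
        if msg["role"] == "assistant":
            content = re.sub(r"<think>.*?</think>\s*", "", msg["content"], flags=re.DOTALL)
            pruned.append({"role": "assistant", "content": content.strip()})
        else:
            pruned.append(dict(msg))
    return pruned


history = [
    {"role": "user", "content": "Check this dataset for sensitive fields."},
    {"role": "assistant", "content": "<think>The schema has a phone column...</think>The phone column is sensitive."},
]
history = prune_history(history)
```

Run this on the accumulated history before appending the next user message and calling the model again.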
📞 Contact Us
To learn more about Yee, visit the official Yishu Information website.
🌟 Acknowledgements
Thanks to Alibaba's Tongyi Lab for open-sourcing the Qwen3 models, which provide a solid foundation for Yee's language understanding and generation capabilities.
Downloads last month: 97
Available quantizations: 2-bit, 3-bit, 4-bit, 5-bit, 6-bit, 8-bit, 16-bit

Model tree for sds-ai/Yee-R1-mini-GGUF
- Base model: Qwen/Qwen3-1.7B-Base