scallopmemory-1
A 4B extraction specialist for local assistants. It reads a conversation and writes down the durable facts worth remembering, or returns nothing when a turn is just chatter.
scallopmemory-1 is a LoRA fine-tune of Qwen3.5-4B, distilled from ScallopBot production traces with a larger model writing the labels. The student never trained on its own generations. The repo ships a q5_k_m GGUF for local serving and the raw adapter for reproduction.
Links: scallopbot.com · GitHub
| Property | Value |
|---|---|
| Base model | Qwen3.5-4B |
| Adapter | LoRA, rank 32, alpha 64, 2 epochs |
| Quant | q5_k_m GGUF (3.16 GB) |
| Context | inherits Qwen3.5-4B |
| Serving | thinking off (chain-of-thought hurts this task at 4B) |
| Output | structured memory entries (durable facts), or empty |
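For reproduction, the rank and alpha in the table fix how strongly the adapter's update is scaled. Under the standard LoRA scaling that PEFT applies by default (the card does not mention rsLoRA or any other variant, so this is an assumption), each adapted weight matrix W0 of shape d × k receives a rank-32 update scaled by alpha / r = 2:

```latex
W = W_0 + \frac{\alpha}{r}\, B A = W_0 + \frac{64}{32}\, B A = W_0 + 2\, B A,
\qquad B \in \mathbb{R}^{d \times 32},\quad A \in \mathbb{R}^{32 \times k}
```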
Files
| File | Format | Size | Notes |
|---|---|---|---|
| scallopmemory-1.q5_k_m.gguf | GGUF Q5_K_M | 3.16 GB | Recommended for llama.cpp / Ollama / LM Studio |
| adapter/ | PEFT LoRA | 170 MB | Apply on top of Qwen/Qwen3.5-4B with transformers + PEFT |
How to run
Serve with thinking disabled. The model is trained and benchmarked in the no-think path.
llama.cpp
llama-server -m scallopmemory-1.q5_k_m.gguf \
--chat-template-kwargs '{"enable_thinking":false}'
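Once `llama-server` is up, any OpenAI-compatible client can call it. The sketch below uses only the standard library and assumes the server's default address, `http://localhost:8080`. The single user message mirrors the Python example on this card; the per-request `chat_template_kwargs` field is an assumption about recent llama-server builds. The launch flag above already disables thinking server-wide, so drop the field if your build rejects it.

```python
import json
import urllib.request

# Default llama-server address (assumption: no --host/--port override).
SERVER_URL = "http://localhost:8080/v1/chat/completions"


def build_request(conversation: str, disable_thinking: bool = True) -> dict:
    """Build an OpenAI-style chat-completions payload for llama-server."""
    payload = {"messages": [{"role": "user", "content": conversation}]}
    if disable_thinking:
        # Repeats the server-wide --chat-template-kwargs setting per request.
        # Assumes your llama-server build accepts this field.
        payload["chat_template_kwargs"] = {"enable_thinking": False}
    return payload


def extract(conversation: str, url: str = SERVER_URL, timeout: float = 60.0) -> str:
    """POST one conversation and return the raw assistant text (needs a running server)."""
    body = json.dumps(build_request(conversation)).encode("utf-8")
    request = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        data = json.load(response)
    return data["choices"][0]["message"]["content"]


if __name__ == "__main__":
    # Print the payload only; call extract() once the server is running.
    print(json.dumps(build_request("User: I moved to Lisbon last week."), indent=2))
```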
Ollama
ollama run hf.co/tashfene/scallopmemory-1:Q5_K_M
Python (llama-cpp-python)
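from llama_cpp import Llama

# Downloads the GGUF from the Hub on first use (requires huggingface-hub).
llm = Llama.from_pretrained(
    repo_id="tashfene/scallopmemory-1",
    filename="scallopmemory-1.q5_k_m.gguf",
)
out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "<conversation to extract facts from>"}],
)
# The reply follows the OpenAI chat-completions shape.
print(out["choices"][0]["message"]["content"])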
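The card describes the output only as structured memory entries or nothing, and does not publish the schema. The sketch below assumes, hypothetically, a JSON array of objects that each carry a `fact` string; adapt the checks to whatever your deployment actually emits. It treats an empty reply as the normal "nothing to remember" case and raises on anything malformed, which is the kind of failure the parse-success column in the evaluation counts.

```python
import json


def parse_memories(text: str) -> list[dict]:
    """Turn one extractor reply into a list of memory entries.

    Hypothetical schema: a JSON array of objects, each with a non-empty
    "fact" string. Whitespace-only output or "[]" means nothing to store.
    """
    stripped = text.strip()
    if not stripped:
        return []
    try:
        data = json.loads(stripped)
    except json.JSONDecodeError as exc:
        raise ValueError(f"reply is not valid JSON: {exc}") from exc
    if not isinstance(data, list):
        raise ValueError("expected a JSON array of memory entries")
    for entry in data:
        fact = entry.get("fact") if isinstance(entry, dict) else None
        if not isinstance(fact, str) or not fact.strip():
            raise ValueError(f"malformed memory entry: {entry!r}")
    return data


if __name__ == "__main__":
    print(parse_memories('[{"fact": "Prefers tea over coffee"}]'))
    print(parse_memories("   "))  # chatter: nothing to remember
```

With the llama-cpp-python example above, the reply text to pass in is `out["choices"][0]["message"]["content"]`.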
Adapter on the base model (transformers + PEFT)
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer
base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3.5-4B")
model = PeftModel.from_pretrained(base, "tashfene/scallopmemory-1", subfolder="adapter")
tok = AutoTokenizer.from_pretrained("tashfene/scallopmemory-1", subfolder="adapter")
Intended use
An assistant runs this after a conversation to decide what to persist to long-term memory. Two failure modes hurt: writing down noise, and missing a real fact. Most turns produce nothing, so the harder half of the job is staying quiet without going silent on the turns that matter.
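A minimal sketch of where the extractor sits in that loop. Everything beyond the extractor itself is hypothetical: `MemoryStore` and `after_conversation` are illustrative names, the extractor is passed in as a plain function (for example, one wrapping the serving and parsing examples on this card), and the case- and whitespace-insensitive de-duplication is a design choice of this sketch, not something the card describes. It shows the common path, where an empty extraction writes nothing.

```python
from typing import Callable


class MemoryStore:
    """Hypothetical long-term store that keeps each fact once."""

    def __init__(self) -> None:
        self._facts: dict[str, str] = {}

    @staticmethod
    def _key(fact: str) -> str:
        # Case- and whitespace-insensitive key, so "Lives in Lisbon" and
        # "lives in  lisbon" count as the same fact.
        return " ".join(fact.lower().split())

    def add(self, fact: str) -> bool:
        """Store a fact; return False if an equivalent one is already stored."""
        key = self._key(fact)
        if key in self._facts:
            return False
        self._facts[key] = fact
        return True

    def facts(self) -> list[str]:
        return list(self._facts.values())


def after_conversation(
    conversation: str,
    extract: Callable[[str], list[str]],
    store: MemoryStore,
) -> int:
    """Run the extractor once on a finished conversation; return facts written.

    An empty extraction is the common case and writes nothing.
    """
    return sum(1 for fact in extract(conversation) if store.add(fact))
```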
Evaluation
33 extraction cases held out from real sessions, none seen in training. Same harness for every model, thinking off.
| Model | Teacher agreement | Parse success | Median latency |
|---|---|---|---|
| Qwen3.6-35B MoE | 0.877 | 57.6% | 41.1s |
| Qwen3.6-Plus (the teacher) | 0.748 | 100% | 31.1s |
| scallopmemory-1 | 0.725 | 100% | 4.2s |
| Qwen3.5-4B (stock) | 0.695 | 100% | 8.8s |
Read this as parity, not a win. The 4B lands close to the hosted model that taught it (0.725 vs 0.748 agreement) at roughly a seventh of the median latency (4.2s vs 31.1s) and on local hardware. The 35B scores higher on agreement but parses cleanly only 57.6% of the time (19 of 33 cases), so it drops about two of every five outputs and cannot sit in a pipeline as is. Among models that return valid structure every time, the 4B trails the teacher by only 0.023 and beats the stock base it came from (0.695).
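The ratios behind that reading can be recomputed straight from the table. This worked calculation adds no new data; the only derived figure is the number of clean 35B parses, which follows from 57.6% of 33 cases.

```python
N_CASES = 33

# (teacher agreement, parse success %, median latency s), copied from the table above.
RESULTS = {
    "Qwen3.6-35B MoE": (0.877, 57.6, 41.1),
    "Qwen3.6-Plus (the teacher)": (0.748, 100.0, 31.1),
    "scallopmemory-1": (0.725, 100.0, 4.2),
    "Qwen3.5-4B (stock)": (0.695, 100.0, 8.8),
}


def speedup(other_latency: float, our_latency: float) -> float:
    """How many times slower another model is than scallopmemory-1."""
    return other_latency / our_latency


def parsed_cases(parse_pct: float, n_cases: int = N_CASES) -> int:
    """Number of held-out cases that parsed cleanly."""
    return round(n_cases * parse_pct / 100)


if __name__ == "__main__":
    ours_agreement, _, ours_latency = RESULTS["scallopmemory-1"]
    for name, (agreement, parse_pct, latency) in RESULTS.items():
        if name == "scallopmemory-1":
            continue
        print(
            f"{name}: {speedup(latency, ours_latency):.1f}x slower, "
            f"agreement gap {ours_agreement - agreement:+.3f}, "
            f"{parsed_cases(parse_pct)}/{N_CASES} clean parses"
        )
```

It reports a 7.4x latency gap to the teacher, a 2.1x gap to the stock base, and 19/33 clean parses for the 35B.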
Extraction quality is where model capacity shows. An 8B base is the obvious next step to clear the teacher rather than match it.
Training
Traces from one person's assistant, so the distribution is narrow and personal. The same deterministic anonymizer as the tools model swaps real names, emails, phones, handles, and project ids for stable fakes and refuses to write a file if any known real token survives. Anonymized and real-name held-out sets scored within 0.002 of each other.
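A minimal sketch of the two properties described above: stable fakes, and refusing to write when a real token survives. Everything here is hypothetical, including the `Anonymizer` class, the salt, and the hash-derived placeholders (the card's anonymizer produces realistic stable fakes, and its generator is not published). Replacement is exact-match while the leak check is case-insensitive, so an unexpected casing such as `ALICE SMITH` blocks the write instead of slipping through.

```python
import hashlib
import re
from pathlib import Path


class Anonymizer:
    """Hypothetical deterministic anonymizer with a refuse-to-write leak check."""

    def __init__(self, real_tokens: dict[str, str], salt: str = "fixed-salt") -> None:
        # Maps each known real value to its kind, e.g. {"Alice Smith": "name"}.
        self.real_tokens = real_tokens
        self.salt = salt
        # One alternation, longest first, so "Alice Smith" wins over "Alice"
        # and inserted fakes are never re-scanned.
        ordered = sorted(real_tokens, key=len, reverse=True)
        self._pattern = re.compile("|".join(re.escape(t) for t in ordered)) if ordered else None

    def fake_for(self, token: str) -> str:
        """Stable fake: the same real token always maps to the same placeholder."""
        digest = hashlib.sha256((self.salt + token).encode("utf-8")).hexdigest()[:8]
        kind = self.real_tokens[token]
        return f"user_{digest}@example.com" if kind == "email" else f"{kind}_{digest}"

    def anonymize(self, text: str) -> str:
        if self._pattern is None:
            return text
        return self._pattern.sub(lambda m: self.fake_for(m.group(0)), text)

    def leaks(self, text: str) -> list[str]:
        """Known real tokens still present (case-insensitive) after anonymizing."""
        lowered = text.lower()
        return [t for t in self.real_tokens if t.lower() in lowered]

    def write(self, text: str, path: Path) -> None:
        clean = self.anonymize(text)
        leaked = self.leaks(clean)
        if leaked:
            # Refuse rather than write a partially anonymized file.
            raise RuntimeError(f"refusing to write {path}: {len(leaked)} real token(s) survived")
        path.write_text(clean, encoding="utf-8")
```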
One detail mattered more than the rest. An early run collapsed because most freshly distilled examples were empty extractions from background chatter, which taught the model to write down nothing. Capping empty examples per session moved agreement from 0.70 to 0.725. If you train your own extractor, watch the share of empty targets.
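The card does not give the cap it used. A minimal sketch of per-session capping, assuming each training example carries a `session_id` and a `target` list of facts (an empty list for an empty extraction); `cap_empty_per_session`, `empty_share`, and the default cap of 2 are hypothetical. It keeps every non-empty example plus a seeded random sample of empty ones, and reports the empty share the paragraph suggests watching.

```python
import random
from collections import defaultdict


def cap_empty_per_session(examples: list[dict], max_empty: int = 2, seed: int = 0) -> list[dict]:
    """Keep all non-empty targets, but at most `max_empty` empty targets per session."""
    rng = random.Random(seed)  # seeded so the filtered dataset is reproducible
    by_session: dict[str, list[dict]] = defaultdict(list)
    for example in examples:
        by_session[example["session_id"]].append(example)

    kept: list[dict] = []
    for items in by_session.values():
        empty = [e for e in items if not e["target"]]
        rng.shuffle(empty)
        kept.extend(e for e in items if e["target"])
        kept.extend(empty[:max_empty])
    return kept


def empty_share(examples: list[dict]) -> float:
    """Fraction of examples whose target is an empty extraction."""
    if not examples:
        return 0.0
    return sum(1 for e in examples if not e["target"]) / len(examples)


if __name__ == "__main__":
    data = (
        [{"session_id": "a", "target": ["Moved to Lisbon"]}]
        + [{"session_id": "a", "target": []} for _ in range(5)]
        + [{"session_id": "b", "target": []} for _ in range(3)]
    )
    capped = cap_empty_per_session(data)
    print(f"empty share {empty_share(data):.2f} -> {empty_share(capped):.2f}")
```

On the toy data this prints `empty share 0.89 -> 0.80`. Sessions made entirely of chatter still contribute a few empty examples, so the model keeps seeing that staying quiet is sometimes the right answer.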
Limitations and bias
- One user's data, one memory schema. Your facts and format will differ.
- 0.725 agreement means it disagrees with the teacher on roughly a quarter of cases. Check its output before trusting it as ground truth.
- Capacity-bound. A larger base would likely extract better; 4B is the floor for this task, not the ceiling.
- Bias: it inherits that one person's notion of what counts as memorable.
License
Apache-2.0, inherited from the Qwen3.5-4B base.
Evaluation results (self-reported, ScallopBot held-out traces, 33 cases)
- Teacher agreement: 0.725
- Parse success: 100%
- Struct valid: 100%