CraftPilot: Building a Multi-Agent Craft Business Assistant with a Single Small Model

Community Article
Published June 9, 2026

How I built a tool for someone I know — a creative introvert who makes beautiful crafts but struggles to sell them.

The Problem

I know someone who crochets, embroiders, paints, and sews the most beautiful things. Friends and family constantly tell her: "You should sell these!" But she never does. Not because the work isn't good enough — it's because the selling part is overwhelming.

Writing product descriptions? Agonizing. Picking a fair price? Impossible. Crafting Instagram captions? Exhausting for an introvert. So the crafts pile up, gifted away or tucked into drawers, while she moves on to the next project.

I built CraftPilot to fix that.

The Solution

CraftPilot is a photo-in, listing-out tool. Upload a photo of your handmade craft, and you get:

  • Catalog metadata — category, materials, colors, complexity, searchable tags
  • Product copy — a title, short description, full description, and 3 Instagram captions
  • Fair pricing — a price range based on material cost, labor time, and market rates
  • Downloadable listing — export everything as an Etsy/Instagram-ready text file
  • Agent traces — full transparency into what each AI agent did

The key constraint: everything runs on a single model (MiniCPM-V 2.6, ~8B parameters) via llama.cpp. No cloud APIs. No subscriptions. No sending your craft photos to OpenAI.

How It Works: The Multi-Agent Pipeline

CraftPilot uses a 4-agent pipeline, all powered by the same model:

Photo -> [Vision Agent] -> [Cataloger Agent] -> [Copywriter Agent] -> [Pricer Agent]
  1. Vision Agent — Takes the photo and produces a detailed text description of the craft item (materials, colors, techniques, style)
  2. Cataloger Agent — Reads the description and outputs structured metadata as JSON (category, materials, tags, complexity)
  3. Copywriter Agent — Takes the catalog data and writes warm, authentic product copy and Instagram captions
  4. Pricer Agent — Considers materials cost, labor hours, complexity, and market rates to suggest a fair price range

Each agent has a specialized system prompt and outputs structured JSON via constrained generation. The pipeline streams results — you see the vision analysis appear first, then catalog data fills in, then copy and pricing.

What I Learned

One Model, Four Agents

MiniCPM-V 2.6 is a multimodal model from OpenBMB that handles both vision and text generation. Running it via llama.cpp means no GPU required, no API costs, and full privacy — your photos never leave your machine.

Small Models Need Focus

A small model is surprisingly capable when you give it:

  • Clear, focused system prompts (one job per agent)
  • Structured output constraints (JSON schema)
  • Pre-computed math (the model can't reliably add, so the pricing template does the arithmetic)

Where it struggles: complex reasoning, nuanced pricing logic, and occasionally inconsistent JSON. Error recovery handles this gracefully — if one agent fails, you still get results from the others.

Streaming Changes Everything

The pipeline streams results as each agent completes. Users see a progress stepper and results filling in piece by piece. This turns a 30-60 second wait into an engaging experience. Without streaming, users assume the app is broken after 10 seconds.

The Stack

  • Model: MiniCPM-V 2.6 (~8B) via llama-cpp-python
  • UI: Gradio 6.x with custom CSS
  • Orchestration: Python pipeline with streaming generators
  • Hosting: Hugging Face Spaces (CPU)
  • Cloud APIs: None

Try It

Try CraftPilot on Hugging Face Spaces

Built for the Build Small Hackathon 2026 — Backyard AI track. Single model, no cloud APIs, full agent transparency.


Built with love for someone who deserves to share her craft with the world.

Community

Sign up or log in to comment