Field Notes: Putting a Chinese Trading Theory on a Fine-Tuned 1.7B
gradio.Server), a local llama.cpp sub-agent pool, a model I fine-tuned myself, and zero cloud APIs.
The problem (a real one)
My family trades US stocks using 缠论 (Chan theory) — a Chinese technical-analysis framework built on fractals, strokes, segments and "pivots" (中枢), famous for its rigor and infamous for its complexity. We had a battle-tested Python engine that decomposes monthly → weekly → daily → 60m → 30m → 15m → 5m charts and emits the classic three buy / three sell points. Two problems: the engine's reasoning logs are dense Chinese jargon, and someone had to run it by hand every evening.
Chan Compass fixes both: a Gradio Space that auto-updates after the US close (18:10 ET), answers exactly one question per ticker — do I buy or sell tomorrow, at what buy point, in which zone, and where am I wrong? — and uses a pool of small local models as the translators and analysts the rule engine can't be. The results are persisted to disk, so when the family opens the app the next morning, the overnight run is already there waiting — no spinner, no recompute.
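The "persisted to disk" step can be sketched roughly as below. This is a minimal illustration, not the app's actual code: the function names, the JSON layout, and the file path passed in are all hypothetical. The only idea it is meant to show is that the nightly run writes its results once and the next morning's page load simply reads them back.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Optional


def save_results(results: dict, path: Path) -> None:
    """Write results atomically so a reader never sees a half-written file."""
    path.parent.mkdir(parents=True, exist_ok=True)
    # Write to a temp file in the same directory, then rename over the target.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as tmp:
        json.dump(results, tmp, ensure_ascii=False)
    os.replace(tmp_name, path)


def load_results(path: Path) -> Optional[dict]:
    """Return the last saved run, or None if nothing has been saved yet."""
    if not path.exists():
        return None
    return json.loads(path.read_text(encoding="utf-8"))
```

Writing to a temporary file and renaming it means a page load that races the nightly job sees either the previous run or the new one, never a partial file.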
Why small models are the honest fit
The LLMs are not doing the math. Fractal merging, divergence grading and nested-interval confirmation are deterministic code — a 32B (or 320B) model would add nothing but hallucination risk and latency. What the rule engine can't do is language: turn a Chinese decision chain into a plain-English summary, brief today's headlines on my holdings, narrate sector rotation, write a structured research note. That's bounded, evidence-grounded text work — exactly what 1.7B–4B models handle on 8 CPU cores. The constraint drew the architecture: rules compute, small models speak.
A pool of sub-agents, not one model
Early versions shared a single model behind one lock, so the moment two features ran at once you'd get "model busy." The fix was a sub-agent pool — four small models, each with its own lock, running through llama.cpp:
- Interpreter (Signals → AI Interpret) — my fine-tuned Chan-Tuned Qwen3-1.7B
- Narrator (sector rotation) — Qwen3-1.7B
- Reporter (watchlist news + research support) — Qwen3-1.7B
- Analyst (Auto Research) — Qwen3-4B
They genuinely run in parallel: you can stream a research report while the Signals summary writes, with no collision.
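The one-lock-per-model idea can be sketched as follows. This is an assumption-laden illustration: the SubAgent class, the role names used as dictionary keys, and the stand-in generate function are hypothetical, and a real version would call llama.cpp where the lambda is.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict


class SubAgent:
    """One model plus its own lock; calls to the same agent are serialized."""

    def __init__(self, name: str, generate: Callable[[str], str]) -> None:
        self.name = name
        self._generate = generate      # stand-in for a llama.cpp call
        self._lock = threading.Lock()  # one lock per model, not one global lock

    def run(self, prompt: str) -> str:
        with self._lock:
            return self._generate(prompt)


def build_pool() -> Dict[str, SubAgent]:
    """Build four agents, each echoing its role so the routing is visible."""
    roles = ["interpreter", "narrator", "reporter", "analyst"]
    return {r: SubAgent(r, lambda p, r=r: f"[{r}] {p}") for r in roles}


pool = build_pool()
with ThreadPoolExecutor(max_workers=2) as ex:
    # Two different agents run concurrently because their locks are separate.
    report = ex.submit(pool["analyst"].run, "write research note")
    summary = ex.submit(pool["interpreter"].run, "summarize signals")
    outputs = [report.result(), summary.result()]
```

Two requests to the same agent still queue behind that agent's lock, which is the intended behavior: only requests to different agents proceed in parallel.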
The fine-tune (the part I'm proudest of)
The Interpreter sub-agent runs a model I trained myself. The app captures every
(raw read → English summary) pair it produces into a dataset on the /data
bucket; a couple hundred focused pairs later, I exported the JSONL straight from
the app's Model tab, LoRA-tuned Qwen3-1.7B on a free Colab T4 with Unsloth,
converted to GGUF, and published it to the Hub
(ranranrunforit/chan-compass-qwen3-1.7b-gguf). One line in MODEL_ZOO wires it
back in. The loop closes: the app generated its own training data, and now runs on
the model that data produced — a 1.7B doing one job well instead of a giant model
doing everything adequately.
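Capturing pairs for a later fine-tune can be as simple as appending JSON lines. The sketch below is illustrative only: the field names "input" and "output" and both function names are assumptions, not the app's real schema.

```python
import json
from pathlib import Path
from typing import Dict, List


def append_pair(path: Path, raw_read: str, summary: str) -> None:
    """Append one (raw read, English summary) pair as a single JSON line."""
    path.parent.mkdir(parents=True, exist_ok=True)
    record = {"input": raw_read, "output": summary}  # hypothetical field names
    with path.open("a", encoding="utf-8") as f:
        # ensure_ascii=False keeps the Chinese source text readable in the file.
        f.write(json.dumps(record, ensure_ascii=False) + "\n")


def load_pairs(path: Path) -> List[Dict[str, str]]:
    """Read the dataset back, skipping blank lines."""
    with path.open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```

One record per line keeps the file append-only, so a crash mid-run costs at most the pair being written.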
What's in the box
- Tomorrow's plan — one row per ticker: BUY/SELL/HOLD/WAIT, explicit buy point, entry zone, invalidation price. Long-hold mode: ride the weekly pivot uplift, exit only on third-sell / structural stop / an armed nested-interval line.
- Sector rotation — flow proxy (Δ% × dollar volume) across the 11 SPDR sector ETFs vs SPY, 1/5/20-day windows, instant tables + on-demand AI narrative.
- Watchlist news — today-only headlines per holding, streamed as they arrive, each with an AI brief.
- Multi-agent Auto Research — PLAN → six evidence tools in parallel (fundamentals, financials, price, the Chan engine itself, money-flow proxy, news) → Analyst and Reporter writing different sections simultaneously: valuation · tech moat · supply-chain map with tickers · bull/bear · money flow · Chan timing · risks. Every run writes a full JSON trace — the plan, each tool call and its result, each sub-agent's request and response — published as a Hub dataset so anyone can see how the agent actually reasoned.
- Email any result — each tab can email its output; on HF (where SMTP is blocked) delivery goes over the Resend HTTPS API, markdown rendered to clean HTML.
- A real custom frontend: instead of the default Gradio component render, server.py builds a gradio.Server that serves a hand-built React + Adobe Spectrum 2 UI at / and exposes the Python backend as /api/* JSON + streaming endpoints. It's still a Gradio app, just wearing its own clothes.
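As a worked example of the sector-rotation flow proxy (Δ% × dollar volume), here is a minimal sketch. The post gives only the formula, so the details are assumptions: Δ% is taken as the close-to-close change over the window, dollar volume as the sum of close × volume over the same window, and the function name is hypothetical. The comparison against SPY is not shown.

```python
from typing import Sequence


def flow_proxy(closes: Sequence[float], volumes: Sequence[float], window: int) -> float:
    """Percent change over `window` days times dollar volume in that window."""
    if window < 1 or len(closes) <= window or len(closes) != len(volumes):
        raise ValueError("need more than `window` aligned bars")
    # Close-to-close change from `window` bars ago to the latest bar.
    pct_change = closes[-1] / closes[-1 - window] - 1.0
    # Dollar volume: sum of close * volume over the last `window` bars.
    dollar_volume = sum(c * v for c, v in zip(closes[-window:], volumes[-window:]))
    return pct_change * dollar_volume
```

Calling it with window set to 1, 5 and 20 for each sector ETF yields one signed number per ETF per window, which can then be sorted into a table.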
Five things I learned the hard way
- HF build containers are tiny. llama-cpp-python got OOM-killed compiling at build time on a Space with 32 GB of runtime RAM. Fix: install the runtime at first launch (prebuilt CPU wheel, capped-compile fallback), persisted to /data so it happens once. Pin python_version: 3.11; prebuilt wheels stop at 3.12.
- Don't put the model on the hot path. My first "Run analysis" generated an AI narrative inline, and 10 tickers took 186 s. Moving all LLM calls off the rule path and parallelizing downloads brought it to a few seconds. CPU inference is fine; it just can't be a synchronous dependency of everything.
- One lock per model. A shared lock turns concurrent features into "model busy." A sub-agent pool — one instance and lock per job — fixed it and made the fine-tune swappable for just the one agent that needed it.
- Platforms block SMTP; ship HTTPS email. Outbound mail ports are closed on Spaces, so email failed with "network unreachable." The Resend REST API over 443 works, and it needs a User-Agent header or Cloudflare 403s you.
- A custom frontend on a Gradio Space wants gradio.Server, not .launch(). Mounting a React UI and calling app.launch() made Gradio's SSR startup check hit gradio_api/startup-events and 404. The fix: treat gradio.Server as the ASGI app it is and serve it with uvicorn (Docker SDK). Custom look, still a Gradio app.
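The HTTPS email lesson can be sketched with only the standard library. This is a hedged illustration, not the app's code: the function names and the User-Agent value are hypothetical, and the payload uses the basic from / to / subject / html fields of Resend's send-email endpoint. The request is built separately from sending so it can be inspected without touching the network.

```python
import json
import urllib.request

RESEND_URL = "https://api.resend.com/emails"


def build_resend_request(
    api_key: str, sender: str, to: str, subject: str, html: str
) -> urllib.request.Request:
    """Build a POST to Resend over HTTPS (port 443) with an explicit User-Agent."""
    payload = {"from": sender, "to": [to], "subject": subject, "html": html}
    return urllib.request.Request(
        RESEND_URL,
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
            # Hypothetical value; the post reports a 403 without a User-Agent.
            "User-Agent": "chan-compass/0.1",
        },
    )


def send_email(req: urllib.request.Request, timeout: float = 15.0) -> int:
    """Send the request and return the HTTP status code."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status
```

In use, the API key would come from a Space secret rather than source code, and the html argument would be the rendered markdown for the tab being emailed.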
What's next
The agent traces are already public as a Hub dataset; next I want to grow the fine-tune set so the Interpreter model keeps sharpening on exactly the phrasing this task needs, and feed real family usage back into that loop.
Not investment advice. Engine logic unchanged from the original; the app around it — and the model that explains it — is what this hackathon built.