# Deployment Guide
## Launching DeepCritical: Gradio, MCP, & Modal
---
## Overview
DeepCritical is designed for a multi-platform deployment strategy to maximize hackathon impact:
1. **HuggingFace Spaces**: Host the Gradio UI (User Interface).
2. **MCP Server**: Expose research tools to Claude Desktop/Agents.
3. **Modal (Optional)**: Run heavy inference or open-source LLMs on rented GPUs if API costs are prohibitive.
---
## 1. HuggingFace Spaces (Gradio UI)
**Goal**: A public URL where judges/users can try the research agent.
### Prerequisites
- HuggingFace Account
- `gradio` installed (`uv add gradio`)
### Steps
1. **Create Space**:
- Go to HF Spaces -> Create New Space.
- SDK: **Gradio**.
- Hardware: **CPU Basic** (Free) is sufficient (since we use APIs).
2. **Prepare Files**:
- Ensure `app.py` contains the Gradio interface construction.
   - Ensure `requirements.txt` lists all dependencies (the Gradio SDK on Spaces installs from it; keep it in sync with `pyproject.toml`).
3. **Secrets**:
- Go to Space Settings -> **Repository secrets**.
- Add `ANTHROPIC_API_KEY` (or your chosen LLM provider key).
   - Add `BRAVE_API_KEY` (for web search). Secrets are injected into the running app as environment variables; see the sketch after these steps.
4. **Deploy**:
- Push code to the Space's git repo.
- Watch "Build" logs.
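At runtime, read those secrets from the environment rather than hardcoding keys. A minimal sketch for the top of `app.py` (variable names mirror the secrets above):
```python
# app.py -- HF Spaces exposes repository secrets as environment variables
import os

ANTHROPIC_API_KEY = os.environ["ANTHROPIC_API_KEY"]  # fail fast if the secret is missing
BRAVE_API_KEY = os.environ.get("BRAVE_API_KEY", "")  # tolerate a missing key during local dev
```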
### Streaming Optimization
Ensure `app.py` uses generator functions for the chat interface to prevent timeouts:
```python
# app.py
import gradio as gr

def predict(message, history):
    # Yield partial results so Gradio streams them instead of waiting on one long call
    agent = ResearchAgent()  # the project's research agent class
    for update in agent.research_stream(message):
        yield update

gr.ChatInterface(predict).launch()
```
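Each `yield` updates the pending assistant message in place, so users see intermediate status (searching, reading, synthesizing) rather than a frozen UI.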
---
## 2. MCP Server Deployment
**Goal**: Allow other agents (like Claude Desktop) to use our PubMed/Research tools directly.
### Local Usage (Claude Desktop)
1. **Install**:
```bash
uv sync
```
2. **Configure Claude Desktop**:
   Edit `claude_desktop_config.json` (`~/Library/Application Support/Claude/` on macOS, `%APPDATA%\Claude\` on Windows), using `uv --directory` to point at the project checkout:
```json
{
  "mcpServers": {
    "deepcritical": {
      "command": "uv",
      "args": [
        "--directory", "/absolute/path/to/DeepCritical",
        "run", "fastmcp", "run", "src/mcp_servers/pubmed_server.py"
      ]
    }
  }
}
```
3. **Restart Claude**: You should see a πŸ”Œ icon indicating connected tools.
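For orientation, the registered server follows the standard FastMCP shape. A minimal sketch (the tool below is illustrative; the real `src/mcp_servers/pubmed_server.py` defines the actual research tools):
```python
# pubmed_server.py -- illustrative FastMCP shape, not the actual file
from fastmcp import FastMCP

mcp = FastMCP("deepcritical-pubmed")

@mcp.tool()
def search_pubmed(query: str, max_results: int = 5) -> list[str]:
    """Search PubMed and return brief article summaries."""
    ...  # e.g., query NCBI E-utilities via httpx and format the hits

if __name__ == "__main__":
    mcp.run()  # stdio transport by default, which is what Claude Desktop expects
```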
### Remote Deployment (Smithery/Glama)
*Target for "MCP Track" bonus points.*
1. **Dockerize**: Create a `Dockerfile` for the MCP server.
```dockerfile
FROM python:3.11-slim
WORKDIR /app
COPY . .
RUN pip install fastmcp httpx
EXPOSE 8000
CMD ["fastmcp", "run", "src/mcp_servers/pubmed_server.py", "--transport", "sse", "--host", "0.0.0.0", "--port", "8000"]
```
   *Note: Use the SSE transport for remote/HTTP servers; stdio works only for local clients like Claude Desktop.*
2. **Deploy**: Host on Fly.io or Railway, publishing the container's port 8000 so remote MCP clients can reach the SSE endpoint.
---
## 3. Modal (GPU/Heavy Compute)
**Goal**: Run an open-source LLM (e.g., Llama-3-70B) or handle massive parallel searches if hosted APIs are too slow/expensive.
### Setup
1. **Install**: `uv add modal`
2. **Auth**: `modal token new`
### Logic
Instead of calling the Anthropic API, we call a Modal function:
```python
# src/llm/modal_client.py
import modal

app = modal.App("deepcritical-inference")  # older Modal releases called this modal.Stub

@app.function(gpu="A100")
def generate_text(prompt: str) -> str:
    # Load vLLM (or another inference engine) here and return the completion
    ...
```
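To smoke-test the function, a local entrypoint can call it from your machine; `modal run` builds and runs an ephemeral app (a sketch under the same assumptions as above):
```python
# Append to src/llm/modal_client.py, then execute: modal run src/llm/modal_client.py
@app.local_entrypoint()
def main():
    # .remote() executes generate_text on a Modal GPU worker and returns the result
    print(generate_text.remote("Test prompt: one-sentence summary of CRISPR."))
```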
### When to use?
- **Hackathon Demo**: Stick to Anthropic/OpenAI APIs for speed/reliability.
- **Production/Stretch**: Use Modal if you hit rate limits or want to show off "Open Source Models" capability.
---
## Deployment Checklist
### Pre-Flight
- [ ] Run `pytest -m unit` to ensure logic is sound.
- [ ] Run `pytest -m e2e` (one pass) to verify APIs connect.
- [ ] Check `requirements.txt` matches `pyproject.toml`.
### Secrets Management
- [ ] **NEVER** commit `.env` files.
- [ ] Verify keys are added to HF Space settings.
### Post-Launch
- [ ] Test the live URL.
- [ ] Verify "Stop" button in Gradio works (interrupts the agent).
- [ ] Record a walkthrough video (crucial for hackathon submission).