# Deployment Guide
## Launching DeepCritical: Gradio, MCP, & Modal
---
## Overview
DeepCritical is designed for a multi-platform deployment strategy to maximize hackathon impact:
1. **HuggingFace Spaces**: Host the Gradio UI (User Interface).
2. **MCP Server**: Expose research tools to Claude Desktop/Agents.
3. **Modal (Optional)**: Run heavy inference or open-source LLMs on rented GPUs if API costs are prohibitive.
---
## 1. HuggingFace Spaces (Gradio UI)
**Goal**: A public URL where judges/users can try the research agent.
### Prerequisites
- HuggingFace Account
- `gradio` installed (`uv add gradio`)
### Steps
1. **Create Space**:
- Go to HF Spaces -> Create New Space.
- SDK: **Gradio**.
- Hardware: **CPU Basic** (Free) is sufficient (since we use APIs).
2. **Prepare Files**:
- Ensure `app.py` contains the Gradio interface construction.
   - Ensure `requirements.txt` lists all dependencies (the Gradio SDK on Spaces installs from it; keep it in sync with `pyproject.toml`).
3. **Secrets**:
- Go to Space Settings -> **Repository secrets**.
- Add `ANTHROPIC_API_KEY` (or your chosen LLM provider key).
   - Add `BRAVE_API_KEY` (for web search). Secrets are injected into the running app as environment variables; see the sketch after these steps.
4. **Deploy**:
- Push code to the Space's git repo.
- Watch "Build" logs.
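At runtime, read those secrets from the environment rather than hardcoding keys. A minimal sketch for the top of `app.py` (variable names mirror the secrets above):
```python
# app.py -- HF Spaces exposes repository secrets as environment variables
import os

ANTHROPIC_API_KEY = os.environ["ANTHROPIC_API_KEY"]  # fail fast if the secret is missing
BRAVE_API_KEY = os.environ.get("BRAVE_API_KEY", "")  # tolerate a missing key during local dev
```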
### Streaming Optimization
Ensure `app.py` uses generator functions for the chat interface to prevent timeouts:
```python
# app.py
import gradio as gr

def predict(message, history):
    # Yield partial results so Gradio streams them instead of waiting on one long call
    agent = ResearchAgent()  # the project's research agent class
    for update in agent.research_stream(message):
        yield update

gr.ChatInterface(predict).launch()
```
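Each `yield` updates the pending assistant message in place, so users see intermediate status (searching, reading, synthesizing) rather than a frozen UI.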
---
## 2. MCP Server Deployment
**Goal**: Allow other agents (like Claude Desktop) to use our PubMed/Research tools directly.
### Local Usage (Claude Desktop)
1. **Install**:
```bash
uv sync
```
2. **Configure Claude Desktop**:
   Edit `claude_desktop_config.json` (`~/Library/Application Support/Claude/` on macOS, `%APPDATA%\Claude\` on Windows), using `uv --directory` to point at the project checkout:
```json
{
  "mcpServers": {
    "deepcritical": {
      "command": "uv",
      "args": [
        "--directory", "/absolute/path/to/DeepCritical",
        "run", "fastmcp", "run", "src/mcp_servers/pubmed_server.py"
      ]
    }
  }
}
```
3. **Restart Claude**: You should see a πŸ”Œ icon indicating connected tools.
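For orientation, the registered server follows the standard FastMCP shape. A minimal sketch (the tool below is illustrative; the real `src/mcp_servers/pubmed_server.py` defines the actual research tools):
```python
# pubmed_server.py -- illustrative FastMCP shape, not the actual file
from fastmcp import FastMCP

mcp = FastMCP("deepcritical-pubmed")

@mcp.tool()
def search_pubmed(query: str, max_results: int = 5) -> list[str]:
    """Search PubMed and return brief article summaries."""
    ...  # e.g., query NCBI E-utilities via httpx and format the hits

if __name__ == "__main__":
    mcp.run()  # stdio transport by default, which is what Claude Desktop expects
```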
### Remote Deployment (Smithery/Glama)
*Target for "MCP Track" bonus points.*
1. **Dockerize**: Create a `Dockerfile` for the MCP server.
```dockerfile
FROM python:3.11-slim
WORKDIR /app
COPY . .
RUN pip install fastmcp httpx
EXPOSE 8000
CMD ["fastmcp", "run", "src/mcp_servers/pubmed_server.py", "--transport", "sse", "--host", "0.0.0.0", "--port", "8000"]
```
   *Note: Use the SSE transport for remote/HTTP servers; stdio works only for local clients like Claude Desktop.*
2. **Deploy**: Host on Fly.io or Railway, publishing the container's port 8000 so remote MCP clients can reach the SSE endpoint.
---
## 3. Modal (GPU/Heavy Compute)
**Goal**: Run an open-source LLM (e.g., Llama-3-70B) or handle massive parallel searches if hosted APIs are too slow/expensive.
### Setup
1. **Install**: `uv add modal`
2. **Auth**: `modal token new`
### Logic
Instead of calling the Anthropic API, we call a Modal function:
```python
# src/llm/modal_client.py
import modal

app = modal.App("deepcritical-inference")  # older Modal releases called this modal.Stub

@app.function(gpu="A100")
def generate_text(prompt: str) -> str:
    # Load vLLM (or another inference engine) here and return the completion
    ...
```
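To smoke-test the function, a local entrypoint can call it from your machine; `modal run` builds and runs an ephemeral app (a sketch under the same assumptions as above):
```python
# Append to src/llm/modal_client.py, then execute: modal run src/llm/modal_client.py
@app.local_entrypoint()
def main():
    # .remote() executes generate_text on a Modal GPU worker and returns the result
    print(generate_text.remote("Test prompt: one-sentence summary of CRISPR."))
```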
### When to use?
- **Hackathon Demo**: Stick to Anthropic/OpenAI APIs for speed/reliability.
- **Production/Stretch**: Use Modal if you hit rate limits or want to show off "Open Source Models" capability.
---
## Deployment Checklist
### Pre-Flight
- [ ] Run `pytest -m unit` to ensure logic is sound.
- [ ] Run `pytest -m e2e` (one pass) to verify APIs connect.
- [ ] Check `requirements.txt` matches `pyproject.toml`.
### Secrets Management
- [ ] **NEVER** commit `.env` files.
- [ ] Verify keys are added to HF Space settings.
### Post-Launch
- [ ] Test the live URL.
- [ ] Verify "Stop" button in Gradio works (interrupts the agent).
- [ ] Record a walkthrough video (crucial for hackathon submission).