# Deployment Guide
## Launching DeepCritical: Gradio, MCP, & Modal
---
## Overview
DeepCritical is designed for a multi-platform deployment strategy to maximize hackathon impact:
1. **HuggingFace Spaces**: Host the Gradio UI (User Interface).
2. **MCP Server**: Expose research tools to Claude Desktop/Agents.
3. **Modal (Optional)**: Run heavy inference or local LLMs if API costs are prohibitive.
---
## 1. HuggingFace Spaces (Gradio UI)
**Goal**: A public URL where judges/users can try the research agent.
### Prerequisites
- HuggingFace Account
- `gradio` installed (`uv add gradio`)
### Steps
1. **Create Space**:
- Go to HF Spaces -> Create New Space.
- SDK: **Gradio**.
- Hardware: **CPU Basic** (Free) is sufficient (since we use APIs).
2. **Prepare Files**:
- Ensure `app.py` contains the Gradio interface construction.
- Ensure `requirements.txt` or `pyproject.toml` lists all dependencies.
3. **Secrets**:
- Go to Space Settings -> **Repository secrets**.
- Add `ANTHROPIC_API_KEY` (or your chosen LLM provider key).
- Add `BRAVE_API_KEY` (for web search).
4. **Deploy**:
- Push code to the Space's git repo (example commands below).
- Watch the "Build" logs for errors.
### Streaming Optimization
Ensure `app.py` uses generator functions for the chat interface to prevent timeouts:
```python
# app.py
def predict(message, history):
    # Yielding partial updates lets Gradio stream output to the client,
    # keeping the request alive during long research runs.
    agent = ResearchAgent()  # agent class defined elsewhere in the project
    for update in agent.research_stream(message):
        yield update
```
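Wiring the generator into `gr.ChatInterface` is enough for the Space to stream responses (a minimal sketch; the title is arbitrary):
```python
import gradio as gr

demo = gr.ChatInterface(fn=predict, title="DeepCritical")

if __name__ == "__main__":
    demo.launch()
```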
---
## 2. MCP Server Deployment
**Goal**: Allow MCP clients (such as Claude Desktop) to use our PubMed/research tools directly.
### Local Usage (Claude Desktop)
1. **Install**:
```bash
uv sync
```
2. **Configure Claude Desktop**:
Edit `~/Library/Application Support/Claude/claude_desktop_config.json` (macOS; on Windows the file is `%APPDATA%\Claude\claude_desktop_config.json`):
```json
{
  "mcpServers": {
    "deepcritical": {
      "command": "uv",
      "args": [
        "run",
        "--directory", "/absolute/path/to/DeepCritical",
        "fastmcp", "run", "src/mcp_servers/pubmed_server.py"
      ]
    }
  }
}
```
*Note: the working directory is passed via uv's `--directory` flag, since the Claude Desktop config has no reliable working-directory setting.*
3. **Restart Claude**: You should see a hammer (tools) icon indicating the server's tools are connected.
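For reference, the server itself can be a compact FastMCP script. A minimal sketch of what `src/mcp_servers/pubmed_server.py` might contain (the project's real tool set and signatures may differ; the E-utilities call is illustrative):
```python
# Minimal FastMCP server sketch; the actual project server may differ.
from fastmcp import FastMCP
import httpx

mcp = FastMCP("pubmed")

@mcp.tool()
def search_pubmed(query: str, max_results: int = 5) -> str:
    """Search PubMed via NCBI E-utilities and return the raw JSON response."""
    resp = httpx.get(
        "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi",
        params={"db": "pubmed", "term": query, "retmax": max_results, "retmode": "json"},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.text

if __name__ == "__main__":
    mcp.run()  # stdio transport by default, which is what Claude Desktop expects
```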
### Remote Deployment (Smithery/Glama)
*Target for "MCP Track" bonus points.*
1. **Dockerize**: Create a `Dockerfile` for the MCP server.
```dockerfile
FROM python:3.11-slim
WORKDIR /app
COPY . /app
RUN pip install fastmcp httpx
EXPOSE 8000
CMD ["fastmcp", "run", "src/mcp_servers/pubmed_server.py", "--transport", "sse", "--host", "0.0.0.0", "--port", "8000"]
```
*Note: Use SSE (or streamable HTTP) transport for remote servers; the default stdio transport only works for local processes.*
2. **Deploy**: Host on Fly.io or Railway.
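With the Dockerfile in place, a Fly.io deployment is two commands (assuming `flyctl` is installed and you are logged in):
```bash
fly launch --no-deploy   # detects the Dockerfile and generates fly.toml
fly deploy               # builds the image and ships it
```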
---
## 3. Modal (GPU/Heavy Compute)
**Goal**: Run an open-weight LLM (e.g., Llama-3-70B) on rented GPUs, or handle massive parallel searches, when hosted APIs are too slow or expensive.
### Setup
1. **Install**: `uv add modal`
2. **Auth**: `modal token new`
### Logic
Instead of calling the Anthropic API, we call a Modal function:
```python
# src/llm/modal_client.py
import modal

# Recent Modal releases use modal.App (modal.Stub is the deprecated name).
app = modal.App("deepcritical-inference")

@app.function(gpu="A100")
def generate_text(prompt: str) -> str:
    # Load vLLM (or another inference engine) and return a completion.
    ...
```
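Once deployed with `modal deploy src/llm/modal_client.py`, the agent can call the function by name (a sketch; assumes a recent Modal client):
```python
import modal

# Look up the deployed function by app name and function name, then call it.
generate_text = modal.Function.from_name("deepcritical-inference", "generate_text")
answer = generate_text.remote("Summarize recent CRISPR delivery literature.")
```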
### When to use?
- **Hackathon Demo**: Stick with the hosted Anthropic/OpenAI APIs for speed and reliability.
- **Production/Stretch**: Use Modal if you hit rate limits or want to showcase open-source model support.
---
## Deployment Checklist
### Pre-Flight
- [ ] Run `pytest -m unit` to ensure logic is sound.
- [ ] Run `pytest -m e2e` (one pass) to verify APIs connect.
- [ ] Check `requirements.txt` matches `pyproject.toml`.
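One way to keep the two in sync, assuming uv manages the project:
```bash
uv export --format requirements-txt --no-hashes -o requirements.txt
```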
### Secrets Management
- [ ] **NEVER** commit `.env` files.
- [ ] Verify keys are added to HF Space settings.
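A simple safeguard is keeping env-file patterns in `.gitignore`:
```gitignore
.env
.env.*
```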
### Post-Launch
- [ ] Test the live URL.
- [ ] Verify "Stop" button in Gradio works (interrupts the agent).
- [ ] Record a walkthrough video (crucial for hackathon submission).