VibecoderMcSwaggins committed
Commit 5264b25 · unverified · 2 parents: 8625ded f9cb2b7

Merge pull request #20 from The-Obstacle-Is-The-Way/feat/phase12-mcp-server

README.md CHANGED
@@ -30,11 +30,35 @@ uv sync
30
 
31
  ```bash
32
  # Start the Gradio app
33
- uv run python -m src.app
34
  ```
35
 
36
  Open your browser to `http://localhost:7860`.
37
 
38
  ## Development
39
 
40
  ### Run Tests
@@ -53,13 +77,12 @@ make check
53
 
54
  DeepCritical uses a Vertical Slice Architecture:
55
 
56
- 1. **Search Slice**: Retrieving evidence from PubMed and the Web.
57
  2. **Judge Slice**: Evaluating evidence quality using LLMs.
58
  3. **Orchestrator Slice**: Managing the research loop and UI.
59
 
60
  Built with:
61
  - **PydanticAI**: For robust agent interactions.
62
  - **Gradio**: For the streaming user interface.
63
- - **PubMed**: For biomedical literature.
64
- - **DuckDuckGo**: For general web search.
65
-
 
30
 
31
  ```bash
32
  # Start the Gradio app
33
+ uv run python src/app.py
34
  ```
35
 
36
  Open your browser to `http://localhost:7860`.
37
 
38
+ ### 3. Connect via MCP
39
+
40
+ This application exposes a Model Context Protocol (MCP) server, allowing you to use its search tools directly from Claude Desktop or other MCP clients.
41
+
42
+ **MCP Server URL**: `http://localhost:7860/gradio_api/mcp/`
43
+
44
+ **Claude Desktop Configuration**:
45
+ Add this to your `claude_desktop_config.json`:
46
+ ```json
47
+ {
48
+ "mcpServers": {
49
+ "deepcritical": {
50
+ "url": "http://localhost:7860/gradio_api/mcp/"
51
+ }
52
+ }
53
+ }
54
+ ```
55
+
56
+ **Available Tools**:
57
+ - `search_pubmed`: Search peer-reviewed biomedical literature.
58
+ - `search_clinical_trials`: Search ClinicalTrials.gov.
59
+ - `search_biorxiv`: Search bioRxiv/medRxiv preprints.
60
+ - `search_all`: Search all sources simultaneously.
61
+
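+ **Programmatic access**: the same endpoints can be exercised from Python. A minimal sketch using the `gradio_client` package against a locally running app (the endpoint name assumes the `search_pubmed` tool listed above):
+
+ ```python
+ # Sketch: call the MCP-exposed PubMed search through the Gradio client.
+ # Assumes the app is running locally on port 7860.
+ from gradio_client import Client
+
+ client = Client("http://localhost:7860/")
+ result = client.predict(
+     "metformin alzheimer",  # query
+     10,                     # max_results
+     api_name="/search_pubmed",
+ )
+ print(result)
+ ```
+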
62
  ## Development
63
 
64
  ### Run Tests
 
77
 
78
  DeepCritical uses a Vertical Slice Architecture:
79
 
80
+ 1. **Search Slice**: Retrieving evidence from PubMed, ClinicalTrials.gov, and bioRxiv.
81
  2. **Judge Slice**: Evaluating evidence quality using LLMs.
82
  3. **Orchestrator Slice**: Managing the research loop and UI.
83
 
84
  Built with:
85
  - **PydanticAI**: For robust agent interactions.
86
  - **Gradio**: For the streaming user interface.
87
+ - **PubMed, ClinicalTrials.gov, bioRxiv**: For biomedical data.
88
+ - **MCP**: For universal tool access.
 
docs/implementation/12_phase_mcp_server.md ADDED
@@ -0,0 +1,832 @@
1
+ # Phase 12 Implementation Spec: MCP Server Integration
2
+
3
+ **Goal**: Expose DeepCritical search tools via an MCP server for Track 2 compliance.
4
+ **Philosophy**: "MCP is the bridge between tools and LLMs."
5
+ **Prerequisite**: Phase 11 complete (all search tools working)
6
+ **Priority**: P0 - REQUIRED FOR HACKATHON TRACK 2
7
+ **Estimated Time**: 2-3 hours
8
+
9
+ ---
10
+
11
+ ## 1. Why MCP Server?
12
+
13
+ ### Hackathon Requirement
14
+
15
+ | Requirement | Status Before | Status After |
16
+ |-------------|---------------|--------------|
17
+ | Must use MCP servers as tools | **MISSING** | **COMPLIANT** |
18
+ | Autonomous Agent behavior | **Have it** | Have it |
19
+ | Must be Gradio app | **Have it** | Have it |
20
+ | Planning/reasoning/execution | **Have it** | Have it |
21
+
22
+ **Bottom Line**: Without an MCP server, we're disqualified from Track 2.
23
+
24
+ ### What MCP Enables
25
+
26
+ ```text
27
+ Current State:
28
+ Our Tools → Called directly by Python code → Only our app can use them
29
+
30
+ After MCP:
31
+ Our Tools → Exposed via MCP protocol → Claude Desktop, Cursor, ANY MCP client
32
+ ```
33
+
34
+ ---
35
+
36
+ ## 2. Implementation Options Analysis
37
+
38
+ ### Option A: Gradio MCP (Recommended)
39
+
40
+ **Pros:**
41
+ - Single parameter: `demo.launch(mcp_server=True)`
42
+ - Already have Gradio app
43
+ - Automatic tool schema generation from docstrings
44
+ - Built into Gradio 5.0+
45
+
46
+ **Cons:**
47
+ - Requires Gradio 5.0+ with MCP extras
48
+ - Must follow strict docstring format
49
+
50
+ ### Option B: Native MCP SDK (FastMCP)
51
+
52
+ **Pros:**
53
+ - More control over tool definitions
54
+ - Explicit server configuration
55
+ - Separate from UI concerns
56
+
57
+ **Cons:**
58
+ - Separate server process
59
+ - More code to maintain
60
+ - Additional dependency
61
+
62
+ ### Decision: **Gradio MCP (Option A)**
63
+
64
+ Rationale:
65
+ 1. Already have Gradio app (`src/app.py`)
66
+ 2. Minimal code changes
67
+ 3. Judges will appreciate simplicity
68
+ 4. Follows the hackathon's official Gradio guide
69
+
70
+ ---
71
+
72
+ ## 3. Technical Specification
73
+
74
+ ### 3.1 Dependencies
75
+
76
+ ```toml
77
+ # pyproject.toml - add MCP extras
78
+ dependencies = [
79
+ "gradio[mcp]>=5.0.0", # Updated from gradio>=4.0
80
+ # ... existing deps
81
+ ]
82
+ ```
83
+
84
+ ### 3.2 MCP Tool Functions
85
+
86
+ Each tool needs:
87
+ 1. **Type hints** on all parameters
88
+ 2. **Docstring** with Args section (Google style)
89
+ 3. **Return type** annotation
90
+ 4. **`api_name`** parameter for explicit endpoint naming
91
+
92
+ ```python
93
+ async def search_pubmed(query: str, max_results: int = 10) -> str:
94
+ """Search PubMed for biomedical literature.
95
+
96
+ Args:
97
+ query: Search query for PubMed (e.g., "metformin alzheimer")
98
+ max_results: Maximum number of results to return (1-50)
99
+
100
+ Returns:
101
+ Formatted search results with titles, citations, and abstracts
102
+ """
103
+ ```
104
+
105
+ ### 3.3 MCP Server URL
106
+
107
+ Once launched:
108
+ ```text
109
+ http://localhost:7860/gradio_api/mcp/
110
+ ```
111
+
112
+ Or on HuggingFace Spaces:
113
+ ```text
114
+ https://[space-id].hf.space/gradio_api/mcp/
115
+ ```
116
+
117
+ ---
118
+
119
+ ## 4. Implementation
120
+
121
+ ### 4.1 MCP Tool Wrappers (`src/mcp_tools.py`)
122
+
123
+ ```python
124
+ """MCP tool wrappers for DeepCritical search tools.
125
+
126
+ These functions expose our search tools via MCP protocol.
127
+ Each function follows the MCP tool contract:
128
+ - Full type hints
129
+ - Google-style docstrings with Args section
130
+ - Formatted string returns
131
+ """
132
+
133
+ from src.tools.biorxiv import BioRxivTool
134
+ from src.tools.clinicaltrials import ClinicalTrialsTool
135
+ from src.tools.pubmed import PubMedTool
136
+
137
+
138
+ # Singleton instances (avoid recreating on each call)
139
+ _pubmed = PubMedTool()
140
+ _trials = ClinicalTrialsTool()
141
+ _biorxiv = BioRxivTool()
142
+
143
+
144
+ async def search_pubmed(query: str, max_results: int = 10) -> str:
145
+ """Search PubMed for peer-reviewed biomedical literature.
146
+
147
+ Searches NCBI PubMed database for scientific papers matching your query.
148
+ Returns titles, authors, abstracts, and citation information.
149
+
150
+ Args:
151
+ query: Search query (e.g., "metformin alzheimer", "drug repurposing cancer")
152
+ max_results: Maximum results to return (1-50, default 10)
153
+
154
+ Returns:
155
+ Formatted search results with paper titles, authors, dates, and abstracts
156
+ """
157
+ max_results = max(1, min(50, max_results)) # Clamp to valid range
158
+
159
+ results = await _pubmed.search(query, max_results)
160
+
161
+ if not results:
162
+ return f"No PubMed results found for: {query}"
163
+
164
+ formatted = [f"## PubMed Results for: {query}\n"]
165
+ for i, evidence in enumerate(results, 1):
166
+ formatted.append(f"### {i}. {evidence.citation.title}")
167
+ formatted.append(f"**Authors**: {', '.join(evidence.citation.authors[:3])}")
168
+ formatted.append(f"**Date**: {evidence.citation.date}")
169
+ formatted.append(f"**URL**: {evidence.citation.url}")
170
+ formatted.append(f"\n{evidence.content}\n")
171
+
172
+ return "\n".join(formatted)
173
+
174
+
175
+ async def search_clinical_trials(query: str, max_results: int = 10) -> str:
176
+ """Search ClinicalTrials.gov for clinical trial data.
177
+
178
+ Searches the ClinicalTrials.gov database for trials matching your query.
179
+ Returns trial titles, phases, status, conditions, and interventions.
180
+
181
+ Args:
182
+ query: Search query (e.g., "metformin alzheimer", "diabetes phase 3")
183
+ max_results: Maximum results to return (1-50, default 10)
184
+
185
+ Returns:
186
+ Formatted clinical trial information with NCT IDs, phases, and status
187
+ """
188
+ max_results = max(1, min(50, max_results))
189
+
190
+ results = await _trials.search(query, max_results)
191
+
192
+ if not results:
193
+ return f"No clinical trials found for: {query}"
194
+
195
+ formatted = [f"## Clinical Trials for: {query}\n"]
196
+ for i, evidence in enumerate(results, 1):
197
+ formatted.append(f"### {i}. {evidence.citation.title}")
198
+ formatted.append(f"**URL**: {evidence.citation.url}")
199
+ formatted.append(f"**Date**: {evidence.citation.date}")
200
+ formatted.append(f"\n{evidence.content}\n")
201
+
202
+ return "\n".join(formatted)
203
+
204
+
205
+ async def search_biorxiv(query: str, max_results: int = 10) -> str:
206
+ """Search bioRxiv/medRxiv for preprint research.
207
+
208
+ Searches bioRxiv and medRxiv preprint servers for cutting-edge research.
209
+ Note: Preprints are NOT peer-reviewed but contain the latest findings.
210
+
211
+ Args:
212
+ query: Search query (e.g., "metformin neuroprotection", "long covid treatment")
213
+ max_results: Maximum results to return (1-50, default 10)
214
+
215
+ Returns:
216
+ Formatted preprint results with titles, authors, and abstracts
217
+ """
218
+ max_results = max(1, min(50, max_results))
219
+
220
+ results = await _biorxiv.search(query, max_results)
221
+
222
+ if not results:
223
+ return f"No bioRxiv/medRxiv preprints found for: {query}"
224
+
225
+ formatted = [f"## Preprint Results for: {query}\n"]
226
+ for i, evidence in enumerate(results, 1):
227
+ formatted.append(f"### {i}. {evidence.citation.title}")
228
+ formatted.append(f"**Authors**: {', '.join(evidence.citation.authors[:3])}")
229
+ formatted.append(f"**Date**: {evidence.citation.date}")
230
+ formatted.append(f"**URL**: {evidence.citation.url}")
231
+ formatted.append(f"\n{evidence.content}\n")
232
+
233
+ return "\n".join(formatted)
234
+
235
+
236
+ async def search_all_sources(query: str, max_per_source: int = 5) -> str:
237
+ """Search all biomedical sources simultaneously.
238
+
239
+ Performs parallel search across PubMed, ClinicalTrials.gov, and bioRxiv.
240
+ This is the most comprehensive search option for drug repurposing research.
241
+
242
+ Args:
243
+ query: Search query (e.g., "metformin alzheimer", "aspirin cancer prevention")
244
+ max_per_source: Maximum results per source (1-20, default 5)
245
+
246
+ Returns:
247
+ Combined results from all sources with source labels
248
+ """
249
+ import asyncio
250
+
251
+ max_per_source = max(1, min(20, max_per_source))
252
+
253
+ # Run all searches in parallel
254
+ pubmed_task = search_pubmed(query, max_per_source)
255
+ trials_task = search_clinical_trials(query, max_per_source)
256
+ biorxiv_task = search_biorxiv(query, max_per_source)
257
+
258
+ pubmed_results, trials_results, biorxiv_results = await asyncio.gather(
259
+ pubmed_task, trials_task, biorxiv_task, return_exceptions=True
260
+ )
261
+
262
+ formatted = [f"# Comprehensive Search: {query}\n"]
263
+
264
+ # Add each result section (handle exceptions gracefully)
265
+ if isinstance(pubmed_results, str):
266
+ formatted.append(pubmed_results)
267
+ else:
268
+ formatted.append(f"## PubMed\n*Error: {pubmed_results}*\n")
269
+
270
+ if isinstance(trials_results, str):
271
+ formatted.append(trials_results)
272
+ else:
273
+ formatted.append(f"## Clinical Trials\n*Error: {trials_results}*\n")
274
+
275
+ if isinstance(biorxiv_results, str):
276
+ formatted.append(biorxiv_results)
277
+ else:
278
+ formatted.append(f"## Preprints\n*Error: {biorxiv_results}*\n")
279
+
280
+ return "\n---\n".join(formatted)
281
+ ```
282
+
283
+ ### 4.2 Update Gradio App (`src/app.py`)
284
+
285
+ ```python
286
+ """Gradio UI for DeepCritical agent with MCP server support."""
287
+
288
+ import os
289
+ from collections.abc import AsyncGenerator
290
+ from typing import Any
291
+
292
+ import gradio as gr
293
+
294
+ from src.agent_factory.judges import JudgeHandler, MockJudgeHandler
295
+ from src.mcp_tools import (
296
+ search_all_sources,
297
+ search_biorxiv,
298
+ search_clinical_trials,
299
+ search_pubmed,
300
+ )
301
+ from src.orchestrator_factory import create_orchestrator
302
+ from src.tools.biorxiv import BioRxivTool
303
+ from src.tools.clinicaltrials import ClinicalTrialsTool
304
+ from src.tools.pubmed import PubMedTool
305
+ from src.tools.search_handler import SearchHandler
306
+ from src.utils.models import OrchestratorConfig
307
+
308
+
309
+ # ... (existing configure_orchestrator and research_agent functions unchanged)
310
+
311
+
312
+ def create_demo() -> Any:
313
+ """
314
+ Create the Gradio demo interface with MCP support.
315
+
316
+ Returns:
317
+ Configured Gradio Blocks interface with MCP server enabled
318
+ """
319
+ with gr.Blocks(
320
+ title="DeepCritical - Drug Repurposing Research Agent",
321
+ theme=gr.themes.Soft(),
322
+ ) as demo:
323
+ gr.Markdown("""
324
+ # DeepCritical
325
+ ## AI-Powered Drug Repurposing Research Agent
326
+
327
+ Ask questions about potential drug repurposing opportunities.
328
+ The agent searches PubMed, ClinicalTrials.gov, and bioRxiv/medRxiv preprints.
329
+
330
+ **Example questions:**
331
+ - "What drugs could be repurposed for Alzheimer's disease?"
332
+ - "Is metformin effective for cancer treatment?"
333
+ - "What existing medications show promise for Long COVID?"
334
+ """)
335
+
336
+ # Main chat interface (existing)
337
+ gr.ChatInterface(
338
+ fn=research_agent,
339
+ type="messages",
340
+ title="",
341
+ examples=[
342
+ "What drugs could be repurposed for Alzheimer's disease?",
343
+ "Is metformin effective for treating cancer?",
344
+ "What medications show promise for Long COVID treatment?",
345
+ "Can statins be repurposed for neurological conditions?",
346
+ ],
347
+ additional_inputs=[
348
+ gr.Radio(
349
+ choices=["simple", "magentic"],
350
+ value="simple",
351
+ label="Orchestrator Mode",
352
+ info="Simple: Linear (OpenAI/Anthropic) | Magentic: Multi-Agent (OpenAI)",
353
+ )
354
+ ],
355
+ )
356
+
357
+ # MCP Tool Interfaces (exposed via MCP protocol)
358
+ gr.Markdown("---\n## MCP Tools (Also Available via Claude Desktop)")
359
+
360
+ with gr.Tab("PubMed Search"):
361
+ gr.Interface(
362
+ fn=search_pubmed,
363
+ inputs=[
364
+ gr.Textbox(label="Query", placeholder="metformin alzheimer"),
365
+ gr.Slider(1, 50, value=10, step=1, label="Max Results"),
366
+ ],
367
+ outputs=gr.Markdown(label="Results"),
368
+ api_name="search_pubmed",
369
+ )
370
+
371
+ with gr.Tab("Clinical Trials"):
372
+ gr.Interface(
373
+ fn=search_clinical_trials,
374
+ inputs=[
375
+ gr.Textbox(label="Query", placeholder="diabetes phase 3"),
376
+ gr.Slider(1, 50, value=10, step=1, label="Max Results"),
377
+ ],
378
+ outputs=gr.Markdown(label="Results"),
379
+ api_name="search_clinical_trials",
380
+ )
381
+
382
+ with gr.Tab("Preprints"):
383
+ gr.Interface(
384
+ fn=search_biorxiv,
385
+ inputs=[
386
+ gr.Textbox(label="Query", placeholder="long covid treatment"),
387
+ gr.Slider(1, 50, value=10, step=1, label="Max Results"),
388
+ ],
389
+ outputs=gr.Markdown(label="Results"),
390
+ api_name="search_biorxiv",
391
+ )
392
+
393
+ with gr.Tab("Search All"):
394
+ gr.Interface(
395
+ fn=search_all_sources,
396
+ inputs=[
397
+ gr.Textbox(label="Query", placeholder="metformin cancer"),
398
+ gr.Slider(1, 20, value=5, step=1, label="Max Per Source"),
399
+ ],
400
+ outputs=gr.Markdown(label="Results"),
401
+ api_name="search_all",
402
+ )
403
+
404
+ gr.Markdown("""
405
+ ---
406
+ **Note**: This is a research tool and should not be used for medical decisions.
407
+ Always consult healthcare professionals for medical advice.
408
+
409
+ Built with PydanticAI + PubMed, ClinicalTrials.gov & bioRxiv
410
+
411
+ **MCP Server**: Available at `/gradio_api/mcp/` for Claude Desktop integration
412
+ """)
413
+
414
+ return demo
415
+
416
+
417
+ def main() -> None:
418
+ """Run the Gradio app with MCP server enabled."""
419
+ demo = create_demo()
420
+ demo.launch(
421
+ server_name="0.0.0.0",
422
+ server_port=7860,
423
+ share=False,
424
+ mcp_server=True, # Enable MCP server
425
+ )
426
+
427
+
428
+ if __name__ == "__main__":
429
+ main()
430
+ ```
431
+
432
+ ---
433
+
434
+ ## 5. TDD Test Suite
435
+
436
+ ### 5.1 Unit Tests (`tests/unit/test_mcp_tools.py`)
437
+
438
+ ```python
439
+ """Unit tests for MCP tool wrappers."""
440
+
441
+ from unittest.mock import AsyncMock, patch
442
+
443
+ import pytest
444
+
445
+ from src.mcp_tools import (
446
+ search_all_sources,
447
+ search_biorxiv,
448
+ search_clinical_trials,
449
+ search_pubmed,
450
+ )
451
+ from src.utils.models import Citation, Evidence
452
+
453
+
454
+ @pytest.fixture
455
+ def mock_evidence() -> Evidence:
456
+ """Sample evidence for testing."""
457
+ return Evidence(
458
+ content="Metformin shows neuroprotective effects in preclinical models.",
459
+ citation=Citation(
460
+ source="pubmed",
461
+ title="Metformin and Alzheimer's Disease",
462
+ url="https://pubmed.ncbi.nlm.nih.gov/12345678/",
463
+ date="2024-01-15",
464
+ authors=["Smith J", "Jones M", "Brown K"],
465
+ ),
466
+ relevance=0.85,
467
+ )
468
+
469
+
470
+ class TestSearchPubMed:
471
+ """Tests for search_pubmed MCP tool."""
472
+
473
+ @pytest.mark.asyncio
474
+ async def test_returns_formatted_string(self, mock_evidence: Evidence) -> None:
475
+ """Should return formatted markdown string."""
476
+ with patch("src.mcp_tools._pubmed") as mock_tool:
477
+ mock_tool.search = AsyncMock(return_value=[mock_evidence])
478
+
479
+ result = await search_pubmed("metformin alzheimer", 10)
480
+
481
+ assert isinstance(result, str)
482
+ assert "PubMed Results" in result
483
+ assert "Metformin and Alzheimer's Disease" in result
484
+ assert "Smith J" in result
485
+
486
+ @pytest.mark.asyncio
487
+ async def test_clamps_max_results(self) -> None:
488
+ """Should clamp max_results to valid range (1-50)."""
489
+ with patch("src.mcp_tools._pubmed") as mock_tool:
490
+ mock_tool.search = AsyncMock(return_value=[])
491
+
492
+ # Test lower bound
493
+ await search_pubmed("test", 0)
494
+ mock_tool.search.assert_called_with("test", 1)
495
+
496
+ # Test upper bound
497
+ await search_pubmed("test", 100)
498
+ mock_tool.search.assert_called_with("test", 50)
499
+
500
+ @pytest.mark.asyncio
501
+ async def test_handles_no_results(self) -> None:
502
+ """Should return appropriate message when no results."""
503
+ with patch("src.mcp_tools._pubmed") as mock_tool:
504
+ mock_tool.search = AsyncMock(return_value=[])
505
+
506
+ result = await search_pubmed("xyznonexistent", 10)
507
+
508
+ assert "No PubMed results found" in result
509
+
510
+
511
+ class TestSearchClinicalTrials:
512
+ """Tests for search_clinical_trials MCP tool."""
513
+
514
+ @pytest.mark.asyncio
515
+ async def test_returns_formatted_string(self, mock_evidence: Evidence) -> None:
516
+ """Should return formatted markdown string."""
517
+ mock_evidence.citation.source = "clinicaltrials" # type: ignore
518
+
519
+ with patch("src.mcp_tools._trials") as mock_tool:
520
+ mock_tool.search = AsyncMock(return_value=[mock_evidence])
521
+
522
+ result = await search_clinical_trials("diabetes", 10)
523
+
524
+ assert isinstance(result, str)
525
+ assert "Clinical Trials" in result
526
+
527
+
528
+ class TestSearchBiorxiv:
529
+ """Tests for search_biorxiv MCP tool."""
530
+
531
+ @pytest.mark.asyncio
532
+ async def test_returns_formatted_string(self, mock_evidence: Evidence) -> None:
533
+ """Should return formatted markdown string."""
534
+ mock_evidence.citation.source = "biorxiv" # type: ignore
535
+
536
+ with patch("src.mcp_tools._biorxiv") as mock_tool:
537
+ mock_tool.search = AsyncMock(return_value=[mock_evidence])
538
+
539
+ result = await search_biorxiv("preprint search", 10)
540
+
541
+ assert isinstance(result, str)
542
+ assert "Preprint Results" in result
543
+
544
+
545
+ class TestSearchAllSources:
546
+ """Tests for search_all_sources MCP tool."""
547
+
548
+ @pytest.mark.asyncio
549
+ async def test_combines_all_sources(self, mock_evidence: Evidence) -> None:
550
+ """Should combine results from all sources."""
551
+ with patch("src.mcp_tools.search_pubmed", new_callable=AsyncMock) as mock_pubmed, \
552
+ patch("src.mcp_tools.search_clinical_trials", new_callable=AsyncMock) as mock_trials, \
553
+ patch("src.mcp_tools.search_biorxiv", new_callable=AsyncMock) as mock_biorxiv:
554
+
555
+ mock_pubmed.return_value = "## PubMed Results"
556
+ mock_trials.return_value = "## Clinical Trials"
557
+ mock_biorxiv.return_value = "## Preprints"
558
+
559
+ result = await search_all_sources("metformin", 5)
560
+
561
+ assert "Comprehensive Search" in result
562
+ assert "PubMed" in result
563
+ assert "Clinical Trials" in result
564
+ assert "Preprints" in result
565
+
566
+ @pytest.mark.asyncio
567
+ async def test_handles_partial_failures(self) -> None:
568
+ """Should handle partial failures gracefully."""
569
+ with patch("src.mcp_tools.search_pubmed", new_callable=AsyncMock) as mock_pubmed, \
570
+ patch("src.mcp_tools.search_clinical_trials", new_callable=AsyncMock) as mock_trials, \
571
+ patch("src.mcp_tools.search_biorxiv", new_callable=AsyncMock) as mock_biorxiv:
572
+
573
+ mock_pubmed.return_value = "## PubMed Results"
574
+ mock_trials.side_effect = Exception("API Error")
575
+ mock_biorxiv.return_value = "## Preprints"
576
+
577
+ result = await search_all_sources("metformin", 5)
578
+
579
+ # Should still contain working sources
580
+ assert "PubMed" in result
581
+ assert "Preprints" in result
582
+ # Should show error for failed source
583
+ assert "Error" in result
584
+
585
+
586
+ class TestMCPDocstrings:
587
+ """Tests that docstrings follow MCP format."""
588
+
589
+ def test_search_pubmed_has_args_section(self) -> None:
590
+ """Docstring must have Args section for MCP schema generation."""
591
+ assert search_pubmed.__doc__ is not None
592
+ assert "Args:" in search_pubmed.__doc__
593
+ assert "query:" in search_pubmed.__doc__
594
+ assert "max_results:" in search_pubmed.__doc__
595
+ assert "Returns:" in search_pubmed.__doc__
596
+
597
+ def test_search_clinical_trials_has_args_section(self) -> None:
598
+ """Docstring must have Args section for MCP schema generation."""
599
+ assert search_clinical_trials.__doc__ is not None
600
+ assert "Args:" in search_clinical_trials.__doc__
601
+
602
+ def test_search_biorxiv_has_args_section(self) -> None:
603
+ """Docstring must have Args section for MCP schema generation."""
604
+ assert search_biorxiv.__doc__ is not None
605
+ assert "Args:" in search_biorxiv.__doc__
606
+
607
+ def test_search_all_sources_has_args_section(self) -> None:
608
+ """Docstring must have Args section for MCP schema generation."""
609
+ assert search_all_sources.__doc__ is not None
610
+ assert "Args:" in search_all_sources.__doc__
611
+
612
+
613
+ class TestMCPTypeHints:
614
+ """Tests that type hints are complete for MCP."""
615
+
616
+ def test_search_pubmed_type_hints(self) -> None:
617
+ """All parameters and return must have type hints."""
618
+ import inspect
619
+
620
+ sig = inspect.signature(search_pubmed)
621
+
622
+ # Check parameter hints
623
+ assert sig.parameters["query"].annotation == str
624
+ assert sig.parameters["max_results"].annotation == int
625
+
626
+ # Check return hint
627
+ assert sig.return_annotation == str
628
+
629
+ def test_search_clinical_trials_type_hints(self) -> None:
630
+ """All parameters and return must have type hints."""
631
+ import inspect
632
+
633
+ sig = inspect.signature(search_clinical_trials)
634
+ assert sig.parameters["query"].annotation == str
635
+ assert sig.parameters["max_results"].annotation == int
636
+ assert sig.return_annotation == str
637
+ ```
638
+
639
+ ### 5.2 Integration Test (`tests/integration/test_mcp_server.py`)
640
+
641
+ ```python
642
+ """Integration tests for MCP server functionality."""
643
+
644
+ import pytest
645
+
646
+
647
+ class TestMCPServerIntegration:
648
+ """Integration tests for MCP server (requires running app)."""
649
+
650
+ @pytest.mark.integration
651
+ @pytest.mark.asyncio
652
+ async def test_mcp_tools_work_end_to_end(self) -> None:
653
+ """Test that MCP tools execute real searches."""
654
+ from src.mcp_tools import search_pubmed
655
+
656
+ result = await search_pubmed("metformin diabetes", 3)
657
+
658
+ assert isinstance(result, str)
659
+ assert "PubMed Results" in result
660
+ # Should have actual content (not just "no results")
661
+ assert len(result) > 100
662
+ ```
663
+
664
+ ---
665
+
666
+ ## 6. Claude Desktop Configuration
667
+
668
+ ### 6.1 Local Development
669
+
670
+ ```json
671
+ // ~/.config/claude/claude_desktop_config.json (Linux/Mac)
672
+ // %APPDATA%\Claude\claude_desktop_config.json (Windows)
673
+ {
674
+ "mcpServers": {
675
+ "deepcritical": {
676
+ "url": "http://localhost:7860/gradio_api/mcp/"
677
+ }
678
+ }
679
+ }
680
+ ```
681
+
682
+ ### 6.2 HuggingFace Spaces
683
+
684
+ ```json
685
+ {
686
+ "mcpServers": {
687
+ "deepcritical": {
688
+ "url": "https://MCP-1st-Birthday-deepcritical.hf.space/gradio_api/mcp/"
689
+ }
690
+ }
691
+ }
692
+ ```
693
+
694
+ ### 6.3 Private Spaces (with auth)
695
+
696
+ ```json
697
+ {
698
+ "mcpServers": {
699
+ "deepcritical": {
700
+ "url": "https://your-space.hf.space/gradio_api/mcp/",
701
+ "headers": {
702
+ "Authorization": "Bearer hf_xxxxxxxxxxxxx"
703
+ }
704
+ }
705
+ }
706
+ }
707
+ ```
708
+
709
+ ---
710
+
711
+ ## 7. Verification Commands
712
+
713
+ ```bash
714
+ # 1. Install MCP extras
715
+ uv add "gradio[mcp]>=5.0.0"
716
+
717
+ # 2. Run unit tests
718
+ uv run pytest tests/unit/test_mcp_tools.py -v
719
+
720
+ # 3. Run full test suite
721
+ make check
722
+
723
+ # 4. Start server with MCP
724
+ uv run python src/app.py
725
+
726
+ # 5. Verify MCP schema (in another terminal)
727
+ curl http://localhost:7860/gradio_api/mcp/schema
728
+
729
+ # 6. Test with MCP Inspector
730
+ npx @modelcontextprotocol/inspector http://localhost:7860/gradio_api/mcp/
731
+
732
+ # 7. Integration test (requires network access)
733
+ uv run pytest tests/integration/test_mcp_server.py -v -m integration
734
+ ```
735
+
736
+ ---
737
+
738
+ ## 8. Definition of Done
739
+
740
+ Phase 12 is **COMPLETE** when:
741
+
742
+ - [ ] `src/mcp_tools.py` created with all 4 MCP tools
743
+ - [ ] `src/app.py` updated with `mcp_server=True`
744
+ - [ ] Unit tests in `tests/unit/test_mcp_tools.py`
745
+ - [ ] Integration test in `tests/integration/test_mcp_server.py`
746
+ - [ ] `pyproject.toml` updated with `gradio[mcp]`
747
+ - [ ] MCP schema accessible at `/gradio_api/mcp/schema`
748
+ - [ ] Claude Desktop can connect and use tools
749
+ - [ ] All unit tests pass
750
+ - [ ] Lints pass
751
+
752
+ ---
753
+
754
+ ## 9. Demo Script for Judges
755
+
756
+ ### Show MCP Integration Works
757
+
758
+ 1. **Start the server**:
759
+ ```bash
760
+ uv run python src/app.py
761
+ ```
762
+
763
+ 2. **Show Claude Desktop using our tools**:
764
+ - Open Claude Desktop with DeepCritical MCP configured
765
+ - Ask: "Search PubMed for metformin Alzheimer's"
766
+ - Show real results appearing
767
+ - Ask: "Now search clinical trials for the same"
768
+ - Show combined analysis
769
+
770
+ 3. **Show MCP Inspector**:
771
+ ```bash
772
+ npx @modelcontextprotocol/inspector http://localhost:7860/gradio_api/mcp/
773
+ ```
774
+ - Show all 4 tools listed
775
+ - Execute `search_pubmed` from inspector
776
+ - Show results
777
+
778
+ ---
779
+
780
+ ## 10. Value Delivered
781
+
782
+ | Before | After |
783
+ |--------|-------|
784
+ | Tools only usable in our app | Tools usable by ANY MCP client |
785
+ | Not Track 2 compliant | **FULLY TRACK 2 COMPLIANT** |
786
+ | Can't use with Claude Desktop | Full Claude Desktop integration |
787
+
788
+ **Prize Impact**:
789
+ - Without MCP: **Disqualified from Track 2**
790
+ - With MCP: **Eligible for $2,500 1st place**
791
+
792
+ ---
793
+
794
+ ## 11. Files to Create/Modify
795
+
796
+ | File | Action | Purpose |
797
+ |------|--------|---------|
798
+ | `src/mcp_tools.py` | CREATE | MCP tool wrapper functions |
799
+ | `src/app.py` | MODIFY | Add `mcp_server=True`, add tool tabs |
800
+ | `pyproject.toml` | MODIFY | Add `gradio[mcp]>=5.0.0` |
801
+ | `tests/unit/test_mcp_tools.py` | CREATE | Unit tests for MCP tools |
802
+ | `tests/integration/test_mcp_server.py` | CREATE | Integration tests |
803
+ | `README.md` | MODIFY | Add MCP usage instructions |
804
+
805
+ ---
806
+
807
+ ## 12. Architecture After Phase 12
808
+
809
+ ```text
810
+ ┌─────────────────────────────────────────────────────────────────┐
+ │                     Claude Desktop / Cursor                     │
+ │                          (MCP Client)                           │
+ └─────────────────────────────┬───────────────────────────────────┘
+                               │ MCP Protocol
+                               ▼
+ ┌─────────────────────────────────────────────────────────────────┐
+ │                        Gradio MCP Server                        │
+ │                        /gradio_api/mcp/                         │
+ │ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ ┌─────────┐  │
+ │ │search_pubmed │ │search_trials │ │search_biorxiv│ │search_  │  │
+ │ │              │ │              │ │              │ │all      │  │
+ │ └──────┬───────┘ └──────┬───────┘ └──────┬───────┘ └────┬────┘  │
+ └────────┼────────────────┼────────────────┼──────────────┼───────┘
+          │                │                │              │
+          ▼                ▼                ▼              ▼
+    ┌──────────┐     ┌──────────┐     ┌──────────┐   (calls all)
+    │PubMedTool│     │Trials    │     │BioRxiv   │
+    │          │     │Tool      │     │Tool      │
+    └──────────┘     └──────────┘     └──────────┘
830
+ ```
831
+
832
+ **This is the MCP compliance stack.**
docs/implementation/13_phase_modal_integration.md ADDED
@@ -0,0 +1,1195 @@
1
+ # Phase 13 Implementation Spec: Modal Pipeline Integration
2
+
3
+ **Goal**: Wire existing Modal code execution into the agent pipeline.
4
+ **Philosophy**: "Sandboxed execution makes AI-generated code trustworthy."
5
+ **Prerequisite**: Phase 12 complete (MCP server working)
6
+ **Priority**: P1 - HIGH VALUE ($2,500 Modal Innovation Award)
7
+ **Estimated Time**: 2-3 hours
8
+
9
+ ---
10
+
11
+ ## 1. Why Modal Integration?
12
+
13
+ ### Current State Analysis
14
+
15
+ Mario already implemented `src/tools/code_execution.py`:
16
+
17
+ | Component | Status | Notes |
18
+ |-----------|--------|-------|
19
+ | `ModalCodeExecutor` class | Built | Executes Python in Modal sandbox |
20
+ | `SANDBOX_LIBRARIES` | Defined | pandas, numpy, scipy, etc. |
21
+ | `execute()` method | Implemented | Stdout/stderr capture |
22
+ | `execute_with_return()` | Implemented | Returns `result` variable |
23
+ | `AnalysisAgent` | Built | Uses Modal for statistical analysis |
24
+ | **Pipeline Integration** | **MISSING** | Not wired into main orchestrator |
25
+
26
+ ### What's Missing
27
+
28
+ ```text
29
+ Current Flow:
30
+ User Query → Orchestrator → Search → Judge → [Report] → Done
31
+
32
+ With Modal:
33
+ User Query → Orchestrator → Search → Judge → [Analysis*] → Report → Done
34
+                                                    ↓
+                                        Modal Sandbox Execution
36
+ ```
37
+
38
+ *The AnalysisAgent exists but is NOT called by either orchestrator.
39
+
40
+ ---
41
+
42
+ ## 2. Critical Dependency Analysis
43
+
44
+ ### The Problem (Senior Feedback)
45
+
46
+ ```python
47
+ # src/agents/analysis_agent.py - Line 8
48
+ from agent_framework import (
49
+ AgentRunResponse,
50
+ BaseAgent,
51
+ ...
52
+ )
53
+ ```
54
+
55
+ ```toml
56
+ # pyproject.toml - agent-framework is OPTIONAL
57
+ [project.optional-dependencies]
58
+ magentic = [
59
+ "agent-framework-core",
60
+ ]
61
+ ```
62
+
63
+ **If we import `AnalysisAgent` in the simple orchestrator without the `magentic` extra installed, the app CRASHES on startup.**
64
+
65
+ ### The SOLID Solution
66
+
67
+ **Single Responsibility Principle**: Decouple Modal execution logic from `agent_framework`.
68
+
69
+ ```text
70
+ BEFORE (Coupled):
71
+ AnalysisAgent (requires agent_framework)
72
+         ↓
73
+ ModalCodeExecutor
74
+
75
+ AFTER (Decoupled):
76
+ StatisticalAnalyzer (no agent_framework dependency) ← Simple mode uses this
77
+         ↓
78
+ ModalCodeExecutor
79
+
80
+ AnalysisAgent (wraps StatisticalAnalyzer) ← Magentic mode uses this
81
+ ```
82
+
83
+ **Key insight**: Create `src/services/statistical_analyzer.py` with ZERO agent_framework imports.
84
+
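+ A minimal sketch of the resulting import discipline (module paths as in this spec; the two helper functions are illustrative, not part of the codebase):
+
+ ```python
+ # Sketch: lazy imports keep agent_framework out of the simple path.
+ from typing import Any
+
+
+ def get_simple_analyzer() -> Any:
+     # Safe without the magentic extra: nothing under src.services
+     # imports agent_framework.
+     from src.services.statistical_analyzer import get_statistical_analyzer
+
+     return get_statistical_analyzer()
+
+
+ def get_magentic_analysis_agent(evidence_store: dict[str, Any]) -> Any:
+     # Only reached in magentic mode, where agent-framework-core is installed.
+     from src.agents.analysis_agent import AnalysisAgent
+
+     return AnalysisAgent(evidence_store)
+ ```
+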
85
+ ---
86
+
87
+ ## 3. Prize Opportunity
88
+
89
+ ### Modal Innovation Award: $2,500
90
+
91
+ **Judging Criteria**:
92
+ 1. **Sandbox Isolation** - Code runs in container, not local
93
+ 2. **Scientific Computing** - Real pandas/scipy analysis
94
+ 3. **Safety** - Can't access local filesystem
95
+ 4. **Speed** - Modal's fast cold starts
96
+
97
+ ### What We Need to Show
98
+
99
+ ```python
100
+ # LLM generates analysis code
101
+ code = """
102
+ import pandas as pd
103
+ import scipy.stats as stats
104
+
105
+ data = pd.DataFrame({
106
+ 'study': ['Study1', 'Study2', 'Study3'],
107
+ 'effect_size': [0.45, 0.52, 0.38],
108
+ 'sample_size': [120, 85, 200]
109
+ })
110
+
111
+ weighted_mean = (data['effect_size'] * data['sample_size']).sum() / data['sample_size'].sum()
112
+ t_stat, p_value = stats.ttest_1samp(data['effect_size'], 0)
113
+
114
+ print(f"Weighted Effect Size: {weighted_mean:.3f}")
115
+ print(f"P-value: {p_value:.4f}")
116
+
117
+ result = "SUPPORTED" if p_value < 0.05 else "INCONCLUSIVE"
118
+ """
119
+
120
+ # Executed SAFELY in Modal sandbox
121
+ executor = get_code_executor()
122
+ output = executor.execute(code) # Runs in isolated container!
123
+ ```
124
+
125
+ ---
126
+
127
+ ## 4. Technical Specification
128
+
129
+ ### 4.1 Dependencies
130
+
131
+ ```toml
132
+ # pyproject.toml - NO CHANGES to dependencies
133
+ # StatisticalAnalyzer uses only:
134
+ # - pydantic-ai (already in main deps)
135
+ # - modal (already in main deps)
136
+ # - src.tools.code_execution (no agent_framework)
137
+ ```
138
+
139
+ ### 4.2 Environment Variables
140
+
141
+ ```bash
142
+ # .env
143
+ MODAL_TOKEN_ID=your-token-id
144
+ MODAL_TOKEN_SECRET=your-token-secret
145
+ ```
146
+
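+ The same credentials can also be persisted to `~/.modal.toml` with the Modal CLI (a sketch; substitute the token values from your Modal dashboard):
+
+ ```bash
+ # Alternative to .env: store credentials via the Modal CLI.
+ modal token set --token-id <token-id> --token-secret <token-secret>
+ ```
+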
147
+ ### 4.3 Integration Points
148
+
149
+ | Integration Point | File | Change Required |
150
+ |-------------------|------|-----------------|
151
+ | New Service | `src/services/statistical_analyzer.py` | CREATE (no agent_framework) |
152
+ | Simple Orchestrator | `src/orchestrator.py` | Use `StatisticalAnalyzer` |
153
+ | Config | `src/utils/config.py` | Add `enable_modal_analysis` setting |
154
+ | AnalysisAgent | `src/agents/analysis_agent.py` | Refactor to wrap `StatisticalAnalyzer` |
155
+ | MCP Tool | `src/mcp_tools.py` | Add `analyze_hypothesis` tool |
156
+
157
+ ---
158
+
159
+ ## 5. Implementation
160
+
161
+ ### 5.1 Configuration Update (`src/utils/config.py`)
162
+
163
+ ```python
164
+ class Settings(BaseSettings):
165
+ # ... existing settings ...
166
+
167
+ # Modal Configuration
168
+ modal_token_id: str | None = None
169
+ modal_token_secret: str | None = None
170
+ enable_modal_analysis: bool = False # Opt-in for hackathon demo
171
+
172
+ @property
173
+ def modal_available(self) -> bool:
174
+ """Check if Modal credentials are configured."""
175
+ return bool(self.modal_token_id and self.modal_token_secret)
176
+ ```
177
+
178
+ ### 5.2 StatisticalAnalyzer Service (`src/services/statistical_analyzer.py`)
179
+
180
+ **This is the key fix - NO agent_framework imports.**
181
+
182
+ ```python
183
+ """Statistical analysis service using Modal code execution.
184
+
185
+ This module provides Modal-based statistical analysis WITHOUT depending on
186
+ agent_framework. This allows it to be used in the simple orchestrator mode
187
+ without requiring the magentic optional dependency.
188
+
189
+ The AnalysisAgent (in src/agents/) wraps this service for magentic mode.
190
+ """
191
+
192
+ import asyncio
193
+ import re
194
+ from functools import partial
195
+ from typing import Any
196
+
197
+ from pydantic import BaseModel, Field
198
+ from pydantic_ai import Agent
199
+
200
+ from src.agent_factory.judges import get_model
201
+ from src.tools.code_execution import (
202
+ CodeExecutionError,
203
+ get_code_executor,
204
+ get_sandbox_library_prompt,
205
+ )
206
+ from src.utils.models import Evidence
207
+
208
+
209
+ class AnalysisResult(BaseModel):
210
+ """Result of statistical analysis."""
211
+
212
+ verdict: str = Field(
213
+ description="SUPPORTED, REFUTED, or INCONCLUSIVE",
214
+ )
215
+ confidence: float = Field(ge=0.0, le=1.0, description="Confidence in verdict (0-1)")
216
+ statistical_evidence: str = Field(
217
+ description="Summary of statistical findings from code execution"
218
+ )
219
+ code_generated: str = Field(description="Python code that was executed")
220
+ execution_output: str = Field(description="Output from code execution")
221
+ key_findings: list[str] = Field(default_factory=list, description="Key takeaways")
222
+ limitations: list[str] = Field(default_factory=list, description="Limitations")
223
+
224
+
225
+ class StatisticalAnalyzer:
226
+ """Performs statistical analysis using Modal code execution.
227
+
228
+ This service:
229
+ 1. Generates Python code for statistical analysis using LLM
230
+ 2. Executes code in Modal sandbox
231
+ 3. Interprets results
232
+ 4. Returns verdict (SUPPORTED/REFUTED/INCONCLUSIVE)
233
+
234
+ Note: This class has NO agent_framework dependency, making it safe
235
+ to use in the simple orchestrator without the magentic extra.
236
+ """
237
+
238
+ def __init__(self) -> None:
239
+ """Initialize the analyzer."""
240
+ self._code_executor: Any = None
241
+ self._agent: Agent[None, str] | None = None
242
+
243
+ def _get_code_executor(self) -> Any:
244
+ """Lazy initialization of code executor."""
245
+ if self._code_executor is None:
246
+ self._code_executor = get_code_executor()
247
+ return self._code_executor
248
+
249
+ def _get_agent(self) -> Agent[None, str]:
250
+ """Lazy initialization of LLM agent for code generation."""
251
+ if self._agent is None:
252
+ library_versions = get_sandbox_library_prompt()
253
+ self._agent = Agent(
254
+ model=get_model(),
255
+ output_type=str,
256
+ system_prompt=f"""You are a biomedical data scientist.
257
+
258
+ Generate Python code to analyze research evidence and test hypotheses.
259
+
260
+ Guidelines:
261
+ 1. Use pandas, numpy, scipy.stats for analysis
262
+ 2. Print clear, interpretable results
263
+ 3. Include statistical tests (t-tests, chi-square, etc.)
264
+ 4. Calculate effect sizes and confidence intervals
265
+ 5. Keep code concise (<50 lines)
266
+ 6. Set 'result' variable to SUPPORTED, REFUTED, or INCONCLUSIVE
267
+
268
+ Available libraries:
269
+ {library_versions}
270
+
271
+ Output format: Return ONLY executable Python code, no explanations.""",
272
+ )
273
+ return self._agent
274
+
275
+ async def analyze(
276
+ self,
277
+ query: str,
278
+ evidence: list[Evidence],
279
+ hypothesis: dict[str, Any] | None = None,
280
+ ) -> AnalysisResult:
281
+ """Run statistical analysis on evidence.
282
+
283
+ Args:
284
+ query: The research question
285
+ evidence: List of Evidence objects to analyze
286
+ hypothesis: Optional hypothesis dict with drug, target, pathway, effect
287
+
288
+ Returns:
289
+ AnalysisResult with verdict and statistics
290
+ """
291
+ # Build analysis prompt
292
+ evidence_summary = self._summarize_evidence(evidence[:10])
293
+ hypothesis_text = ""
294
+ if hypothesis:
295
+ hypothesis_text = f"""
296
+ Hypothesis: {hypothesis.get('drug', 'Unknown')} → {hypothesis.get('target', '?')} → {hypothesis.get('pathway', '?')} → {hypothesis.get('effect', '?')}
297
+ Confidence: {hypothesis.get('confidence', 0.5):.0%}
298
+ """
299
+
300
+ prompt = f"""Generate Python code to statistically analyze:
301
+
302
+ **Research Question**: {query}
303
+ {hypothesis_text}
304
+
305
+ **Evidence Summary**:
306
+ {evidence_summary}
307
+
308
+ Generate executable Python code to analyze this evidence."""
309
+
310
+ try:
311
+ # Generate code
312
+ agent = self._get_agent()
313
+ code_result = await agent.run(prompt)
314
+ generated_code = code_result.output
315
+
316
+ # Execute in Modal sandbox
317
+ loop = asyncio.get_running_loop()
318
+ executor = self._get_code_executor()
319
+ execution = await loop.run_in_executor(
320
+ None, partial(executor.execute, generated_code, timeout=120)
321
+ )
322
+
323
+ if not execution["success"]:
324
+ return AnalysisResult(
325
+ verdict="INCONCLUSIVE",
326
+ confidence=0.0,
327
+ statistical_evidence=f"Execution failed: {execution['error']}",
328
+ code_generated=generated_code,
329
+ execution_output=execution.get("stderr", ""),
330
+ key_findings=[],
331
+ limitations=["Code execution failed"],
332
+ )
333
+
334
+ # Interpret results
335
+ return self._interpret_results(generated_code, execution)
336
+
337
+ except CodeExecutionError as e:
338
+ return AnalysisResult(
339
+ verdict="INCONCLUSIVE",
340
+ confidence=0.0,
341
+ statistical_evidence=str(e),
342
+ code_generated="",
343
+ execution_output="",
344
+ key_findings=[],
345
+ limitations=[f"Analysis error: {e}"],
346
+ )
347
+
348
+ def _summarize_evidence(self, evidence: list[Evidence]) -> str:
349
+ """Summarize evidence for code generation prompt."""
350
+ if not evidence:
351
+ return "No evidence available."
352
+
353
+ lines = []
354
+ for i, ev in enumerate(evidence[:5], 1):
355
+ lines.append(f"{i}. {ev.content[:200]}...")
356
+ lines.append(f" Source: {ev.citation.title}")
357
+ lines.append(f" Relevance: {ev.relevance:.0%}\n")
358
+
359
+ return "\n".join(lines)
360
+
361
+ def _interpret_results(
362
+ self,
363
+ code: str,
364
+ execution: dict[str, Any],
365
+ ) -> AnalysisResult:
366
+ """Interpret code execution results."""
367
+ stdout = execution["stdout"]
368
+ stdout_upper = stdout.upper()
369
+
370
+ # Extract verdict with robust word-boundary matching
371
+ verdict = "INCONCLUSIVE"
372
+ if re.search(r"\bSUPPORTED\b", stdout_upper) and not re.search(
373
+ r"\b(?:NOT|UN)SUPPORTED\b", stdout_upper
374
+ ):
375
+ verdict = "SUPPORTED"
376
+ elif re.search(r"\bREFUTED\b", stdout_upper):
377
+ verdict = "REFUTED"
378
+
379
+ # Extract key findings
380
+ key_findings = []
381
+ for line in stdout.split("\n"):
382
+ line_lower = line.lower()
383
+ if any(kw in line_lower for kw in ["p-value", "significant", "effect", "mean"]):
384
+ key_findings.append(line.strip())
385
+
386
+ # Calculate confidence from p-values
387
+ confidence = self._calculate_confidence(stdout)
388
+
389
+ return AnalysisResult(
390
+ verdict=verdict,
391
+ confidence=confidence,
392
+ statistical_evidence=stdout.strip(),
393
+ code_generated=code,
394
+ execution_output=stdout,
395
+ key_findings=key_findings[:5],
396
+ limitations=[
397
+ "Analysis based on summary data only",
398
+ "Limited to available evidence",
399
+ "Statistical tests assume data independence",
400
+ ],
401
+ )
402
+
403
+ def _calculate_confidence(self, output: str) -> float:
404
+ """Calculate confidence based on statistical results."""
405
+ p_values = re.findall(r"p[-\s]?value[:\s]+(\d+\.?\d*)", output.lower())
406
+
407
+ if p_values:
408
+ try:
409
+ min_p = min(float(p) for p in p_values)
410
+ if min_p < 0.001:
411
+ return 0.95
412
+ elif min_p < 0.01:
413
+ return 0.90
414
+ elif min_p < 0.05:
415
+ return 0.80
416
+ else:
417
+ return 0.60
418
+ except ValueError:
419
+ pass
420
+
421
+ return 0.70 # Default
422
+
423
+
424
+ # Singleton for reuse
425
+ _analyzer: StatisticalAnalyzer | None = None
426
+
427
+
428
+ def get_statistical_analyzer() -> StatisticalAnalyzer:
429
+ """Get or create singleton StatisticalAnalyzer instance."""
430
+ global _analyzer
431
+ if _analyzer is None:
432
+ _analyzer = StatisticalAnalyzer()
433
+ return _analyzer
434
+ ```
435
+
436
+ ### 5.3 Simple Orchestrator Update (`src/orchestrator.py`)
437
+
438
+ **Uses `StatisticalAnalyzer` directly - NO agent_framework import.**
439
+
440
+ ```python
441
+ """Main orchestrator with optional Modal analysis."""
442
+
443
+ from src.utils.config import settings
444
+
445
+ # ... existing imports ...
446
+
447
+
448
+ class Orchestrator:
449
+ """Search-Judge-Analyze orchestration loop."""
450
+
451
+ def __init__(
452
+ self,
453
+ search_handler: SearchHandlerProtocol,
454
+ judge_handler: JudgeHandlerProtocol,
455
+ config: OrchestratorConfig | None = None,
456
+ enable_analysis: bool = False, # New parameter
457
+ ) -> None:
458
+ self.search = search_handler
459
+ self.judge = judge_handler
460
+ self.config = config or OrchestratorConfig()
461
+ self.history: list[dict[str, Any]] = []
462
+ self._enable_analysis = enable_analysis and settings.modal_available
463
+
464
+ # Lazy-load analysis (NO agent_framework dependency!)
465
+ self._analyzer: Any = None
466
+
467
+ def _get_analyzer(self) -> Any:
468
+ """Lazy initialization of StatisticalAnalyzer.
469
+
470
+ Note: This imports from src.services, NOT src.agents,
471
+ so it works without the magentic optional dependency.
472
+ """
473
+ if self._analyzer is None:
474
+ from src.services.statistical_analyzer import get_statistical_analyzer
475
+
476
+ self._analyzer = get_statistical_analyzer()
477
+ return self._analyzer
478
+
479
+ async def run(self, query: str) -> AsyncGenerator[AgentEvent, None]:
480
+ """Main orchestration loop with optional Modal analysis."""
481
+ # ... existing search/judge loop ...
482
+
483
+ # After judge says "synthesize", optionally run analysis
484
+ if self._enable_analysis and assessment.recommendation == "synthesize":
485
+ yield AgentEvent(
486
+ type="analyzing",
487
+ message="Running statistical analysis in Modal sandbox...",
488
+ data={},
489
+ iteration=iteration,
490
+ )
491
+
492
+ try:
493
+ analyzer = self._get_analyzer()
494
+
495
+ # Run Modal analysis (no agent_framework needed!)
496
+ analysis_result = await analyzer.analyze(
497
+ query=query,
498
+ evidence=all_evidence,
499
+ hypothesis=None, # Could add hypothesis generation later
500
+ )
501
+
502
+ yield AgentEvent(
503
+ type="analysis_complete",
504
+ message=f"Analysis verdict: {analysis_result.verdict}",
505
+ data=analysis_result.model_dump(),
506
+ iteration=iteration,
507
+ )
508
+
509
+ except Exception as e:
510
+ yield AgentEvent(
511
+ type="error",
512
+ message=f"Modal analysis failed: {e}",
513
+ data={"error": str(e)},
514
+ iteration=iteration,
515
+ )
516
+
517
+ # Continue to synthesis...
518
+ ```
519
+
520
+ ### 5.4 Refactor AnalysisAgent (`src/agents/analysis_agent.py`)
521
+
522
+ **Wrap `StatisticalAnalyzer` for magentic mode.**
523
+
524
+ ```python
525
+ """Analysis agent for statistical analysis using Modal code execution.
526
+
527
+ This agent wraps StatisticalAnalyzer for use in magentic multi-agent mode.
528
+ The core logic is in src/services/statistical_analyzer.py to avoid
529
+ coupling agent_framework to the simple orchestrator.
530
+ """
531
+
532
+ from collections.abc import AsyncIterable
533
+ from typing import TYPE_CHECKING, Any
534
+
535
+ from agent_framework import (
536
+ AgentRunResponse,
537
+ AgentRunResponseUpdate,
538
+ AgentThread,
539
+ BaseAgent,
540
+ ChatMessage,
541
+ Role,
542
+ )
543
+
544
+ from src.services.statistical_analyzer import (
545
+ AnalysisResult,
546
+ get_statistical_analyzer,
547
+ )
548
+ from src.utils.models import Evidence
549
+
550
+ if TYPE_CHECKING:
551
+ from src.services.embeddings import EmbeddingService
552
+
553
+
554
+ class AnalysisAgent(BaseAgent): # type: ignore[misc]
555
+ """Wraps StatisticalAnalyzer for magentic multi-agent mode."""
556
+
557
+ def __init__(
558
+ self,
559
+ evidence_store: dict[str, Any],
560
+ embedding_service: "EmbeddingService | None" = None,
561
+ ) -> None:
562
+ super().__init__(
563
+ name="AnalysisAgent",
564
+ description="Performs statistical analysis using Modal sandbox",
565
+ )
566
+ self._evidence_store = evidence_store
567
+ self._embeddings = embedding_service
568
+ self._analyzer = get_statistical_analyzer()
569
+
570
+ async def run(
571
+ self,
572
+ messages: str | ChatMessage | list[str] | list[ChatMessage] | None = None,
573
+ *,
574
+ thread: AgentThread | None = None,
575
+ **kwargs: Any,
576
+ ) -> AgentRunResponse:
577
+ """Analyze evidence and return verdict."""
578
+ query = self._extract_query(messages)
579
+ hypotheses = self._evidence_store.get("hypotheses", [])
580
+ evidence = self._evidence_store.get("current", [])
581
+
582
+ if not evidence:
583
+ return self._error_response("No evidence available.")
584
+
585
+ # Get primary hypothesis if available
586
+ hypothesis_dict = None
587
+ if hypotheses:
588
+ h = hypotheses[0]
589
+ hypothesis_dict = {
590
+ "drug": getattr(h, "drug", "Unknown"),
591
+ "target": getattr(h, "target", "?"),
592
+ "pathway": getattr(h, "pathway", "?"),
593
+ "effect": getattr(h, "effect", "?"),
594
+ "confidence": getattr(h, "confidence", 0.5),
595
+ }
596
+
597
+ # Delegate to StatisticalAnalyzer
598
+ result = await self._analyzer.analyze(
599
+ query=query,
600
+ evidence=evidence,
601
+ hypothesis=hypothesis_dict,
602
+ )
603
+
604
+ # Store in shared context
605
+ self._evidence_store["analysis"] = result.model_dump()
606
+
607
+ # Format response
608
+ response_text = self._format_response(result)
609
+
610
+ return AgentRunResponse(
611
+ messages=[ChatMessage(role=Role.ASSISTANT, text=response_text)],
612
+ response_id=f"analysis-{result.verdict.lower()}",
613
+ additional_properties={"analysis": result.model_dump()},
614
+ )
615
+
616
+ def _format_response(self, result: AnalysisResult) -> str:
617
+ """Format analysis result as markdown."""
618
+ lines = [
619
+ "## Statistical Analysis Complete\n",
620
+ f"### Verdict: **{result.verdict}**",
621
+ f"**Confidence**: {result.confidence:.0%}\n",
622
+ "### Key Findings",
623
+ ]
624
+ for finding in result.key_findings:
625
+ lines.append(f"- {finding}")
626
+
627
+ lines.extend([
628
+ "\n### Statistical Evidence",
629
+ "```",
630
+ result.statistical_evidence,
631
+ "```",
632
+ ])
633
+ return "\n".join(lines)
634
+
635
+ def _error_response(self, message: str) -> AgentRunResponse:
636
+ """Create error response."""
637
+ return AgentRunResponse(
638
+ messages=[ChatMessage(role=Role.ASSISTANT, text=f"**Error**: {message}")],
639
+ response_id="analysis-error",
640
+ )
641
+
642
+ def _extract_query(
643
+ self, messages: str | ChatMessage | list[str] | list[ChatMessage] | None
644
+ ) -> str:
645
+ """Extract query from messages."""
646
+ if isinstance(messages, str):
647
+ return messages
648
+ elif isinstance(messages, ChatMessage):
649
+ return messages.text or ""
650
+ elif isinstance(messages, list):
651
+ for msg in reversed(messages):
652
+ if isinstance(msg, ChatMessage) and msg.role == Role.USER:
653
+ return msg.text or ""
654
+ elif isinstance(msg, str):
655
+ return msg
656
+ return ""
657
+
658
+ async def run_stream(
659
+ self,
660
+ messages: str | ChatMessage | list[str] | list[ChatMessage] | None = None,
661
+ *,
662
+ thread: AgentThread | None = None,
663
+ **kwargs: Any,
664
+ ) -> AsyncIterable[AgentRunResponseUpdate]:
665
+ """Streaming wrapper."""
666
+ result = await self.run(messages, thread=thread, **kwargs)
667
+ yield AgentRunResponseUpdate(messages=result.messages, response_id=result.response_id)
668
+ ```
669
+
670
+ ### 5.5 MCP Tool for Modal Analysis (`src/mcp_tools.py`)
671
+
672
+ Add to existing MCP tools:
673
+
674
+ ```python
675
+ async def analyze_hypothesis(
676
+ drug: str,
677
+ condition: str,
678
+ evidence_summary: str,
679
+ ) -> str:
680
+ """Perform statistical analysis of drug repurposing hypothesis using Modal.
681
+
682
+ Executes AI-generated Python code in a secure Modal sandbox to analyze
683
+ the statistical evidence for a drug repurposing hypothesis.
684
+
685
+ Args:
686
+ drug: The drug being evaluated (e.g., "metformin")
687
+ condition: The target condition (e.g., "Alzheimer's disease")
688
+ evidence_summary: Summary of evidence to analyze
689
+
690
+ Returns:
691
+ Analysis result with verdict (SUPPORTED/REFUTED/INCONCLUSIVE) and statistics
692
+ """
693
+ from src.services.statistical_analyzer import get_statistical_analyzer
694
+ from src.utils.config import settings
695
+ from src.utils.models import Citation, Evidence
696
+
697
+ if not settings.modal_available:
698
+ return "Error: Modal credentials not configured. Set MODAL_TOKEN_ID and MODAL_TOKEN_SECRET."
699
+
700
+ # Create evidence from summary
701
+ evidence = [
702
+ Evidence(
703
+ content=evidence_summary,
704
+ citation=Citation(
705
+ source="pubmed",
706
+ title=f"Evidence for {drug} in {condition}",
707
+ url="https://example.com",
708
+ date="2024-01-01",
709
+ authors=["User Provided"],
710
+ ),
711
+ relevance=0.9,
712
+ )
713
+ ]
714
+
715
+ analyzer = get_statistical_analyzer()
716
+ result = await analyzer.analyze(
717
+ query=f"Can {drug} treat {condition}?",
718
+ evidence=evidence,
719
+ hypothesis={"drug": drug, "target": "unknown", "pathway": "unknown", "effect": condition},
720
+ )
721
+
722
+ return f"""## Statistical Analysis: {drug} for {condition}
723
+
724
+ ### Verdict: **{result.verdict}**
725
+ **Confidence**: {result.confidence:.0%}
726
+
727
+ ### Key Findings
728
+ {chr(10).join(f"- {f}" for f in result.key_findings) or "- No specific findings extracted"}
729
+
730
+ ### Execution Output
731
+ ```
732
+ {result.execution_output}
733
+ ```
734
+
735
+ ### Generated Code
736
+ ```python
737
+ {result.code_generated}
738
+ ```
739
+
740
+ **Executed in Modal Sandbox** - Isolated, secure, reproducible.
741
+ """
742
+ ```
743
+
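+ Once registered, the same function can also be exercised directly in Python as a quick sanity check (illustrative invocation; the argument values are made up):
+ 
+ ```python
+ import asyncio
+ 
+ from src.mcp_tools import analyze_hypothesis
+ 
+ # Direct call to the MCP tool function, bypassing the MCP transport
+ report = asyncio.run(
+     analyze_hypothesis(
+         drug="metformin",
+         condition="Alzheimer's disease",
+         evidence_summary="Observational studies report reduced dementia incidence.",
+     )
+ )
+ print(report)
+ ```
+ 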
744
+ ### 5.6 Demo Scripts
745
+
746
+ #### `examples/modal_demo/verify_sandbox.py`
747
+
748
+ ```python
749
+ #!/usr/bin/env python3
750
+ """Verify that Modal sandbox is properly isolated.
751
+
752
+ This script proves to judges that code runs in Modal, not locally.
753
+ NO agent_framework dependency - uses only src.tools.code_execution.
754
+
755
+ Usage:
756
+ uv run python examples/modal_demo/verify_sandbox.py
757
+ """
758
+
759
+ import asyncio
760
+ from functools import partial
761
+
762
+ from src.tools.code_execution import get_code_executor
763
+ from src.utils.config import settings
764
+
765
+
766
+ async def main() -> None:
767
+ """Verify Modal sandbox isolation."""
768
+ if not settings.modal_available:
769
+ print("Error: Modal credentials not configured.")
770
+ print("Set MODAL_TOKEN_ID and MODAL_TOKEN_SECRET in .env")
771
+ return
772
+
773
+ executor = get_code_executor()
774
+ loop = asyncio.get_running_loop()
775
+
776
+ print("=" * 60)
777
+ print("Modal Sandbox Isolation Verification")
778
+ print("=" * 60 + "\n")
779
+
780
+ # Test 1: Hostname
781
+ print("Test 1: Check hostname (should NOT be your machine)")
782
+ code1 = "import socket; print(f'Hostname: {socket.gethostname()}')"
783
+ result1 = await loop.run_in_executor(None, partial(executor.execute, code1))
784
+ print(f" {result1['stdout'].strip()}\n")
785
+
786
+ # Test 2: Scientific libraries
787
+ print("Test 2: Verify scientific libraries")
788
+ code2 = """
789
+ import pandas as pd
790
+ import numpy as np
791
+ import scipy
792
+ print(f"pandas: {pd.__version__}")
793
+ print(f"numpy: {np.__version__}")
794
+ print(f"scipy: {scipy.__version__}")
795
+ """
796
+ result2 = await loop.run_in_executor(None, partial(executor.execute, code2))
797
+ print(f" {result2['stdout'].strip()}\n")
798
+
799
+ # Test 3: Network blocked
800
+ print("Test 3: Verify network isolation")
801
+ code3 = """
802
+ import urllib.request
803
+ try:
804
+ urllib.request.urlopen("https://google.com", timeout=2)
805
+ print("Network: ALLOWED (unexpected!)")
806
+ except Exception:
807
+ print("Network: BLOCKED (as expected)")
808
+ """
809
+ result3 = await loop.run_in_executor(None, partial(executor.execute, code3))
810
+ print(f" {result3['stdout'].strip()}\n")
811
+
812
+ # Test 4: Real statistics
813
+ print("Test 4: Execute statistical analysis")
814
+ code4 = """
815
+ import pandas as pd
816
+ import scipy.stats as stats
817
+
818
+ data = pd.DataFrame({'effect': [0.42, 0.38, 0.51]})
819
+ mean = data['effect'].mean()
820
+ t_stat, p_val = stats.ttest_1samp(data['effect'], 0)
821
+
822
+ print(f"Mean Effect: {mean:.3f}")
823
+ print(f"P-value: {p_val:.4f}")
824
+ print(f"Verdict: {'SUPPORTED' if p_val < 0.05 else 'INCONCLUSIVE'}")
825
+ """
826
+ result4 = await loop.run_in_executor(None, partial(executor.execute, code4))
827
+ print(f" {result4['stdout'].strip()}\n")
828
+
829
+ print("=" * 60)
830
+ print("All tests complete - Modal sandbox verified!")
831
+ print("=" * 60)
832
+
833
+
834
+ if __name__ == "__main__":
835
+ asyncio.run(main())
836
+ ```
837
+
838
+ #### `examples/modal_demo/run_analysis.py`
839
+
840
+ ```python
841
+ #!/usr/bin/env python3
842
+ """Demo: Modal-powered statistical analysis.
843
+
844
+ This script uses StatisticalAnalyzer directly (NO agent_framework dependency).
845
+
846
+ Usage:
847
+ uv run python examples/modal_demo/run_analysis.py "metformin alzheimer"
848
+ """
849
+
850
+ import argparse
851
+ import asyncio
852
+ import os
853
+ import sys
854
+
855
+ from src.services.statistical_analyzer import get_statistical_analyzer
856
+ from src.tools.pubmed import PubMedTool
857
+ from src.utils.config import settings
858
+
859
+
860
+ async def main() -> None:
861
+ """Run the Modal analysis demo."""
862
+ parser = argparse.ArgumentParser(description="Modal Analysis Demo")
863
+ parser.add_argument("query", help="Research query")
864
+ args = parser.parse_args()
865
+
866
+ if not settings.modal_available:
867
+ print("Error: Modal credentials not configured.")
868
+ sys.exit(1)
869
+
870
+ if not (os.getenv("OPENAI_API_KEY") or os.getenv("ANTHROPIC_API_KEY")):
871
+ print("Error: No LLM API key found.")
872
+ sys.exit(1)
873
+
874
+ print(f"\n{'=' * 60}")
875
+ print("DeepCritical Modal Analysis Demo")
876
+ print(f"Query: {args.query}")
877
+ print(f"{'=' * 60}\n")
878
+
879
+ # Step 1: Gather Evidence
880
+ print("Step 1: Gathering evidence from PubMed...")
881
+ pubmed = PubMedTool()
882
+ evidence = await pubmed.search(args.query, max_results=5)
883
+ print(f" Found {len(evidence)} papers\n")
884
+
885
+ # Step 2: Run Modal Analysis
886
+ print("Step 2: Running statistical analysis in Modal sandbox...")
887
+ analyzer = get_statistical_analyzer()
888
+ result = await analyzer.analyze(query=args.query, evidence=evidence)
889
+
890
+ # Step 3: Display Results
891
+ print("\n" + "=" * 60)
892
+ print("ANALYSIS RESULTS")
893
+ print("=" * 60)
894
+ print(f"\nVerdict: {result.verdict}")
895
+ print(f"Confidence: {result.confidence:.0%}")
896
+ print("\nKey Findings:")
897
+ for finding in result.key_findings:
898
+ print(f" - {finding}")
899
+
900
+ print("\n[Demo Complete - Code executed in Modal, not locally]")
901
+
902
+
903
+ if __name__ == "__main__":
904
+ asyncio.run(main())
905
+ ```
906
+
907
+ ---
908
+
909
+ ## 6. TDD Test Suite
910
+
911
+ ### 6.1 Unit Tests (`tests/unit/services/test_statistical_analyzer.py`)
912
+
913
+ ```python
914
+ """Unit tests for StatisticalAnalyzer service."""
915
+
916
+ from unittest.mock import AsyncMock, MagicMock, patch
917
+
918
+ import pytest
919
+
920
+ from src.services.statistical_analyzer import (
921
+ AnalysisResult,
922
+ StatisticalAnalyzer,
923
+ get_statistical_analyzer,
924
+ )
925
+ from src.utils.models import Citation, Evidence
926
+
927
+
928
+ @pytest.fixture
929
+ def sample_evidence() -> list[Evidence]:
930
+ """Sample evidence for testing."""
931
+ return [
932
+ Evidence(
933
+ content="Metformin shows effect size of 0.45.",
934
+ citation=Citation(
935
+ source="pubmed",
936
+ title="Metformin Study",
937
+ url="https://pubmed.ncbi.nlm.nih.gov/12345/",
938
+ date="2024-01-15",
939
+ authors=["Smith J"],
940
+ ),
941
+ relevance=0.9,
942
+ )
943
+ ]
944
+
945
+
946
+ class TestStatisticalAnalyzer:
947
+ """Tests for StatisticalAnalyzer (no agent_framework dependency)."""
948
+
949
+ def test_no_agent_framework_import(self) -> None:
950
+ """StatisticalAnalyzer must NOT import agent_framework."""
951
+ import src.services.statistical_analyzer as module
952
+
953
+ # Check module doesn't import agent_framework
954
+         with open(module.__file__) as fh:
+             source = fh.read()
955
+ assert "agent_framework" not in source
956
+ assert "BaseAgent" not in source
957
+
958
+ @pytest.mark.asyncio
959
+ async def test_analyze_returns_result(
960
+ self, sample_evidence: list[Evidence]
961
+ ) -> None:
962
+ """analyze() should return AnalysisResult."""
963
+ analyzer = StatisticalAnalyzer()
964
+
965
+ with patch.object(analyzer, "_get_agent") as mock_agent, \
966
+ patch.object(analyzer, "_get_code_executor") as mock_executor:
967
+
968
+ # Mock LLM
969
+ mock_agent.return_value.run = AsyncMock(
970
+ return_value=MagicMock(output="print('SUPPORTED')")
971
+ )
972
+
973
+ # Mock Modal
974
+ mock_executor.return_value.execute.return_value = {
975
+ "stdout": "SUPPORTED\np-value: 0.01",
976
+ "stderr": "",
977
+ "success": True,
978
+ }
979
+
980
+ result = await analyzer.analyze("test query", sample_evidence)
981
+
982
+ assert isinstance(result, AnalysisResult)
983
+ assert result.verdict == "SUPPORTED"
984
+
985
+ def test_singleton(self) -> None:
986
+ """get_statistical_analyzer should return singleton."""
987
+ a1 = get_statistical_analyzer()
988
+ a2 = get_statistical_analyzer()
989
+ assert a1 is a2
990
+
991
+
992
+ class TestAnalysisResult:
993
+ """Tests for AnalysisResult model."""
994
+
995
+ def test_verdict_values(self) -> None:
996
+ """Verdict should be one of the expected values."""
997
+ for verdict in ["SUPPORTED", "REFUTED", "INCONCLUSIVE"]:
998
+ result = AnalysisResult(
999
+ verdict=verdict,
1000
+ confidence=0.8,
1001
+ statistical_evidence="test",
1002
+ code_generated="print('test')",
1003
+ execution_output="test",
1004
+ )
1005
+ assert result.verdict == verdict
1006
+
1007
+ def test_confidence_bounds(self) -> None:
1008
+ """Confidence must be 0.0-1.0."""
1009
+ with pytest.raises(ValueError):
1010
+ AnalysisResult(
1011
+ verdict="SUPPORTED",
1012
+ confidence=1.5, # Invalid
1013
+ statistical_evidence="test",
1014
+ code_generated="test",
1015
+ execution_output="test",
1016
+ )
1017
+ ```
1018
+
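+ For reference, a sketch of the `AnalysisResult` model these tests exercise (field set inferred from its usage throughout this spec; the authoritative definition lives in `src/services/statistical_analyzer.py`):
+ 
+ ```python
+ from pydantic import BaseModel, Field
+ 
+ 
+ class AnalysisResult(BaseModel):
+     """Inferred sketch - not the canonical definition."""
+ 
+     verdict: str  # "SUPPORTED" | "REFUTED" | "INCONCLUSIVE"
+     confidence: float = Field(ge=0.0, le=1.0)  # bounds checked by test_confidence_bounds
+     statistical_evidence: str
+     code_generated: str
+     execution_output: str
+     key_findings: list[str] = Field(default_factory=list)
+ ```
+ 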
1019
+ ### 6.2 Integration Test (`tests/integration/test_modal.py`)
1020
+
1021
+ ```python
1022
+ """Integration tests for Modal (requires credentials)."""
1023
+
1024
+ import pytest
1025
+
1026
+ from src.utils.config import settings
1027
+
1028
+
1029
+ @pytest.mark.integration
1030
+ @pytest.mark.skipif(not settings.modal_available, reason="Modal not configured")
1031
+ class TestModalIntegration:
1032
+ """Integration tests requiring Modal credentials."""
1033
+
1034
+ @pytest.mark.asyncio
1035
+ async def test_sandbox_executes_code(self) -> None:
1036
+ """Modal sandbox should execute Python code."""
1037
+ import asyncio
1038
+ from functools import partial
1039
+
1040
+ from src.tools.code_execution import get_code_executor
1041
+
1042
+ executor = get_code_executor()
1043
+ code = "import pandas as pd; print(pd.DataFrame({'a': [1,2,3]})['a'].sum())"
1044
+
1045
+ loop = asyncio.get_running_loop()
1046
+ result = await loop.run_in_executor(
1047
+ None, partial(executor.execute, code, timeout=30)
1048
+ )
1049
+
1050
+ assert result["success"]
1051
+ assert "6" in result["stdout"]
1052
+
1053
+ @pytest.mark.asyncio
1054
+ async def test_statistical_analyzer_works(self) -> None:
1055
+ """StatisticalAnalyzer should work end-to-end."""
1056
+ from src.services.statistical_analyzer import get_statistical_analyzer
1057
+ from src.utils.models import Citation, Evidence
1058
+
1059
+ evidence = [
1060
+ Evidence(
1061
+ content="Drug shows 40% improvement in trial.",
1062
+ citation=Citation(
1063
+ source="pubmed",
1064
+ title="Test",
1065
+ url="https://test.com",
1066
+ date="2024-01-01",
1067
+ authors=["Test"],
1068
+ ),
1069
+ relevance=0.9,
1070
+ )
1071
+ ]
1072
+
1073
+ analyzer = get_statistical_analyzer()
1074
+ result = await analyzer.analyze("test drug efficacy", evidence)
1075
+
1076
+ assert result.verdict in ["SUPPORTED", "REFUTED", "INCONCLUSIVE"]
1077
+ assert 0.0 <= result.confidence <= 1.0
1078
+ ```
1079
+
1080
+ ---
1081
+
1082
+ ## 7. Verification Commands
1083
+
1084
+ ```bash
1085
+ # 1. Verify NO agent_framework in StatisticalAnalyzer
1086
+ grep -r "agent_framework" src/services/statistical_analyzer.py
1087
+ # Should return nothing!
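+ # Caveat: grep exits non-zero when nothing matches, so if you wire this
+ # check into a script running under `set -e`, invert it:
+ # ! grep -r "agent_framework" src/services/statistical_analyzer.py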
1088
+
1089
+ # 2. Run unit tests (no Modal needed)
1090
+ uv run pytest tests/unit/services/test_statistical_analyzer.py -v
1091
+
1092
+ # 3. Run verification script (requires Modal)
1093
+ uv run python examples/modal_demo/verify_sandbox.py
1094
+
1095
+ # 4. Run analysis demo (requires Modal + LLM)
1096
+ uv run python examples/modal_demo/run_analysis.py "metformin alzheimer"
1097
+
1098
+ # 5. Run integration tests
1099
+ uv run pytest tests/integration/test_modal.py -v -m integration
1100
+
1101
+ # 6. Full test suite
1102
+ make check
1103
+ ```
1104
+
1105
+ ---
1106
+
1107
+ ## 8. Definition of Done
1108
+
1109
+ Phase 13 is **COMPLETE** when:
1110
+
1111
+ - [ ] `src/services/statistical_analyzer.py` created (NO agent_framework)
1112
+ - [ ] `src/utils/config.py` has `enable_modal_analysis` setting
1113
+ - [ ] `src/orchestrator.py` uses `StatisticalAnalyzer` directly
1114
+ - [ ] `src/agents/analysis_agent.py` refactored to wrap `StatisticalAnalyzer`
1115
+ - [ ] `src/mcp_tools.py` has `analyze_hypothesis` tool
1116
+ - [ ] `examples/modal_demo/verify_sandbox.py` working
1117
+ - [ ] `examples/modal_demo/run_analysis.py` working
1118
+ - [ ] Unit tests pass WITHOUT magentic extra installed
1119
+ - [ ] Integration tests pass WITH Modal credentials
1120
+ - [ ] All lints pass
1121
+
1122
+ ---
1123
+
1124
+ ## 9. Architecture After Phase 13
1125
+
1126
+ ```text
1127
+ ┌─────────────────────────────────────────────────────────────────┐
1128
+ │ MCP Clients │
1129
+ │ (Claude Desktop, Cursor, etc.) │
1130
+ └───────────────────────────┬─────────────────────────────────────┘
1131
+ │ MCP Protocol
1132
+
1133
+ ┌─────────────────────────────────────────────────────────────────┐
1134
+ │ Gradio App + MCP Server │
1135
+ │ ┌──────────────────────────────────────────────────────────┐ │
1136
+ │ │ MCP Tools: search_pubmed, search_trials, search_biorxiv │ │
1137
+ │ │ search_all, analyze_hypothesis │ │
1138
+ │ └──────────────────────────────────────────────────────────┘ │
1139
+ └───────────────────────────┬─────────────────────────────────────┘
1140
+
1141
+ ┌───────────────────┴───────────────────┐
1142
+ │ │
1143
+ ▼ ▼
1144
+ ┌───────────────────────┐ ┌───────────────────────────┐
1145
+ │ Simple Orchestrator │ │ Magentic Orchestrator │
1146
+ │ (no agent_framework) │ │ (with agent_framework) │
1147
+ │ │ │ │
1148
+ │ SearchHandler │ │ SearchAgent │
1149
+ │ JudgeHandler │ │ JudgeAgent │
1150
+ │ StatisticalAnalyzer ─┼────────────┼→ AnalysisAgent ───────────┤
1151
+ │ │ │ (wraps StatisticalAnalyzer)
1152
+ └───────────┬───────────┘ └───────────────────────────┘
1153
+
1154
+
1155
+ ┌──────────────────────────────────────────────────────────────────┐
1156
+ │ StatisticalAnalyzer │
1157
+ │ (src/services/statistical_analyzer.py) │
1158
+ │ NO agent_framework dependency │
1159
+ │ │
1160
+ │ 1. Generate code with pydantic-ai │
1161
+ │ 2. Execute in Modal sandbox │
1162
+ │ 3. Return AnalysisResult │
1163
+ └───────────────────────────┬──────────────────────────────────────┘
1164
+
1165
+
1166
+ ┌─────────────────────────────────────────────────────────────────┐
1167
+ │ Modal Sandbox │
1168
+ │ ┌─────────────────────────────────────────────────────────┐ │
1169
+ │ │ - pandas, numpy, scipy, sklearn, statsmodels │ │
1170
+ │ │ - Network: BLOCKED │ │
1171
+ │ │ - Filesystem: ISOLATED │ │
1172
+ │ │ - Timeout: ENFORCED │ │
1173
+ │ └─────────────────────────────────────────────────────────┘ │
1174
+ └─────────────────────────────────────────────────────────────────┘
1175
+ ```
1176
+
1177
+ **This is the dependency-safe Modal stack.**
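+ 
+ A condensed sketch of that three-step flow (the helper names `_generate_analysis_code` and `_parse_result` are assumptions for illustration; the real implementation lives in `src/services/statistical_analyzer.py`):
+ 
+ ```python
+ import asyncio
+ from functools import partial
+ 
+ from src.tools.code_execution import get_code_executor
+ 
+ 
+ class StatisticalAnalyzer:  # sketch only
+     async def analyze(self, query: str, evidence: list) -> "AnalysisResult":
+         code = await self._generate_analysis_code(query, evidence)  # 1. pydantic-ai codegen
+         loop = asyncio.get_running_loop()
+         output = await loop.run_in_executor(  # 2. run the sync Modal executor off the event loop
+             None, partial(get_code_executor().execute, code)
+         )
+         return self._parse_result(code, output)  # 3. structured AnalysisResult
+ ```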
1178
+
1179
+ ---
1180
+
1181
+ ## 10. Files Summary
1182
+
1183
+ | File | Action | Purpose |
1184
+ |------|--------|---------|
1185
+ | `src/services/statistical_analyzer.py` | **CREATE** | Core analysis (no agent_framework) |
1186
+ | `src/utils/config.py` | MODIFY | Add `enable_modal_analysis` |
1187
+ | `src/orchestrator.py` | MODIFY | Use `StatisticalAnalyzer` |
1188
+ | `src/agents/analysis_agent.py` | MODIFY | Wrap `StatisticalAnalyzer` |
1189
+ | `src/mcp_tools.py` | MODIFY | Add `analyze_hypothesis` |
1190
+ | `examples/modal_demo/verify_sandbox.py` | CREATE | Sandbox verification |
1191
+ | `examples/modal_demo/run_analysis.py` | CREATE | Demo script |
1192
+ | `tests/unit/services/test_statistical_analyzer.py` | CREATE | Unit tests |
1193
+ | `tests/integration/test_modal.py` | CREATE | Integration tests |
1194
+
1195
+ **Key Fix**: `StatisticalAnalyzer` has ZERO agent_framework imports, making it safe for the simple orchestrator.
docs/implementation/14_phase_demo_submission.md ADDED
@@ -0,0 +1,464 @@
1
+ # Phase 14 Implementation Spec: Demo Video & Hackathon Submission
2
+
3
+ **Goal**: Create compelling demo video and complete hackathon submission.
4
+ **Philosophy**: "Ship it with style."
5
+ **Prerequisite**: Phases 12-13 complete (MCP + Modal working)
6
+ **Priority**: P0 - REQUIRED FOR SUBMISSION
7
+ **Deadline**: November 30, 2025 11:59 PM UTC
8
+ **Estimated Time**: 2-3 hours
9
+
10
+ ---
11
+
12
+ ## 1. Submission Requirements
13
+
14
+ ### MCP's 1st Birthday Hackathon Checklist
15
+
16
+ | Requirement | Status | Action |
17
+ |-------------|--------|--------|
18
+ | HuggingFace Space in `MCP-1st-Birthday` org | Pending | Transfer or create |
19
+ | Track tag in README.md | Pending | Add tag |
20
+ | Social media post link | Pending | Create post |
21
+ | Demo video (1-5 min) | Pending | Record |
22
+ | Team members registered | Pending | Verify |
23
+ | Original work (Nov 14-30) | **DONE** | All commits in range |
24
+
25
+ ### Track 2: MCP in Action - Tags
26
+
27
+ ```yaml
28
+ # Add to HuggingFace Space README.md
29
+ tags:
30
+ - mcp-in-action-track-enterprise # Healthcare/enterprise focus
31
+ ```
32
+
33
+ ---
34
+
35
+ ## 2. Prize Eligibility Summary
36
+
37
+ ### After Phases 12-13
38
+
39
+ | Award | Amount | Eligible | Requirements Met |
40
+ |-------|--------|----------|------------------|
41
+ | Track 2: MCP in Action (1st) | $2,500 | **YES** | MCP server working |
42
+ | Modal Innovation | $2,500 | **YES** | Sandbox demo ready |
43
+ | LlamaIndex | $1,000 | **YES** | Using RAG |
44
+ | Community Choice | $1,000 | Possible | Need great demo |
45
+ | **Total Potential** | **$7,000** | | |
46
+
47
+ ---
48
+
49
+ ## 3. Demo Video Specification
50
+
51
+ ### 3.1 Duration & Format
52
+
53
+ - **Length**: 3-4 minutes (sweet spot)
54
+ - **Format**: Screen recording + voice-over
55
+ - **Resolution**: 1080p minimum
56
+ - **Audio**: Clear narration, no background music
57
+
58
+ ### 3.2 Recommended Tools
59
+
60
+ | Tool | Purpose | Notes |
61
+ |------|---------|-------|
62
+ | OBS Studio | Screen recording | Free, cross-platform |
63
+ | Loom | Quick recording | Good for demos |
64
+ | QuickTime | Mac screen recording | Built-in |
65
+ | DaVinci Resolve | Editing | Free, professional |
66
+
67
+ ### 3.3 Demo Script (4 minutes)
68
+
69
+ ```markdown
70
+ ## Section 1: Hook (30 seconds)
71
+
72
+ [Show Gradio UI]
73
+
74
+ "DeepCritical is an AI-powered drug repurposing research agent.
75
+ It searches peer-reviewed literature, clinical trials, and cutting-edge preprints
76
+ to find new uses for existing drugs."
77
+
78
+ "Let me show you how it works."
79
+
80
+ ---
81
+
82
+ ## Section 2: Core Functionality (60 seconds)
83
+
84
+ [Type query: "Can metformin treat Alzheimer's disease?"]
85
+
86
+ "When I ask about metformin for Alzheimer's, DeepCritical:
87
+ 1. Searches PubMed for peer-reviewed papers
88
+ 2. Queries ClinicalTrials.gov for active trials
89
+ 3. Scans bioRxiv for the latest preprints"
90
+
91
+ [Show search results streaming]
92
+
93
+ "It then uses an LLM to assess the evidence quality and
94
+ synthesize findings into a structured research report."
95
+
96
+ [Show final report]
97
+
98
+ ---
99
+
100
+ ## Section 3: MCP Integration (60 seconds)
101
+
102
+ [Switch to Claude Desktop]
103
+
104
+ "What makes DeepCritical unique is full MCP integration.
105
+ These same tools are available to any MCP client."
106
+
107
+ [Show Claude Desktop with DeepCritical tools]
108
+
109
+ "I can ask Claude: 'Search PubMed for aspirin cancer prevention'"
110
+
111
+ [Show results appearing in Claude Desktop]
112
+
113
+ "The agent uses our MCP server to search real biomedical databases."
114
+
115
+ [Show MCP Inspector briefly]
116
+
117
+ "Here's the MCP schema - four tools exposed for any AI to use."
118
+
119
+ ---
120
+
121
+ ## Section 4: Modal Innovation (45 seconds)
122
+
123
+ [Run verify_sandbox.py]
124
+
125
+ "For statistical analysis, we use Modal for secure code execution."
126
+
127
+ [Show sandbox verification output]
128
+
129
+ "Notice the hostname is NOT my machine - code runs in an isolated container.
130
+ Network is blocked. The AI can't reach the internet from the sandbox."
131
+
132
+ [Run analysis demo]
133
+
134
+ "Modal executes LLM-generated statistical code safely,
135
+ returning verdicts like SUPPORTED, REFUTED, or INCONCLUSIVE."
136
+
137
+ ---
138
+
139
+ ## Section 5: Close (45 seconds)
140
+
141
+ [Return to Gradio UI]
142
+
143
+ "DeepCritical brings together:
144
+ - Three biomedical data sources
145
+ - MCP protocol for universal tool access
146
+ - Modal sandboxes for safe code execution
147
+ - LlamaIndex for semantic search
148
+
149
+ All in a beautiful Gradio interface."
150
+
151
+ "Check out the code on GitHub, try it on HuggingFace Spaces,
152
+ and let us know what you think."
153
+
154
+ "Thanks for watching!"
155
+
156
+ [Show links: GitHub, HuggingFace, Team names]
157
+ ```
158
+
159
+ ---
160
+
161
+ ## 4. HuggingFace Space Configuration
162
+
163
+ ### 4.1 Space README.md
164
+
165
+ ```markdown
166
+ ---
167
+ title: DeepCritical
168
+ emoji: 🧬
169
+ colorFrom: blue
170
+ colorTo: purple
171
+ sdk: gradio
172
+ sdk_version: "5.0.0"
173
+ app_file: src/app.py
174
+ pinned: false
175
+ license: mit
176
+ tags:
177
+ - mcp-in-action-track-enterprise
178
+ - mcp-hackathon
179
+ - drug-repurposing
180
+ - biomedical-ai
181
+ - pydantic-ai
182
+ - llamaindex
183
+ - modal
184
+ ---
185
+
186
+ # DeepCritical
187
+
188
+ AI-Powered Drug Repurposing Research Agent
189
+
190
+ ## Features
191
+
192
+ - **Multi-Source Search**: PubMed, ClinicalTrials.gov, bioRxiv/medRxiv
193
+ - **MCP Integration**: Use our tools from Claude Desktop or any MCP client
194
+ - **Modal Sandbox**: Secure execution of AI-generated statistical code
195
+ - **LlamaIndex RAG**: Semantic search and evidence synthesis
196
+
197
+ ## MCP Tools
198
+
199
+ Connect to our MCP server at:
+ `https://MCP-1st-Birthday-deepcritical.hf.space/gradio_api/mcp/`
203
+
204
+ Available tools:
205
+ - `search_pubmed` - Search peer-reviewed biomedical literature
206
+ - `search_clinical_trials` - Search ClinicalTrials.gov
207
+ - `search_biorxiv` - Search bioRxiv/medRxiv preprints
208
+ - `search_all` - Search all sources simultaneously
209
+
210
+ ## Team
211
+
212
+ - The-Obstacle-Is-The-Way
213
+ - MarioAderman
214
+
215
+ ## Links
216
+
217
+ - [GitHub Repository](https://github.com/The-Obstacle-Is-The-Way/DeepCritical-1)
218
+ - [Demo Video](link-to-video)
219
+ ```
220
+
221
+ ### 4.2 Environment Variables (Secrets)
222
+
223
+ Set in HuggingFace Space settings:
224
+
225
+ ```
226
+ OPENAI_API_KEY=sk-...
227
+ ANTHROPIC_API_KEY=sk-ant-...
228
+ NCBI_API_KEY=...
229
+ MODAL_TOKEN_ID=...
230
+ MODAL_TOKEN_SECRET=...
231
+ ```
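+ 
+ These secrets feed the settings object in `src/utils/config.py`. A minimal sketch of the Modal-related fields (the exact shape is an assumption - consult the real config module):
+ 
+ ```python
+ from pydantic_settings import BaseSettings
+ 
+ 
+ class Settings(BaseSettings):
+     modal_token_id: str | None = None
+     modal_token_secret: str | None = None
+ 
+     @property
+     def modal_available(self) -> bool:
+         # Both Modal credentials must be present for sandbox execution
+         return bool(self.modal_token_id and self.modal_token_secret)
+ ```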
232
+
233
+ ---
234
+
235
+ ## 5. Social Media Post
236
+
237
+ ### Twitter/X Template
238
+
239
+ ```
240
+ 🧬 Excited to submit DeepCritical to MCP's 1st Birthday Hackathon!
241
+
242
+ An AI agent that:
243
+ ✅ Searches PubMed, ClinicalTrials.gov & bioRxiv
244
+ ✅ Exposes tools via MCP protocol
245
+ ✅ Runs statistical code in Modal sandboxes
246
+ ✅ Uses LlamaIndex for semantic search
247
+
248
+ Try it: [HuggingFace link]
249
+ Demo: [Video link]
250
+
251
+ #MCPHackathon #AIAgents #DrugRepurposing @huggingface @AnthropicAI
252
+ ```
253
+
254
+ ### LinkedIn Template
255
+
256
+ ```
257
+ Thrilled to share DeepCritical - our submission to MCP's 1st Birthday Hackathon!
258
+
259
+ 🔬 What it does:
260
+ DeepCritical is an AI-powered drug repurposing research agent that searches
261
+ peer-reviewed literature, clinical trials, and preprints to find new uses
262
+ for existing drugs.
263
+
264
+ 🛠️ Technical highlights:
265
+ • Full MCP integration - tools work with Claude Desktop
266
+ • Modal sandboxes for secure AI-generated code execution
267
+ • LlamaIndex RAG for semantic evidence search
268
+ • Three biomedical data sources in parallel
269
+
270
+ Built with PydanticAI, Gradio, and deployed on HuggingFace Spaces.
271
+
272
+ Try it: [link]
273
+ Watch the demo: [link]
274
+
275
+ #ArtificialIntelligence #Healthcare #DrugDiscovery #MCP #Hackathon
276
+ ```
277
+
278
+ ---
279
+
280
+ ## 6. Pre-Submission Checklist
281
+
282
+ ### 6.1 Code Quality
283
+
284
+ ```bash
285
+ # Run all checks
286
+ make check
287
+
288
+ # Expected output:
289
+ # ✅ Linting passed (ruff)
290
+ # ✅ Type checking passed (mypy)
291
+ # ✅ All 80+ tests passed (pytest)
292
+ ```
293
+
294
+ ### 6.2 Documentation
295
+
296
+ - [ ] README.md updated with MCP instructions
297
+ - [ ] All demo scripts have docstrings
298
+ - [ ] Example files work end-to-end
299
+ - [ ] CLAUDE.md is current
300
+
301
+ ### 6.3 Deployment Verification
302
+
303
+ ```bash
304
+ # Test locally
305
+ uv run python src/app.py
306
+ # Visit http://localhost:7860
307
+
308
+ # Test MCP schema
309
+ curl http://localhost:7860/gradio_api/mcp/schema
310
+
311
+ # Test Modal (if configured)
312
+ uv run python examples/modal_demo/verify_sandbox.py
313
+ ```
314
+
315
+ ### 6.4 HuggingFace Space
316
+
317
+ - [ ] Space created in `MCP-1st-Birthday` organization
318
+ - [ ] Secrets configured (API keys)
319
+ - [ ] App starts without errors
320
+ - [ ] MCP endpoint accessible
321
+ - [ ] Track tag in README
322
+
323
+ ---
324
+
325
+ ## 7. Recording Checklist
326
+
327
+ ### Before Recording
328
+
329
+ - [ ] Close unnecessary apps/notifications
330
+ - [ ] Clear browser history/tabs
331
+ - [ ] Test all demos work
332
+ - [ ] Prepare terminal windows
333
+ - [ ] Write down talking points
334
+
335
+ ### During Recording
336
+
337
+ - [ ] Speak clearly and at moderate pace
338
+ - [ ] Pause briefly between sections
339
+ - [ ] Optionally show your face on camera (adds personality)
340
+ - [ ] Don't rush - 3-4 min is enough time
341
+
342
+ ### After Recording
343
+
344
+ - [ ] Watch playback for errors
345
+ - [ ] Trim dead air at start/end
346
+ - [ ] Add title/end cards
347
+ - [ ] Export at 1080p
348
+ - [ ] Upload to YouTube/Loom
349
+
350
+ ---
351
+
352
+ ## 8. Submission Steps
353
+
354
+ ### Step 1: Finalize Code
355
+
356
+ ```bash
357
+ # Ensure clean state
358
+ git status
359
+ make check
360
+
361
+ # Push to GitHub
362
+ git push origin main
363
+
364
+ # Sync to HuggingFace
365
+ git push huggingface-upstream main
366
+ ```
367
+
368
+ ### Step 2: Verify HuggingFace Space
369
+
370
+ 1. Visit Space URL
371
+ 2. Test the chat interface
372
+ 3. Test MCP endpoint: `/gradio_api/mcp/schema`
373
+ 4. Verify README has track tag
374
+
375
+ ### Step 3: Record Demo Video
376
+
377
+ 1. Follow script from Section 3.3
378
+ 2. Edit and export
379
+ 3. Upload to YouTube (unlisted) or Loom
380
+ 4. Copy shareable link
381
+
382
+ ### Step 4: Create Social Post
383
+
384
+ 1. Write post (see templates)
385
+ 2. Include video link
386
+ 3. Tag relevant accounts
387
+ 4. Post and copy link
388
+
389
+ ### Step 5: Submit
390
+
391
+ 1. Ensure Space is in `MCP-1st-Birthday` org
392
+ 2. Verify track tag in README
393
+ 3. Submit entry (check hackathon page for form)
394
+ 4. Include all links
395
+
396
+ ---
397
+
398
+ ## 9. Verification Commands
399
+
400
+ ```bash
401
+ # 1. Full test suite
402
+ make check
403
+
404
+ # 2. Start local server
405
+ uv run python src/app.py
406
+
407
+ # 3. Verify MCP works
408
+ curl http://localhost:7860/gradio_api/mcp/schema | jq
409
+
410
+ # 4. Test with MCP Inspector
411
+ npx @modelcontextprotocol/inspector http://localhost:7860/gradio_api/mcp/
412
+
413
+ # 5. Run Modal verification
414
+ uv run python examples/modal_demo/verify_sandbox.py
415
+
416
+ # 6. Run full demo
417
+ uv run python examples/orchestrator_demo/run_agent.py "metformin alzheimer"
418
+ ```
419
+
420
+ ---
421
+
422
+ ## 10. Definition of Done
423
+
424
+ Phase 14 is **COMPLETE** when:
425
+
426
+ - [ ] Demo video recorded (3-4 min)
427
+ - [ ] Video uploaded (YouTube/Loom)
428
+ - [ ] Social media post created with link
429
+ - [ ] HuggingFace Space in `MCP-1st-Birthday` org
430
+ - [ ] Track tag in Space README
431
+ - [ ] All team members registered
432
+ - [ ] Entry submitted before deadline
433
+ - [ ] Confirmation received
434
+
435
+ ---
436
+
437
+ ## 11. Timeline
438
+
439
+ | Task | Time | Deadline |
440
+ |------|------|----------|
441
+ | Phase 12: MCP Server | 2-3 hours | Nov 28 |
442
+ | Phase 13: Modal Integration | 2-3 hours | Nov 29 |
443
+ | Phase 14: Demo & Submit | 2-3 hours | Nov 30 |
444
+ | **Buffer** | ~24 hours | Before 11:59 PM UTC |
445
+
446
+ ---
447
+
448
+ ## 12. Contact & Support
449
+
450
+ ### Hackathon Resources
451
+
452
+ - Discord: `#agents-mcp-hackathon-winter25`
453
+ - HuggingFace: [MCP-1st-Birthday org](https://huggingface.co/MCP-1st-Birthday)
454
+ - MCP Docs: [modelcontextprotocol.io](https://modelcontextprotocol.io/)
455
+
456
+ ### Team Communication
457
+
458
+ - Coordinate on final review
459
+ - Agree on who submits
460
+ - Celebrate when done! 🎉
461
+
462
+ ---
463
+
464
+ **Good luck! Ship it with confidence.**
docs/implementation/roadmap.md CHANGED
@@ -41,7 +41,9 @@ src/
41
  ├── tools/ # Search tools
42
  │ ├── __init__.py
43
  │ ├── pubmed.py # PubMed E-utilities tool
44
- │ ├── websearch.py # DuckDuckGo search tool
 
 
45
  │ └── search_handler.py # Orchestrates multiple tools
46
  ├── prompts/ # Prompt templates
47
  │ ├── __init__.py
@@ -61,7 +63,8 @@ tests/
61
  ├── unit/
62
  │ ├── tools/
63
  │ │ ├── test_pubmed.py
64
- │ │ ├── test_websearch.py
 
65
  │ │ └── test_search_handler.py
66
  │ ├── agent_factory/
67
  │ │ └── test_judges.py
@@ -183,6 +186,8 @@ Structured Research Report
183
 
184
  ## Spec Documents
185
 
 
 
186
  1. **[Phase 1 Spec: Foundation](01_phase_foundation.md)** ✅
187
  2. **[Phase 2 Spec: Search Slice](02_phase_search.md)** ✅
188
  3. **[Phase 3 Spec: Judge Slice](03_phase_judge.md)** ✅
@@ -191,9 +196,18 @@ Structured Research Report
191
  6. **[Phase 6 Spec: Embeddings & Semantic Search](06_phase_embeddings.md)** ✅
192
  7. **[Phase 7 Spec: Hypothesis Agent](07_phase_hypothesis.md)** ✅
193
  8. **[Phase 8 Spec: Report Agent](08_phase_report.md)** ✅
194
- 9. **[Phase 9 Spec: Remove DuckDuckGo](09_phase_source_cleanup.md)** 📝
195
- 10. **[Phase 10 Spec: ClinicalTrials.gov](10_phase_clinicaltrials.md)** 📝
196
- 11. **[Phase 11 Spec: bioRxiv Preprints](11_phase_biorxiv.md)** 📝
197
 
198
  ---
199
 
@@ -209,8 +223,25 @@ Structured Research Report
209
  | Phase 6: Embeddings | ✅ COMPLETE | Semantic search + ChromaDB |
210
  | Phase 7: Hypothesis | ✅ COMPLETE | Mechanistic reasoning chains |
211
  | Phase 8: Report | ✅ COMPLETE | Structured scientific reports |
212
- | Phase 9: Source Cleanup | 📝 SPEC READY | Remove DuckDuckGo |
213
- | Phase 10: ClinicalTrials | 📝 SPEC READY | ClinicalTrials.gov API |
214
- | Phase 11: bioRxiv | 📝 SPEC READY | Preprint search |
215
 
216
- *Phases 1-8 COMPLETE. Phases 9-11 will add multi-source credibility.*
 
41
  ├── tools/ # Search tools
42
  │ ├── __init__.py
43
  │ ├── pubmed.py # PubMed E-utilities tool
44
+ │ ├── clinicaltrials.py # ClinicalTrials.gov API
45
+ │ ├── biorxiv.py # bioRxiv/medRxiv preprints
46
+ │ ├── code_execution.py # Modal sandbox execution
47
  │ └── search_handler.py # Orchestrates multiple tools
48
  ├── prompts/ # Prompt templates
49
  │ ├── __init__.py
 
63
  ├── unit/
64
  │ ├── tools/
65
  │ │ ├── test_pubmed.py
66
+ │ │ ├── test_clinicaltrials.py
67
+ │ │ ├── test_biorxiv.py
68
  │ │ └── test_search_handler.py
69
  │ ├── agent_factory/
70
  │ │ └── test_judges.py
 
186
 
187
  ## Spec Documents
188
 
189
+ ### Core Platform (Phases 1-8)
190
+
191
  1. **[Phase 1 Spec: Foundation](01_phase_foundation.md)** ✅
192
  2. **[Phase 2 Spec: Search Slice](02_phase_search.md)** ✅
193
  3. **[Phase 3 Spec: Judge Slice](03_phase_judge.md)** ✅
 
196
  6. **[Phase 6 Spec: Embeddings & Semantic Search](06_phase_embeddings.md)** ✅
197
  7. **[Phase 7 Spec: Hypothesis Agent](07_phase_hypothesis.md)** ✅
198
  8. **[Phase 8 Spec: Report Agent](08_phase_report.md)** ✅
199
+
200
+ ### Multi-Source Search (Phases 9-11)
201
+
202
+ 9. **[Phase 9 Spec: Remove DuckDuckGo](09_phase_source_cleanup.md)** ✅
203
+ 10. **[Phase 10 Spec: ClinicalTrials.gov](10_phase_clinicaltrials.md)** ✅
204
+ 11. **[Phase 11 Spec: bioRxiv Preprints](11_phase_biorxiv.md)** ✅
205
+
206
+ ### Hackathon Integration (Phases 12-14)
207
+
208
+ 12. **[Phase 12 Spec: MCP Server](12_phase_mcp_server.md)** ✅ COMPLETE
209
+ 13. **[Phase 13 Spec: Modal Pipeline](13_phase_modal_integration.md)** 📝 P1 - $2,500
210
+ 14. **[Phase 14 Spec: Demo & Submission](14_phase_demo_submission.md)** 📝 P0 - REQUIRED
211
 
212
  ---
213
 
 
223
  | Phase 6: Embeddings | ✅ COMPLETE | Semantic search + ChromaDB |
224
  | Phase 7: Hypothesis | ✅ COMPLETE | Mechanistic reasoning chains |
225
  | Phase 8: Report | ✅ COMPLETE | Structured scientific reports |
226
+ | Phase 9: Source Cleanup | ✅ COMPLETE | Remove DuckDuckGo |
+ | Phase 10: ClinicalTrials | ✅ COMPLETE | ClinicalTrials.gov API |
+ | Phase 11: bioRxiv | ✅ COMPLETE | Preprint search |
229
+ | Phase 12: MCP Server | ✅ COMPLETE | MCP protocol integration |
230
+ | Phase 13: Modal Pipeline | 📝 SPEC READY | Sandboxed code execution |
231
+ | Phase 14: Demo & Submit | 📝 SPEC READY | Hackathon submission |
232
+
233
+ *Phases 1-12 COMPLETE. Phases 13-14 for hackathon prizes.*
234
+
235
+ ---
236
+
237
+ ## Hackathon Prize Potential
238
+
239
+ | Award | Amount | Requirement | Phase |
240
+ |-------|--------|-------------|-------|
241
+ | Track 2: MCP in Action (1st) | $2,500 | MCP server working | 12 |
242
+ | Modal Innovation | $2,500 | Sandbox demo ready | 13 |
243
+ | LlamaIndex | $1,000 | Using RAG | ✅ Done |
244
+ | Community Choice | $1,000 | Great demo video | 14 |
245
+ | **Total Potential** | **$7,000** | | |
246
 
247
+ **Deadline: November 30, 2025 11:59 PM UTC**
docs/pending/00_priority_summary.md ADDED
@@ -0,0 +1,111 @@
1
+ # DeepCritical Hackathon Priority Summary
2
+
3
+ ## 4 Days Left (Deadline: Nov 30, 2025 11:59 PM UTC)
4
+
5
+ ---
6
+
7
+ ## Git Contribution Analysis
8
+
9
+ ```text
10
+ The-Obstacle-Is-The-Way: 20+ commits (Phases 1-11, all demos, all fixes)
11
+ MarioAderman: 3 commits (Modal, LlamaIndex, PubMed fix)
12
+ JJ (Maintainer): 0 code commits (merge button only)
13
+ ```
14
+
15
+ **Conclusion:** You built 90%+ of this codebase.
16
+
17
+ ---
18
+
19
+ ## Current Stack (What We Have)
20
+
21
+ | Component | Status | Files |
22
+ |-----------|--------|-------|
23
+ | PubMed Search | ✅ Working | `src/tools/pubmed.py` |
24
+ | ClinicalTrials Search | ✅ Working | `src/tools/clinicaltrials.py` |
25
+ | bioRxiv Search | ✅ Working | `src/tools/biorxiv.py` |
26
+ | Search Handler | ✅ Working | `src/tools/search_handler.py` |
27
+ | Embeddings/ChromaDB | ✅ Working | `src/services/embeddings.py` |
28
+ | LlamaIndex RAG | ✅ Working | `src/services/llamaindex_rag.py` |
29
+ | Hypothesis Agent | ✅ Working | `src/agents/hypothesis_agent.py` |
30
+ | Report Agent | ✅ Working | `src/agents/report_agent.py` |
31
+ | Judge Agent | ✅ Working | `src/agents/judge_agent.py` |
32
+ | Orchestrator | ✅ Working | `src/orchestrator.py` |
33
+ | Gradio UI | ✅ Working | `src/app.py` |
34
+ | Modal Code Execution | ⚠️ Built, not wired | `src/tools/code_execution.py` |
35
+ | **MCP Server** | ✅ **Working** | `src/mcp_tools.py`, `src/app.py` |
36
+
37
+ ---
38
+
39
+ ## What's Required for Track 2 (MCP in Action)
40
+
41
+ | Requirement | Have It? | Priority |
42
+ |-------------|----------|----------|
43
+ | Autonomous agent behavior | ✅ Yes | - |
44
+ | Must use MCP servers as tools | ✅ **YES** | Done (Phase 12) |
45
+ | Must be Gradio app | ✅ Yes | - |
46
+ | Planning/reasoning/execution | ✅ Yes | - |
47
+
48
+ **Bottom Line:** ✅ MCP server implemented in Phase 12. Track 2 compliant.
49
+
50
+ ---
51
+
52
+ ## 3 Things To Do (In Order)
53
+
54
+ ### 1. MCP Server (P0 - Required) ✅ DONE
55
+
56
+ - **Files:** `src/mcp_tools.py`, `src/app.py`
57
+ - **Status:** Implemented in Phase 12
58
+ - **Doc:** `02_mcp_server_integration.md`
59
+ - **Endpoint:** `/gradio_api/mcp/`
60
+
61
+ ### 2. Modal Wiring (P1 - $2,500 Prize)
62
+ - **File:** Update `src/agents/analysis_agent.py`
63
+ - **Time:** 2-3 hours
64
+ - **Doc:** `03_modal_integration.md`
65
+ - **Why:** Modal Innovation Award is $2,500
66
+
67
+ ### 3. Demo Video + Submission (P0 - Required)
68
+ - **Time:** 1-2 hours
69
+ - **Why:** Required for all submissions
70
+
71
+ ---
72
+
73
+ ## Submission Checklist
74
+
75
+ - [ ] Space in MCP-1st-Birthday org
76
+ - [ ] Tag: `mcp-in-action-track-enterprise`
77
+ - [ ] Social media post link
78
+ - [ ] Demo video (1-5 min)
79
+ - [ ] MCP server working
80
+ - [ ] All tests passing
81
+
82
+ ---
83
+
84
+ ## Prize Math
85
+
86
+ | Award | Amount | Eligible? |
87
+ |-------|--------|-----------|
88
+ | Track 2 1st Place | $2,500 | If MCP works |
89
+ | Modal Innovation | $2,500 | If Modal wired |
90
+ | LlamaIndex | $1,000 | Yes (have it) |
91
+ | Community Choice | $1,000 | Maybe |
92
+ | **Total Potential** | **$7,000** | With MCP + Modal |
93
+
94
+ ---
95
+
96
+ ## Next Actions
97
+
98
+ ```bash
99
+ # 1. MCP Server - DONE ✅
100
+ uv run python src/app.py # Starts Gradio with MCP at /gradio_api/mcp/
101
+
102
+ # 2. Test MCP works
103
+ curl http://localhost:7860/gradio_api/mcp/schema | jq
104
+
105
+ # 3. Wire Modal into pipeline
106
+ # (see 03_modal_integration.md)
107
+
108
+ # 4. Record demo video
109
+
110
+ # 5. Submit to MCP-1st-Birthday org
111
+ ```
docs/pending/01_hackathon_requirements.md ADDED
@@ -0,0 +1,99 @@
1
+ # MCP's 1st Birthday Hackathon - Requirements Analysis
2
+
3
+ > **✅ MCP Server implemented in Phase 12** - Track 2 compliant
4
+
5
+ ## Deadline: November 30, 2025 11:59 PM UTC
6
+
7
+ ---
8
+
9
+ ## Track Selection: MCP in Action (Track 2)
10
+
11
+ DeepCritical fits **Track 2: MCP in Action** - AI agent applications.
12
+
13
+ ### Required Tags (pick one)
14
+ ```yaml
15
+ tags:
16
+ - mcp-in-action-track-enterprise # Drug repurposing = enterprise/healthcare
17
+ # OR
18
+ - mcp-in-action-track-consumer # If targeting patients/consumers
19
+ ```
20
+
21
+ ### Track 2 Requirements
22
+
23
+ | Requirement | DeepCritical Status | Action Needed |
24
+ |-------------|---------------------|---------------|
25
+ | Autonomous Agent behavior | ✅ Have it | Search-Judge-Synthesize loop |
26
+ | Must use MCP servers as tools | ✅ **DONE** | `src/mcp_tools.py` |
27
+ | Must be a Gradio app | ✅ Have it | `src/app.py` |
28
+ | Planning, reasoning, execution | ✅ Have it | Orchestrator + Judge |
29
+ | Context Engineering / RAG | ✅ Have it | LlamaIndex + ChromaDB |
30
+
31
+ ---
32
+
33
+ ## Prize Opportunities
34
+
35
+ ### Current Eligibility vs With MCP Integration
36
+
37
+ | Award | Prize | Current | With MCP |
38
+ |-------|-------|---------|----------|
39
+ | MCP in Action (1st) | $2,500 | ✅ Eligible | ✅ STRONGER |
40
+ | Modal Innovation | $2,500 | ❌ Not using | ✅ ELIGIBLE (code execution) |
41
+ | Blaxel Choice | $2,500 | ❌ Not using | ⚠️ Could integrate |
42
+ | LlamaIndex | $1,000 | ✅ Using (Mario's code) | ✅ ELIGIBLE |
43
+ | Google Gemini | $10K credits | ❌ Not using | ⚠️ Could add |
44
+ | Community Choice | $1,000 | ⚠️ Possible | ✅ Better demo helps |
45
+ | **TOTAL POTENTIAL** | | ~$2,500 | **$8,500+** |
46
+
47
+ ---
48
+
49
+ ## Submission Checklist
50
+
51
+ - [ ] HuggingFace Space in `MCP-1st-Birthday` organization
52
+ - [ ] Track tags in Space README.md
53
+ - [ ] Social media post link (X, LinkedIn)
54
+ - [ ] Demo video (1-5 minutes)
55
+ - [ ] All team members registered
56
+ - [ ] Original work (Nov 14-30)
57
+
58
+ ---
59
+
60
+ ## Priority Integration Order
61
+
62
+ ### P0 - MUST HAVE (Required for Track 2)
63
+ 1. **MCP Server Wrapper** - Expose search tools as MCP servers
64
+ - See: `02_mcp_server_integration.md`
65
+
66
+ ### P1 - HIGH VALUE ($2,500 each)
67
+ 2. **Modal Integration** - Already have code, need to wire up
68
+ - See: `03_modal_integration.md`
69
+
70
+ ### P2 - NICE TO HAVE
71
+ 3. **Blaxel** - MCP hosting platform (if time permits)
72
+ 4. **Gemini API** - Add as LLM option for Google prize
73
+
74
+ ---
75
+
76
+ ## What MCP Actually Means for Us
77
+
78
+ MCP (Model Context Protocol) is Anthropic's standard for connecting AI to tools.
79
+
80
+ **Current state:**
81
+ - We have `PubMedTool`, `ClinicalTrialsTool`, `BioRxivTool`
82
+ - They're Python classes with `search()` methods
83
+
84
+ **What we need:**
85
+ - Wrap these as MCP servers
86
+ - So Claude Desktop, Cursor, or any MCP client can use them
87
+
88
+ **Why this matters:**
89
+ - Judges will test if our tools work with Claude Desktop
90
+ - No MCP = disqualified from Track 2
91
+
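+ To make the current state concrete, this is roughly how the tools are called directly today (a sketch; the actual class lives in `src/tools/pubmed.py`):
+ 
+ ```python
+ import asyncio
+ 
+ from src.tools.pubmed import PubMedTool
+ 
+ 
+ async def main() -> None:
+     # Plain Python call - no MCP involved yet
+     evidence = await PubMedTool().search("metformin alzheimer", max_results=5)
+     for item in evidence:
+         print(item.citation.title)
+ 
+ 
+ asyncio.run(main())
+ ```
+ 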
92
+ ---
93
+
94
+ ## Reference Links
95
+
96
+ - [Hackathon Page](https://huggingface.co/MCP-1st-Birthday)
97
+ - [MCP Documentation](https://modelcontextprotocol.io/)
98
+ - [Gradio MCP Guide](https://www.gradio.app/guides/building-mcp-server-with-gradio)
99
+ - [Discord: #agents-mcp-hackathon-winter25](https://discord.gg/huggingface)
docs/pending/02_mcp_server_integration.md ADDED
@@ -0,0 +1,177 @@
1
+ # MCP Server Integration
2
+
3
+ ## Priority: P0 - REQUIRED FOR TRACK 2
4
+
5
+ > **✅ STATUS: IMPLEMENTED** - See `src/mcp_tools.py` and `src/app.py`
6
+ > MCP endpoint: `/gradio_api/mcp/`
7
+
8
+ ---
9
+
10
+ ## What We Need
11
+
12
+ Expose our search tools as MCP servers so Claude Desktop/Cursor can use them.
13
+
14
+ ### Current Tools to Expose
15
+
16
+ | Tool | File | MCP Tool Name |
17
+ |------|------|---------------|
18
+ | PubMed Search | `src/tools/pubmed.py` | `search_pubmed` |
19
+ | ClinicalTrials Search | `src/tools/clinicaltrials.py` | `search_clinical_trials` |
20
+ | bioRxiv Search | `src/tools/biorxiv.py` | `search_biorxiv` |
21
+ | Combined Search | `src/tools/search_handler.py` | `search_all_sources` |
22
+
23
+ ---
24
+
25
+ ## Implementation Options
26
+
27
+ ### Option 1: Gradio MCP (Recommended)
28
+
29
+ Gradio 5.0+ can expose any Gradio app as an MCP server automatically.
30
+
31
+ ```python
32
+ # src/mcp_server.py
33
+ import gradio as gr
34
+ from src.tools.pubmed import PubMedTool
35
+ from src.tools.clinicaltrials import ClinicalTrialsTool
36
+ from src.tools.biorxiv import BioRxivTool
37
+
38
+ pubmed = PubMedTool()
39
+ trials = ClinicalTrialsTool()
40
+ biorxiv = BioRxivTool()
41
+
42
+ async def search_pubmed(query: str, max_results: int = 10) -> str:
43
+ """Search PubMed for biomedical literature."""
44
+ results = await pubmed.search(query, max_results)
45
+ return "\n\n".join([f"**{e.citation.title}**\n{e.content}" for e in results])
46
+
47
+ async def search_clinical_trials(query: str, max_results: int = 10) -> str:
48
+ """Search ClinicalTrials.gov for clinical trial data."""
49
+ results = await trials.search(query, max_results)
50
+ return "\n\n".join([f"**{e.citation.title}**\n{e.content}" for e in results])
51
+
52
+ async def search_biorxiv(query: str, max_results: int = 10) -> str:
53
+ """Search bioRxiv/medRxiv for preprints."""
54
+ results = await biorxiv.search(query, max_results)
55
+ return "\n\n".join([f"**{e.citation.title}**\n{e.content}" for e in results])
56
+
57
+ # Create one Gradio interface per tool; gr.Interface takes a single fn,
+ # so the three tools are grouped with gr.TabbedInterface.
+ demo = gr.TabbedInterface(
+     [
+         gr.Interface(
+             fn=search_pubmed,
+             inputs=[gr.Textbox(label="Query"), gr.Number(label="Max Results", value=10)],
+             outputs=gr.Textbox(label="Results"),
+         ),
+         gr.Interface(
+             fn=search_clinical_trials,
+             inputs=[gr.Textbox(label="Query"), gr.Number(label="Max Results", value=10)],
+             outputs=gr.Textbox(label="Results"),
+         ),
+         gr.Interface(
+             fn=search_biorxiv,
+             inputs=[gr.Textbox(label="Query"), gr.Number(label="Max Results", value=10)],
+             outputs=gr.Textbox(label="Results"),
+         ),
+     ],
+     tab_names=["PubMed", "Clinical Trials", "bioRxiv"],
+ )
63
+
64
+ # Launch as MCP server
65
+ if __name__ == "__main__":
66
+ demo.launch(mcp_server=True) # Gradio 5.0+ feature
67
+ ```
68
+
69
+ ### Option 2: Native MCP SDK
70
+
71
+ Use the official MCP Python SDK:
72
+
73
+ ```bash
74
+ uv add mcp
75
+ ```
76
+
77
+ ```python
78
+ # src/mcp_server.py
+ # The @tool() decorator pattern comes from FastMCP in the official MCP Python SDK.
+ from mcp.server.fastmcp import FastMCP
+ 
+ from src.tools.pubmed import PubMedTool
+ from src.tools.clinicaltrials import ClinicalTrialsTool
+ from src.tools.biorxiv import BioRxivTool
+ 
+ mcp = FastMCP("deepcritical-research")
+ 
+ @mcp.tool()
+ async def search_pubmed(query: str, max_results: int = 10) -> str:
+     """Search PubMed for biomedical literature on drug repurposing."""
+     tool = PubMedTool()
+     results = await tool.search(query, max_results)
+     return "\n\n".join(e.content for e in results)
+ 
+ @mcp.tool()
+ async def search_clinical_trials(query: str, max_results: int = 10) -> str:
+     """Search ClinicalTrials.gov for clinical trials."""
+     tool = ClinicalTrialsTool()
+     results = await tool.search(query, max_results)
+     return "\n\n".join(e.content for e in results)
+ 
+ @mcp.tool()
+ async def search_biorxiv(query: str, max_results: int = 10) -> str:
+     """Search bioRxiv/medRxiv for preprints (not peer-reviewed)."""
+     tool = BioRxivTool()
+     results = await tool.search(query, max_results)
+     return "\n\n".join(e.content for e in results)
+ 
+ if __name__ == "__main__":
+     mcp.run()
111
+ ```
112
+
113
+ ---
114
+
115
+ ## Claude Desktop Configuration
116
+
117
+ After implementing, users add to `claude_desktop_config.json`:
118
+
119
+ ```json
120
+ {
121
+ "mcpServers": {
122
+ "deepcritical": {
123
+ "command": "uv",
124
+ "args": ["run", "python", "src/mcp_server.py"],
125
+ "cwd": "/path/to/DeepCritical-1"
126
+ }
127
+ }
128
+ }
129
+ ```
130
+
131
+ ---
132
+
133
+ ## Testing MCP Server
134
+
135
+ 1. Start the MCP server (via Gradio app):
136
+
137
+ ```bash
138
+ uv run python src/app.py
139
+ ```
140
+
141
+ 2. Check MCP schema:
142
+
143
+ ```bash
144
+ curl http://localhost:7860/gradio_api/mcp/schema | jq
145
+ ```
146
+
147
+ 3. Test with MCP Inspector:
148
+
149
+ ```bash
150
+ npx @modelcontextprotocol/inspector http://localhost:7860/gradio_api/mcp/sse
151
+ ```
152
+
153
+ 4. Verify tools appear and work
154
+
155
+ ---
156
+
157
+ ## Demo Video Script
158
+
159
+ For the hackathon submission video:
160
+
161
+ 1. Show Claude Desktop with DeepCritical MCP tools
162
+ 2. Ask: "Search PubMed for metformin Alzheimer's"
163
+ 3. Show real results appearing
164
+ 4. Ask: "Now search clinical trials for the same"
165
+ 5. Show combined analysis
166
+
167
+ This proves MCP integration works.
168
+
169
+ ---
170
+
171
+ ## Files Created
172
+
173
+ - [x] `src/mcp_tools.py` - MCP tool wrapper functions
174
+ - [x] `src/app.py` - Gradio app with `mcp_server=True`
175
+ - [x] `tests/unit/test_mcp_tools.py` - Unit tests
176
+ - [x] `tests/integration/test_mcp_tools_live.py` - Integration tests
177
+ - [x] `README.md` - Updated with MCP usage instructions
docs/pending/03_modal_integration.md ADDED
@@ -0,0 +1,158 @@
1
+ # Modal Integration
2
+
3
+ ## Priority: P1 - HIGH VALUE ($2,500 Modal Innovation Award)
4
+
5
+ ---
6
+
7
+ ## What Modal Is For
8
+
9
+ Modal provides serverless GPU/CPU compute. For DeepCritical:
10
+
11
+ ### Current Use Case (Mario's Code)
12
+ - `src/tools/code_execution.py` - Run LLM-generated analysis code in sandboxes
13
+ - Scientific computing (pandas, scipy, numpy) in isolated containers
14
+
15
+ ### Potential Additional Use Cases
16
+
17
+ | Use Case | Benefit | Complexity |
18
+ |----------|---------|------------|
19
+ | Code Execution Sandbox | Run statistical analysis safely | ✅ Already built |
20
+ | LLM Inference | Run local models (no API costs) | Medium |
21
+ | Batch Processing | Process many papers in parallel | Medium |
22
+ | Embedding Generation | GPU-accelerated embeddings | Low |
23
+
24
+ ---
25
+
26
+ ## Current State
27
+
28
+ Mario implemented `src/tools/code_execution.py`:
29
+
30
+ ```python
31
+ # Already exists - ModalCodeExecutor
32
+ executor = get_code_executor()
33
+ result = executor.execute("""
34
+ import pandas as pd
35
+ import numpy as np
36
+ # LLM-generated statistical analysis
37
+ """)
38
+ ```
39
+
40
+ ### What's Missing
41
+
42
+ 1. **Not wired into the main pipeline** - The executor exists but isn't used
43
+ 2. **No Modal tokens configured** - Needs MODAL_TOKEN_ID/MODAL_TOKEN_SECRET
44
+ 3. **No demo showing it works** - Judges need to see it
45
+
46
+ ---
47
+
48
+ ## Integration Plan
49
+
50
+ ### Step 1: Wire Into Agent Pipeline
51
+
52
+ Add a `StatisticalAnalyzer` service that uses Modal:
53
+
54
+ ```python
55
+ # src/services/statistical_analyzer.py
56
+ import asyncio
+ 
+ from src.tools.code_execution import get_code_executor
+ from src.utils.models import Evidence
+ 
+ 
+ class StatisticalAnalyzer:
+     """Run statistical analysis on evidence in a Modal sandbox."""
+ 
+     async def analyze(self, evidence: list[Evidence], query: str) -> str:
+         # 1. LLM generates analysis code
+         code = await self._generate_analysis_code(evidence, query)
+ 
+         # 2. Execute in Modal sandbox (run the sync executor in a thread pool)
+         executor = get_code_executor()
+         loop = asyncio.get_running_loop()
+         result = await loop.run_in_executor(None, executor.execute, code)
+ 
+         # 3. Return results
+         return result["stdout"]
73
+ ```
74
+
75
+ ### Step 2: Add to Orchestrator
76
+
77
+ ```python
78
+ # In the simple orchestrator, after gathering evidence
+ # (use StatisticalAnalyzer directly - it has no agent_framework dependency):
+ if settings.enable_modal_analysis:
+     analyzer = StatisticalAnalyzer()
+     stats_results = await analyzer.analyze(evidence, query)
82
+ ```
83
+
84
+ ### Step 3: Create Demo
85
+
86
+ ```python
87
+ # examples/modal_demo/run_analysis.py
88
+ """Demo: Modal-powered statistical analysis of drug evidence."""
89
+
90
+ # Show:
91
+ # 1. Gather evidence from PubMed
92
+ # 2. Generate analysis code with LLM
93
+ # 3. Execute in Modal sandbox
94
+ # 4. Return statistical insights
95
+ ```
96
+
97
+ ---
98
+
99
+ ## Modal Setup
100
+
101
+ ### 1. Install Modal CLI
102
+ ```bash
103
+ pip install modal
104
+ modal setup # Authenticates with Modal
105
+ ```
106
+
107
+ ### 2. Set Environment Variables
108
+ ```bash
109
+ # In .env
110
+ MODAL_TOKEN_ID=your-token-id
111
+ MODAL_TOKEN_SECRET=your-token-secret
112
+ ```
113
+
114
+ ### 3. Deploy (Optional)
115
+ ```bash
116
+ modal deploy src/tools/code_execution.py
117
+ ```
118
+
119
+ ---
120
+
121
+ ## What to Show Judges
122
+
123
+ For the Modal Innovation Award ($2,500):
124
+
125
+ 1. **Sandbox Isolation** - Code runs in container, not local
126
+ 2. **Scientific Computing** - Real pandas/scipy analysis
127
+ 3. **Safety** - Can't access local filesystem
128
+ 4. **Speed** - Modal's fast cold starts
129
+
130
+ ### Demo Script
131
+
132
+ ```bash
133
+ # Run the Modal verification script
134
+ uv run python examples/modal_demo/verify_sandbox.py
135
+ ```
136
+
137
+ This proves code runs in Modal, not locally.
138
+
139
+ ---
140
+
141
+ ## Files to Update
142
+
143
+ - [ ] Wire `code_execution.py` into pipeline
144
+ - [ ] Create `src/agents/analysis_agent.py`
145
+ - [ ] Update `examples/modal_demo/` with working demo
146
+ - [ ] Add Modal setup to README
147
+ - [ ] Test with real Modal account
148
+
149
+ ---
150
+
151
+ ## Cost Estimate
152
+
153
+ Modal pricing for our use case:
154
+ - CPU sandbox: ~$0.0001 per execution
155
+ - For demo/judging: < $1 total
156
+ - Free tier: 30 hours/month
157
+
158
+ Not a cost concern.
pyproject.toml CHANGED
@@ -17,7 +17,7 @@ dependencies = [
17
  "beautifulsoup4>=4.12", # HTML parsing
18
  "xmltodict>=0.13", # PubMed XML -> dict
19
  # UI
20
- "gradio>=5.0", # Chat interface
21
  # Utils
22
  "python-dotenv>=1.0", # .env loading
23
  "tenacity>=8.2", # Retry logic
 
17
  "beautifulsoup4>=4.12", # HTML parsing
18
  "xmltodict>=0.13", # PubMed XML -> dict
19
  # UI
20
+ "gradio[mcp]>=5.0.0", # Chat interface
21
  # Utils
22
  "python-dotenv>=1.0", # .env loading
23
  "tenacity>=8.2", # Retry logic
src/app.py CHANGED
@@ -1,4 +1,4 @@
1
- """Gradio UI for DeepCritical agent."""
2
 
3
  import os
4
  from collections.abc import AsyncGenerator
@@ -7,6 +7,12 @@ from typing import Any
7
  import gradio as gr
8
 
9
  from src.agent_factory.judges import JudgeHandler, MockJudgeHandler
 
 
 
 
 
 
10
  from src.orchestrator_factory import create_orchestrator
11
  from src.tools.biorxiv import BioRxivTool
12
  from src.tools.clinicaltrials import ClinicalTrialsTool
@@ -115,10 +121,10 @@ async def research_agent(
115
 
116
  def create_demo() -> Any:
117
  """
118
- Create the Gradio demo interface.
119
 
120
  Returns:
121
- Configured Gradio Blocks interface
122
  """
123
  with gr.Blocks(
124
  title="DeepCritical - Drug Repurposing Research Agent",
@@ -137,9 +143,10 @@ def create_demo() -> Any:
137
  - "What existing medications show promise for Long COVID?"
138
  """)
139
 
 
140
  gr.ChatInterface(
141
  fn=research_agent,
142
- type="messages",
143
  title="",
144
  examples=[
145
  "What drugs could be repurposed for Alzheimer's disease?",
@@ -157,24 +164,74 @@ def create_demo() -> Any:
157
  ],
158
  )
159
 
160
  gr.Markdown("""
161
  ---
162
  **Note**: This is a research tool and should not be used for medical decisions.
163
  Always consult healthcare professionals for medical advice.
164
 
165
- Built with 🤖 PydanticAI + 🔬 PubMed, ClinicalTrials.gov & bioRxiv
 
 
166
  """)
167
 
168
  return demo
169
 
170
 
171
  def main() -> None:
172
- """Run the Gradio app."""
173
  demo = create_demo()
174
  demo.launch(
175
  server_name="0.0.0.0",
176
  server_port=7860,
177
  share=False,
 
178
  )
179
 
180
 
 
1
+ """Gradio UI for DeepCritical agent with MCP server support."""
2
 
3
  import os
4
  from collections.abc import AsyncGenerator
 
7
  import gradio as gr
8
 
9
  from src.agent_factory.judges import JudgeHandler, MockJudgeHandler
10
+ from src.mcp_tools import (
11
+ search_all_sources,
12
+ search_biorxiv,
13
+ search_clinical_trials,
14
+ search_pubmed,
15
+ )
16
  from src.orchestrator_factory import create_orchestrator
17
  from src.tools.biorxiv import BioRxivTool
18
  from src.tools.clinicaltrials import ClinicalTrialsTool
 
121
 
122
  def create_demo() -> Any:
123
  """
124
+ Create the Gradio demo interface with MCP support.
125
 
126
  Returns:
127
+ Configured Gradio Blocks interface with MCP server enabled
128
  """
129
  with gr.Blocks(
130
  title="DeepCritical - Drug Repurposing Research Agent",
 
143
  - "What existing medications show promise for Long COVID?"
144
  """)
145
 
146
+ # Main chat interface (existing)
147
  gr.ChatInterface(
148
  fn=research_agent,
149
+ type="messages", # type: ignore
150
  title="",
151
  examples=[
152
  "What drugs could be repurposed for Alzheimer's disease?",
 
164
  ],
165
  )
166
 
167
+ # MCP Tool Interfaces (exposed via MCP protocol)
168
+ gr.Markdown("---\n## MCP Tools (Also Available via Claude Desktop)")
169
+
170
+ with gr.Tab("PubMed Search"):
171
+ gr.Interface(
172
+ fn=search_pubmed,
173
+ inputs=[
174
+ gr.Textbox(label="Query", placeholder="metformin alzheimer"),
175
+ gr.Slider(1, 50, value=10, step=1, label="Max Results"),
176
+ ],
177
+ outputs=gr.Markdown(label="Results"),
178
+ api_name="search_pubmed",
179
+ )
180
+
181
+ with gr.Tab("Clinical Trials"):
182
+ gr.Interface(
183
+ fn=search_clinical_trials,
184
+ inputs=[
185
+ gr.Textbox(label="Query", placeholder="diabetes phase 3"),
186
+ gr.Slider(1, 50, value=10, step=1, label="Max Results"),
187
+ ],
188
+ outputs=gr.Markdown(label="Results"),
189
+ api_name="search_clinical_trials",
190
+ )
191
+
192
+ with gr.Tab("Preprints"):
193
+ gr.Interface(
194
+ fn=search_biorxiv,
195
+ inputs=[
196
+ gr.Textbox(label="Query", placeholder="long covid treatment"),
197
+ gr.Slider(1, 50, value=10, step=1, label="Max Results"),
198
+ ],
199
+ outputs=gr.Markdown(label="Results"),
200
+ api_name="search_biorxiv",
201
+ )
202
+
203
+ with gr.Tab("Search All"):
204
+ gr.Interface(
205
+ fn=search_all_sources,
206
+ inputs=[
207
+ gr.Textbox(label="Query", placeholder="metformin cancer"),
208
+ gr.Slider(1, 20, value=5, step=1, label="Max Per Source"),
209
+ ],
210
+ outputs=gr.Markdown(label="Results"),
211
+ api_name="search_all",
212
+ )
213
+
214
  gr.Markdown("""
215
  ---
216
  **Note**: This is a research tool and should not be used for medical decisions.
217
  Always consult healthcare professionals for medical advice.
218
 
219
+ Built with PydanticAI + PubMed, ClinicalTrials.gov & bioRxiv
220
+
221
+ **MCP Server**: Available at `/gradio_api/mcp/` for Claude Desktop integration
222
  """)
223
 
224
  return demo
225
 
226
 
227
  def main() -> None:
228
+ """Run the Gradio app with MCP server enabled."""
229
  demo = create_demo()
230
  demo.launch(
231
  server_name="0.0.0.0",
232
  server_port=7860,
233
  share=False,
234
+ mcp_server=True, # Enable MCP server
235
  )
236
 
237
 
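The pattern this diff relies on: any function wired into a `gr.Interface` with an `api_name` becomes an MCP tool once `demo.launch(mcp_server=True)` is called, with the tool schema derived from the function's type hints and docstring. A minimal standalone sketch (the `echo` function is illustrative, not from this codebase):

```python
# Minimal standalone sketch of the pattern applied above (illustrative).
# With mcp_server=True, Gradio exposes every api_name'd function as an
# MCP tool, deriving the schema from type hints and the docstring.
import gradio as gr


def echo(text: str) -> str:
    """Echo the input text back.

    Args:
        text: Any string to echo.

    Returns:
        The same string, unchanged.
    """
    return text


with gr.Blocks() as demo:
    gr.Interface(fn=echo, inputs=gr.Textbox(), outputs=gr.Textbox(), api_name="echo")

if __name__ == "__main__":
    demo.launch(mcp_server=True)  # tools served under /gradio_api/mcp/
```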
src/mcp_tools.py ADDED
@@ -0,0 +1,156 @@
1
+ """MCP tool wrappers for DeepCritical search tools.
2
+
3
+ These functions expose our search tools via the MCP protocol.
4
+ Each function follows the MCP tool contract:
5
+ - Full type hints
6
+ - Google-style docstrings with Args section
7
+ - Formatted string returns
8
+ """
9
+
10
+ from src.tools.biorxiv import BioRxivTool
11
+ from src.tools.clinicaltrials import ClinicalTrialsTool
12
+ from src.tools.pubmed import PubMedTool
13
+
14
+ # Singleton instances (avoid recreating on each call)
15
+ _pubmed = PubMedTool()
16
+ _trials = ClinicalTrialsTool()
17
+ _biorxiv = BioRxivTool()
18
+
19
+
20
+ async def search_pubmed(query: str, max_results: int = 10) -> str:
21
+ """Search PubMed for peer-reviewed biomedical literature.
22
+
23
+ Searches NCBI PubMed database for scientific papers matching your query.
24
+ Returns titles, authors, abstracts, and citation information.
25
+
26
+ Args:
27
+ query: Search query (e.g., "metformin alzheimer", "drug repurposing cancer")
28
+ max_results: Maximum results to return (1-50, default 10)
29
+
30
+ Returns:
31
+ Formatted search results with paper titles, authors, dates, and abstracts
32
+ """
33
+ max_results = max(1, min(50, max_results)) # Clamp to valid range
34
+
35
+ results = await _pubmed.search(query, max_results)
36
+
37
+ if not results:
38
+ return f"No PubMed results found for: {query}"
39
+
40
+ formatted = [f"## PubMed Results for: {query}\n"]
41
+ for i, evidence in enumerate(results, 1):
42
+ formatted.append(f"### {i}. {evidence.citation.title}")
43
+ formatted.append(f"**Authors**: {', '.join(evidence.citation.authors[:3])}")
44
+ formatted.append(f"**Date**: {evidence.citation.date}")
45
+ formatted.append(f"**URL**: {evidence.citation.url}")
46
+ formatted.append(f"\n{evidence.content}\n")
47
+
48
+ return "\n".join(formatted)
49
+
50
+
51
+ async def search_clinical_trials(query: str, max_results: int = 10) -> str:
52
+ """Search ClinicalTrials.gov for clinical trial data.
53
+
54
+ Searches the ClinicalTrials.gov database for trials matching your query.
55
+ Returns trial titles, phases, status, conditions, and interventions.
56
+
57
+ Args:
58
+ query: Search query (e.g., "metformin alzheimer", "diabetes phase 3")
59
+ max_results: Maximum results to return (1-50, default 10)
60
+
61
+ Returns:
62
+ Formatted clinical trial information with NCT IDs, phases, and status
63
+ """
64
+ max_results = max(1, min(50, max_results))
65
+
66
+ results = await _trials.search(query, max_results)
67
+
68
+ if not results:
69
+ return f"No clinical trials found for: {query}"
70
+
71
+ formatted = [f"## Clinical Trials for: {query}\n"]
72
+ for i, evidence in enumerate(results, 1):
73
+ formatted.append(f"### {i}. {evidence.citation.title}")
74
+ formatted.append(f"**URL**: {evidence.citation.url}")
75
+ formatted.append(f"**Date**: {evidence.citation.date}")
76
+ formatted.append(f"\n{evidence.content}\n")
77
+
78
+ return "\n".join(formatted)
79
+
80
+
81
+ async def search_biorxiv(query: str, max_results: int = 10) -> str:
82
+ """Search bioRxiv/medRxiv for preprint research.
83
+
84
+ Searches bioRxiv and medRxiv preprint servers for cutting-edge research.
85
+ Note: Preprints are NOT peer-reviewed but contain the latest findings.
86
+
87
+ Args:
88
+ query: Search query (e.g., "metformin neuroprotection", "long covid treatment")
89
+ max_results: Maximum results to return (1-50, default 10)
90
+
91
+ Returns:
92
+ Formatted preprint results with titles, authors, and abstracts
93
+ """
94
+ max_results = max(1, min(50, max_results))
95
+
96
+ results = await _biorxiv.search(query, max_results)
97
+
98
+ if not results:
99
+ return f"No bioRxiv/medRxiv preprints found for: {query}"
100
+
101
+ formatted = [f"## Preprint Results for: {query}\n"]
102
+ for i, evidence in enumerate(results, 1):
103
+ formatted.append(f"### {i}. {evidence.citation.title}")
104
+ formatted.append(f"**Authors**: {', '.join(evidence.citation.authors[:3])}")
105
+ formatted.append(f"**Date**: {evidence.citation.date}")
106
+ formatted.append(f"**URL**: {evidence.citation.url}")
107
+ formatted.append(f"\n{evidence.content}\n")
108
+
109
+ return "\n".join(formatted)
110
+
111
+
112
+ async def search_all_sources(query: str, max_per_source: int = 5) -> str:
113
+ """Search all biomedical sources simultaneously.
114
+
115
+ Performs parallel search across PubMed, ClinicalTrials.gov, and bioRxiv.
116
+ This is the most comprehensive search option for drug repurposing research.
117
+
118
+ Args:
119
+ query: Search query (e.g., "metformin alzheimer", "aspirin cancer prevention")
120
+ max_per_source: Maximum results per source (1-20, default 5)
121
+
122
+ Returns:
123
+ Combined results from all sources with source labels
124
+ """
125
+ import asyncio
126
+
127
+ max_per_source = max(1, min(20, max_per_source))
128
+
129
+ # Run all searches in parallel
130
+ pubmed_task = search_pubmed(query, max_per_source)
131
+ trials_task = search_clinical_trials(query, max_per_source)
132
+ biorxiv_task = search_biorxiv(query, max_per_source)
133
+
134
+ pubmed_results, trials_results, biorxiv_results = await asyncio.gather(
135
+ pubmed_task, trials_task, biorxiv_task, return_exceptions=True
136
+ )
137
+
138
+ formatted = [f"# Comprehensive Search: {query}\n"]
139
+
140
+ # Add each result section (handle exceptions gracefully)
141
+ if isinstance(pubmed_results, str):
142
+ formatted.append(pubmed_results)
143
+ else:
144
+ formatted.append(f"## PubMed\n*Error: {pubmed_results}*\n")
145
+
146
+ if isinstance(trials_results, str):
147
+ formatted.append(trials_results)
148
+ else:
149
+ formatted.append(f"## Clinical Trials\n*Error: {trials_results}*\n")
150
+
151
+ if isinstance(biorxiv_results, str):
152
+ formatted.append(biorxiv_results)
153
+ else:
154
+ formatted.append(f"## Preprints\n*Error: {biorxiv_results}*\n")
155
+
156
+ return "\n---\n".join(formatted)
tests/integration/test_mcp_tools_live.py ADDED
@@ -0,0 +1,24 @@
1
+ """Integration tests for MCP tool wrappers with live API calls."""
2
+
3
+ import pytest
4
+
5
+
6
+ class TestMCPToolsLive:
7
+ """Integration tests for MCP tools against live APIs (PubMed, etc.)."""
8
+
9
+ @pytest.mark.integration
10
+ @pytest.mark.asyncio
11
+ async def test_mcp_tools_work_end_to_end(self) -> None:
12
+ """Test that MCP tools execute real searches."""
13
+ from src.mcp_tools import search_pubmed
14
+
15
+ result = await search_pubmed("metformin diabetes", 3)
16
+
17
+ assert isinstance(result, str)
18
+ assert "PubMed Results" in result
19
+ # Should have actual content (not just "no results")
20
+ # Typical queries should return something.
21
+ # The wrapper returns "No PubMed results found" string if empty.
22
+
23
+ if "No PubMed results found" not in result:
24
+ assert len(result) > 10
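The live test above is gated behind the `integration` marker, so it only runs when explicitly selected. A sketch of selecting just that subset programmatically (assuming the marker is registered in the project's pytest configuration):

```python
# Run only integration-marked tests (assumes the marker is registered).
import sys

import pytest

sys.exit(pytest.main(["-m", "integration", "tests/integration/"]))
```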
tests/unit/test_mcp_tools.py ADDED
@@ -0,0 +1,200 @@
1
+ """Unit tests for MCP tool wrappers."""
2
+
3
+ from unittest.mock import AsyncMock, patch
4
+
5
+ import pytest
6
+
7
+ from src.mcp_tools import (
8
+ search_all_sources,
9
+ search_biorxiv,
10
+ search_clinical_trials,
11
+ search_pubmed,
12
+ )
13
+ from src.utils.models import Citation, Evidence
14
+
15
+
16
+ @pytest.fixture
17
+ def mock_evidence() -> Evidence:
18
+ """Sample evidence for testing."""
19
+ return Evidence(
20
+ content="Metformin shows neuroprotective effects in preclinical models.",
21
+ citation=Citation(
22
+ source="pubmed",
23
+ title="Metformin and Alzheimer's Disease",
24
+ url="https://pubmed.ncbi.nlm.nih.gov/12345678/",
25
+ date="2024-01-15",
26
+ authors=["Smith J", "Jones M", "Brown K"],
27
+ ),
28
+ relevance=0.85,
29
+ )
30
+
31
+
32
+ class TestSearchPubMed:
33
+ """Tests for search_pubmed MCP tool."""
34
+
35
+ @pytest.mark.asyncio
36
+ async def test_returns_formatted_string(self, mock_evidence: Evidence) -> None:
37
+ """Should return formatted markdown string."""
38
+ with patch("src.mcp_tools._pubmed") as mock_tool:
39
+ mock_tool.search = AsyncMock(return_value=[mock_evidence])
40
+
41
+ result = await search_pubmed("metformin alzheimer", 10)
42
+
43
+ assert isinstance(result, str)
44
+ assert "PubMed Results" in result
45
+ assert "Metformin and Alzheimer's Disease" in result
46
+ assert "Smith J" in result
47
+
48
+ @pytest.mark.asyncio
49
+ async def test_clamps_max_results(self) -> None:
50
+ """Should clamp max_results to valid range (1-50)."""
51
+ with patch("src.mcp_tools._pubmed") as mock_tool:
52
+ mock_tool.search = AsyncMock(return_value=[])
53
+
54
+ # Test lower bound
55
+ await search_pubmed("test", 0)
56
+ mock_tool.search.assert_called_with("test", 1)
57
+
58
+ # Test upper bound
59
+ await search_pubmed("test", 100)
60
+ mock_tool.search.assert_called_with("test", 50)
61
+
62
+ @pytest.mark.asyncio
63
+ async def test_handles_no_results(self) -> None:
64
+ """Should return appropriate message when no results."""
65
+ with patch("src.mcp_tools._pubmed") as mock_tool:
66
+ mock_tool.search = AsyncMock(return_value=[])
67
+
68
+ result = await search_pubmed("xyznonexistent", 10)
69
+
70
+ assert "No PubMed results found" in result
71
+
72
+
73
+ class TestSearchClinicalTrials:
74
+ """Tests for search_clinical_trials MCP tool."""
75
+
76
+ @pytest.mark.asyncio
77
+ async def test_returns_formatted_string(self, mock_evidence: Evidence) -> None:
78
+ """Should return formatted markdown string."""
79
+ mock_evidence.citation.source = "clinicaltrials" # type: ignore
80
+
81
+ with patch("src.mcp_tools._trials") as mock_tool:
82
+ mock_tool.search = AsyncMock(return_value=[mock_evidence])
83
+
84
+ result = await search_clinical_trials("diabetes", 10)
85
+
86
+ assert isinstance(result, str)
87
+ assert "Clinical Trials" in result
88
+
89
+
90
+ class TestSearchBiorxiv:
91
+ """Tests for search_biorxiv MCP tool."""
92
+
93
+ @pytest.mark.asyncio
94
+ async def test_returns_formatted_string(self, mock_evidence: Evidence) -> None:
95
+ """Should return formatted markdown string."""
96
+ mock_evidence.citation.source = "biorxiv" # type: ignore
97
+
98
+ with patch("src.mcp_tools._biorxiv") as mock_tool:
99
+ mock_tool.search = AsyncMock(return_value=[mock_evidence])
100
+
101
+ result = await search_biorxiv("preprint search", 10)
102
+
103
+ assert isinstance(result, str)
104
+ assert "Preprint Results" in result
105
+
106
+
107
+ class TestSearchAllSources:
108
+ """Tests for search_all_sources MCP tool."""
109
+
110
+ @pytest.mark.asyncio
111
+ async def test_combines_all_sources(self, mock_evidence: Evidence) -> None:
112
+ """Should combine results from all sources."""
113
+ with (
114
+ patch("src.mcp_tools.search_pubmed", new_callable=AsyncMock) as mock_pubmed,
115
+ patch("src.mcp_tools.search_clinical_trials", new_callable=AsyncMock) as mock_trials,
116
+ patch("src.mcp_tools.search_biorxiv", new_callable=AsyncMock) as mock_biorxiv,
117
+ ):
118
+ mock_pubmed.return_value = "## PubMed Results"
119
+ mock_trials.return_value = "## Clinical Trials"
120
+ mock_biorxiv.return_value = "## Preprints"
121
+
122
+ result = await search_all_sources("metformin", 5)
123
+
124
+ assert "Comprehensive Search" in result
125
+ assert "PubMed" in result
126
+ assert "Clinical Trials" in result
127
+ assert "Preprints" in result
128
+
129
+ @pytest.mark.asyncio
130
+ async def test_handles_partial_failures(self) -> None:
131
+ """Should handle partial failures gracefully."""
132
+ with (
133
+ patch("src.mcp_tools.search_pubmed", new_callable=AsyncMock) as mock_pubmed,
134
+ patch("src.mcp_tools.search_clinical_trials", new_callable=AsyncMock) as mock_trials,
135
+ patch("src.mcp_tools.search_biorxiv", new_callable=AsyncMock) as mock_biorxiv,
136
+ ):
137
+ mock_pubmed.return_value = "## PubMed Results"
138
+ mock_trials.side_effect = Exception("API Error")
139
+ mock_biorxiv.return_value = "## Preprints"
140
+
141
+ result = await search_all_sources("metformin", 5)
142
+
143
+ # Should still contain working sources
144
+ assert "PubMed" in result
145
+ assert "Preprints" in result
146
+ # Should show error for failed source
147
+ assert "Error" in result
148
+
149
+
150
+ class TestMCPDocstrings:
151
+ """Tests that docstrings follow MCP format."""
152
+
153
+ def test_search_pubmed_has_args_section(self) -> None:
154
+ """Docstring must have Args section for MCP schema generation."""
155
+ assert search_pubmed.__doc__ is not None
156
+ assert "Args:" in search_pubmed.__doc__
157
+ assert "query:" in search_pubmed.__doc__
158
+ assert "max_results:" in search_pubmed.__doc__
159
+ assert "Returns:" in search_pubmed.__doc__
160
+
161
+ def test_search_clinical_trials_has_args_section(self) -> None:
162
+ """Docstring must have Args section for MCP schema generation."""
163
+ assert search_clinical_trials.__doc__ is not None
164
+ assert "Args:" in search_clinical_trials.__doc__
165
+
166
+ def test_search_biorxiv_has_args_section(self) -> None:
167
+ """Docstring must have Args section for MCP schema generation."""
168
+ assert search_biorxiv.__doc__ is not None
169
+ assert "Args:" in search_biorxiv.__doc__
170
+
171
+ def test_search_all_sources_has_args_section(self) -> None:
172
+ """Docstring must have Args section for MCP schema generation."""
173
+ assert search_all_sources.__doc__ is not None
174
+ assert "Args:" in search_all_sources.__doc__
175
+
176
+
177
+ class TestMCPTypeHints:
178
+ """Tests that type hints are complete for MCP."""
179
+
180
+ def test_search_pubmed_type_hints(self) -> None:
181
+ """All parameters and return must have type hints."""
182
+ import inspect
183
+
184
+ sig = inspect.signature(search_pubmed)
185
+
186
+ # Check parameter hints
187
+ assert sig.parameters["query"].annotation is str
188
+ assert sig.parameters["max_results"].annotation is int
189
+
190
+ # Check return hint
191
+ assert sig.return_annotation is str
192
+
193
+ def test_search_clinical_trials_type_hints(self) -> None:
194
+ """All parameters and return must have type hints."""
195
+ import inspect
196
+
197
+ sig = inspect.signature(search_clinical_trials)
198
+ assert sig.parameters["query"].annotation is str
199
+ assert sig.parameters["max_results"].annotation is int
200
+ assert sig.return_annotation is str
uv.lock CHANGED
@@ -1063,7 +1063,7 @@ source = { editable = "." }
1063
  dependencies = [
1064
  { name = "anthropic" },
1065
  { name = "beautifulsoup4" },
1066
- { name = "gradio" },
1067
  { name = "httpx" },
1068
  { name = "openai" },
1069
  { name = "pydantic" },
@@ -1111,7 +1111,7 @@ requires-dist = [
1111
  { name = "beautifulsoup4", specifier = ">=4.12" },
1112
  { name = "chromadb", marker = "extra == 'embeddings'", specifier = ">=0.4.0" },
1113
  { name = "chromadb", marker = "extra == 'modal'", specifier = ">=0.4.0" },
1114
- { name = "gradio", specifier = ">=5.0" },
1115
  { name = "httpx", specifier = ">=0.27" },
1116
  { name = "llama-index", marker = "extra == 'modal'", specifier = ">=0.11.0" },
1117
  { name = "llama-index-embeddings-openai", marker = "extra == 'modal'" },
@@ -1568,7 +1568,7 @@ wheels = [
1568
 
1569
  [[package]]
1570
  name = "gradio"
1571
- version = "5.50.0"
1572
  source = { registry = "https://pypi.org/simple" }
1573
  dependencies = [
1574
  { name = "aiofiles" },
@@ -1592,7 +1592,6 @@ dependencies = [
1592
  { name = "pydub" },
1593
  { name = "python-multipart" },
1594
  { name = "pyyaml" },
1595
- { name = "ruff" },
1596
  { name = "safehttpx" },
1597
  { name = "semantic-version" },
1598
  { name = "starlette" },
@@ -1601,13 +1600,20 @@ dependencies = [
1601
  { name = "typing-extensions" },
1602
  { name = "uvicorn" },
1603
  ]
 
1604
  wheels = [
1605
- { url = "https://files.pythonhosted.org/packages/22/04/8daf96bd6d2470f03e2a15a9fc900c7ecf6549619173f16c5944c7ec15a7/gradio-5.50.0-py3-none-any.whl", hash = "sha256:d06770d57cdda9b703ef9cf767ac93a890a0e12d82679a310eef74203a3673f4", size = 63530991 },
1606
  ]
1607
 
1608
  [[package]]
1609
  name = "gradio-client"
1610
- version = "1.14.0"
1611
  source = { registry = "https://pypi.org/simple" }
1612
  dependencies = [
1613
  { name = "fsspec" },
@@ -1615,10 +1621,10 @@ dependencies = [
1615
  { name = "huggingface-hub" },
1616
  { name = "packaging" },
1617
  { name = "typing-extensions" },
1618
- { name = "websockets" },
1619
  ]
 
1620
  wheels = [
1621
- { url = "https://files.pythonhosted.org/packages/be/8a/f2a47134c5b5a7f3bad27eae749589a80d81efaaad8f59af47c136712bf6/gradio_client-1.14.0-py3-none-any.whl", hash = "sha256:9a2f5151978411e0f8b55a2d38cddd0a94491851149d14db4af96f5a09774825", size = 325555 },
1622
  ]
1623
 
1624
  [[package]]
 
1063
  dependencies = [
1064
  { name = "anthropic" },
1065
  { name = "beautifulsoup4" },
1066
+ { name = "gradio", extra = ["mcp"] },
1067
  { name = "httpx" },
1068
  { name = "openai" },
1069
  { name = "pydantic" },
 
1111
  { name = "beautifulsoup4", specifier = ">=4.12" },
1112
  { name = "chromadb", marker = "extra == 'embeddings'", specifier = ">=0.4.0" },
1113
  { name = "chromadb", marker = "extra == 'modal'", specifier = ">=0.4.0" },
1114
+ { name = "gradio", extras = ["mcp"], specifier = ">=5.0.0" },
1115
  { name = "httpx", specifier = ">=0.27" },
1116
  { name = "llama-index", marker = "extra == 'modal'", specifier = ">=0.11.0" },
1117
  { name = "llama-index-embeddings-openai", marker = "extra == 'modal'" },
 
1568
 
1569
  [[package]]
1570
  name = "gradio"
1571
+ version = "6.0.1"
1572
  source = { registry = "https://pypi.org/simple" }
1573
  dependencies = [
1574
  { name = "aiofiles" },
 
1592
  { name = "pydub" },
1593
  { name = "python-multipart" },
1594
  { name = "pyyaml" },
 
1595
  { name = "safehttpx" },
1596
  { name = "semantic-version" },
1597
  { name = "starlette" },
 
1600
  { name = "typing-extensions" },
1601
  { name = "uvicorn" },
1602
  ]
1603
+ sdist = { url = "https://files.pythonhosted.org/packages/65/13/f2bfe1237b8700f63e21c5e39f2843ac8346f7ba4525b582f30f40249863/gradio-6.0.1.tar.gz", hash = "sha256:5d02e6ac34c67aea26b938b8628c8f9f504871392e71f2db559ab8d6799bdf69", size = 36440914 }
1604
  wheels = [
1605
+ { url = "https://files.pythonhosted.org/packages/09/21/27ae5f4b2191a5d58707fc610e67453781a2b948a675a7cf06c99497ffa1/gradio-6.0.1-py3-none-any.whl", hash = "sha256:0f98dc8b414a3f3773cbf3caf5a354507c8ae309ed8266e2f30ca9fa53f379b8", size = 21559963 },
1606
+ ]
1607
+
1608
+ [package.optional-dependencies]
1609
+ mcp = [
1610
+ { name = "mcp" },
1611
+ { name = "pydantic" },
1612
  ]
1613
 
1614
  [[package]]
1615
  name = "gradio-client"
1616
+ version = "2.0.0"
1617
  source = { registry = "https://pypi.org/simple" }
1618
  dependencies = [
1619
  { name = "fsspec" },
 
1621
  { name = "huggingface-hub" },
1622
  { name = "packaging" },
1623
  { name = "typing-extensions" },
 
1624
  ]
1625
+ sdist = { url = "https://files.pythonhosted.org/packages/cf/0a/906062fe0577c62ea6e14044ba74268ff9266fdc75d0e69257bddb7400b3/gradio_client-2.0.0.tar.gz", hash = "sha256:56b462183cb8741bd3e69b21db7d3b62c5abb03c2c2bb925223f1eb18f950e89", size = 315906 }
1626
  wheels = [
1627
+ { url = "https://files.pythonhosted.org/packages/07/5b/789403564754f1eba0273400c1cea2c155f984d82458279154977a088509/gradio_client-2.0.0-py3-none-any.whl", hash = "sha256:77bedf20edcc232d8e7986c1a22165b2bbca1c7c7df10ba808a093d5180dae18", size = 315180 },
1628
  ]
1629
 
1630
  [[package]]