VibecoderMcSwaggins committed on
Commit
0d84878
·
1 Parent(s): e16b9e6

docs: enhance Phase 13 Modal integration documentation


- Updated the documentation to reflect the new `StatisticalAnalyzer` service, which decouples Modal execution from `agent_framework` so the simple orchestrator carries no optional-dependency imports.
- Revised the flow diagrams to illustrate the integration of the `StatisticalAnalyzer` and its role in the analysis phase.
- Added detailed sections on the implementation, configuration updates, and integration points for the new service.
- Included unit and integration tests for the `StatisticalAnalyzer`, verifying it works without `agent_framework` installed.
- Updated demo scripts to showcase the new analysis capabilities and verification of Modal sandbox execution.

Files modified:
- docs/implementation/13_phase_modal_integration.md
- src/services/statistical_analyzer.py
- src/orchestrator.py
- src/agents/analysis_agent.py
- src/mcp_tools.py
- examples/modal_demo/run_analysis.py
- examples/modal_demo/verify_sandbox.py
- tests/unit/services/test_statistical_analyzer.py
- tests/integration/test_modal.py
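
The decoupling this commit describes can be sketched as follows. This is a minimal illustration, not the committed code: the class and function bodies are invented stand-ins, and only the import layering mirrors the files listed above (a Modal-facing service with no `agent_framework` import, wrapped by an agent that imports the optional extra lazily).

```python
# src/services/statistical_analyzer.py (sketch) -- no agent_framework import
# anywhere, so the simple orchestrator can import it unconditionally.
class StatisticalAnalyzer:
    """Runs sandboxed statistics; safe to import without the magentic extra."""

    def __init__(self, executor=None):
        # The executor is injected so tests can pass a stub instead of Modal.
        self.executor = executor or (lambda code: {"stdout": "", "success": True})

    def analyze(self, code: str) -> dict:
        return self.executor(code)


# src/agents/analysis_agent.py (sketch) -- only this wrapper touches
# agent_framework, and only lazily, so importing the service never pulls
# in the optional dependency.
def make_analysis_agent(analyzer: StatisticalAnalyzer):
    try:
        import agent_framework  # optional "magentic" extra  # noqa: F401
    except ImportError as exc:
        raise RuntimeError("Install the 'magentic' extra for agent mode") from exc
    return analyzer  # the real code wraps it in a BaseAgent subclass
```

The design point is that the dependency arrow only goes one way: the agent knows about the service, never the reverse.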

docs/implementation/13_phase_modal_integration.md CHANGED
@@ -25,21 +25,66 @@ Mario already implemented `src/tools/code_execution.py`:
25
 
26
  ### What's Missing
27
 
28
- ```
29
  Current Flow:
30
  User Query → Orchestrator → Search → Judge → [Report] → Done
31
 
32
  With Modal:
33
- User Query → Orchestrator → Search → Judge → [Hypothesis] → [Analysis*] → Report → Done
34
- ↓
35
- Modal Sandbox Execution
36
  ```
37
 
38
  *The AnalysisAgent exists but is NOT called by either orchestrator.
39
 
40
  ---
41
 
42
- ## 2. Prize Opportunity
43
 
44
  ### Modal Innovation Award: $2,500
45
 
@@ -57,24 +102,19 @@ code = """
57
  import pandas as pd
58
  import scipy.stats as stats
59
 
60
- # Analyze extracted metrics from evidence
61
  data = pd.DataFrame({
62
  'study': ['Study1', 'Study2', 'Study3'],
63
  'effect_size': [0.45, 0.52, 0.38],
64
  'sample_size': [120, 85, 200]
65
  })
66
 
67
- # Meta-analysis statistics
68
  weighted_mean = (data['effect_size'] * data['sample_size']).sum() / data['sample_size'].sum()
69
  t_stat, p_value = stats.ttest_1samp(data['effect_size'], 0)
70
 
71
  print(f"Weighted Effect Size: {weighted_mean:.3f}")
72
  print(f"P-value: {p_value:.4f}")
73
 
74
- if p_value < 0.05:
75
- result = "SUPPORTED"
76
- else:
77
- result = "INCONCLUSIVE"
78
  """
79
 
80
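
The code string above is handed to `executor.execute(code)` and run inside a Modal sandbox. A local stand-in with the same result shape (an assumption: the executor appears from the surrounding diff to return `stdout`/`stderr`/`success`/`error` keys) makes that contract easy to exercise without credentials — it is NOT sandboxed and is for tests only:

```python
# Local stand-in for the Modal executor's contract (assumed result keys:
# stdout, stderr, success, error). Runs code in-process -- tests only.
import contextlib
import io

def execute_locally(code: str, timeout: int = 60) -> dict:
    """Run `code` in-process and capture stdout -- NOT sandboxed."""
    buf = io.StringIO()
    try:
        with contextlib.redirect_stdout(buf):
            exec(code, {})  # the real executor ships this string to Modal
        return {"stdout": buf.getvalue(), "stderr": "", "success": True, "error": None}
    except Exception as exc:
        return {"stdout": buf.getvalue(), "stderr": str(exc), "success": False, "error": str(exc)}

output = execute_locally("print('Weighted Effect Size: 0.430')")
```

Swapping this in for `get_code_executor()` in unit tests keeps the pipeline logic testable while the real Modal path stays behind integration tests.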
  # Executed SAFELY in Modal sandbox
@@ -84,19 +124,19 @@ output = executor.execute(code) # Runs in isolated container!
84
 
85
  ---
86
 
87
- ## 3. Technical Specification
88
 
89
- ### 3.1 Dependencies (Already Present)
90
 
91
  ```toml
92
- # pyproject.toml - already has Modal
93
- dependencies = [
94
- "modal>=0.63.0",
95
- # ...
96
- ]
97
  ```
98
 
99
- ### 3.2 Environment Variables
100
 
101
  ```bash
102
  # .env
@@ -104,20 +144,21 @@ MODAL_TOKEN_ID=your-token-id
104
  MODAL_TOKEN_SECRET=your-token-secret
105
  ```
106
 
107
- ### 3.3 Integration Points
108
 
109
  | Integration Point | File | Change Required |
110
  |-------------------|------|-----------------|
111
- | Simple Orchestrator | `src/orchestrator.py` | Add `AnalysisAgent` call |
112
- | Magentic Orchestrator | `src/orchestrator_magentic.py` | Add `AnalysisAgent` participant |
113
- | Gradio UI | `src/app.py` | Add toggle for analysis mode |
114
  | Config | `src/utils/config.py` | Add `enable_modal_analysis` setting |
 
 
115
 
116
  ---
117
 
118
- ## 4. Implementation
119
 
120
- ### 4.1 Configuration Update (`src/utils/config.py`)
121
 
122
  ```python
123
  class Settings(BaseSettings):
@@ -134,7 +175,267 @@ class Settings(BaseSettings):
134
  return bool(self.modal_token_id and self.modal_token_secret)
135
  ```
136
 
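
The `modal_available` check above can be exercised without pydantic-settings; this dataclass stand-in (hypothetical name `SettingsSketch`) mirrors only the fields shown:

```python
# Plain-dataclass stand-in for the Settings fields above (the real class is
# a pydantic BaseSettings that reads .env); modal_available mirrors the
# property shown in the diff.
from dataclasses import dataclass
from typing import Optional

@dataclass
class SettingsSketch:
    modal_token_id: Optional[str] = None
    modal_token_secret: Optional[str] = None
    enable_modal_analysis: bool = False

    @property
    def modal_available(self) -> bool:
        # Analysis only runs when BOTH Modal credentials are present.
        return bool(self.modal_token_id and self.modal_token_secret)
```
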
137
- ### 4.2 Simple Orchestrator Update (`src/orchestrator.py`)
138
 
139
  ```python
140
  """Main orchestrator with optional Modal analysis."""
@@ -160,29 +461,20 @@ class Orchestrator:
160
  self.history: list[dict[str, Any]] = []
161
  self._enable_analysis = enable_analysis and settings.modal_available
162
 
163
- # Lazy-load analysis components
164
- self._hypothesis_agent: Any = None
165
- self._analysis_agent: Any = None
166
 
167
- async def _get_hypothesis_agent(self) -> Any:
168
- """Lazy initialization of HypothesisAgent."""
169
- if self._hypothesis_agent is None:
170
- from src.agents.hypothesis_agent import HypothesisAgent
171
 
172
- self._hypothesis_agent = HypothesisAgent(
173
- evidence_store={"current": []},
174
- )
175
- return self._hypothesis_agent
 
176
 
177
- async def _get_analysis_agent(self) -> Any:
178
- """Lazy initialization of AnalysisAgent."""
179
- if self._analysis_agent is None:
180
- from src.agents.analysis_agent import AnalysisAgent
181
-
182
- self._analysis_agent = AnalysisAgent(
183
- evidence_store={"current": [], "hypotheses": []},
184
- )
185
- return self._analysis_agent
186
 
187
  async def run(self, query: str) -> AsyncGenerator[AgentEvent, None]:
188
  """Main orchestration loop with optional Modal analysis."""
@@ -198,24 +490,19 @@ class Orchestrator:
198
  )
199
 
200
  try:
201
- # Generate hypotheses first
202
- hypothesis_agent = await self._get_hypothesis_agent()
203
- hypothesis_agent._evidence_store["current"] = all_evidence
204
-
205
- hypothesis_result = await hypothesis_agent.run(query)
206
- hypotheses = hypothesis_agent._evidence_store.get("hypotheses", [])
207
-
208
- # Run Modal analysis
209
- analysis_agent = await self._get_analysis_agent()
210
- analysis_agent._evidence_store["current"] = all_evidence
211
- analysis_agent._evidence_store["hypotheses"] = hypotheses
212
 
213
- analysis_result = await analysis_agent.run(query)
214
 
215
  yield AgentEvent(
216
  type="analysis_complete",
217
- message="Modal analysis complete",
218
- data=analysis_agent._evidence_store.get("analysis", {}),
219
  iteration=iteration,
220
  )
221
 
@@ -230,9 +517,159 @@ class Orchestrator:
230
  # Continue to synthesis...
231
  ```
232
 
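
The removed orchestrator code lazy-loads its agents so that import-heavy (and optional) dependencies are only touched when analysis is actually enabled. The pattern in isolation, with an invented stand-in factory:

```python
# Lazy-initialization pattern from the orchestrator above, reduced to its
# core: defer construction (and hence imports) until first use, then cache.
from typing import Any, Callable

class LazyAgentSlot:
    """Holds an agent that is built on first access and reused afterwards."""

    def __init__(self, factory: Callable[[], Any]):
        self._factory = factory
        self._instance: Any = None

    def get(self) -> Any:
        if self._instance is None:
            self._instance = self._factory()  # import-heavy work happens here
        return self._instance

calls: list[str] = []
slot = LazyAgentSlot(lambda: calls.append("built") or object())
a, b = slot.get(), slot.get()  # factory runs exactly once
```
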
233
- ### 4.3 MCP Tool for Modal Analysis (`src/mcp_tools.py`)
234
 
235
- Add a new MCP tool for direct Modal analysis:
236
 
237
  ```python
238
  async def analyze_hypothesis(
@@ -253,175 +690,67 @@ async def analyze_hypothesis(
253
  Returns:
254
  Analysis result with verdict (SUPPORTED/REFUTED/INCONCLUSIVE) and statistics
255
  """
256
- from src.tools.code_execution import get_code_executor, CodeExecutionError
257
- from src.agent_factory.judges import get_model
258
- from pydantic_ai import Agent
259
-
260
- # Check Modal availability
261
  from src.utils.config import settings
 
 
262
  if not settings.modal_available:
263
  return "Error: Modal credentials not configured. Set MODAL_TOKEN_ID and MODAL_TOKEN_SECRET."
264
 
265
- # Generate analysis code using LLM
266
- code_agent = Agent(
267
- model=get_model(),
268
- output_type=str,
269
- system_prompt="""Generate Python code to analyze drug repurposing evidence.
270
- Use pandas, numpy, scipy.stats. Output executable code only.
271
- Set 'result' variable to SUPPORTED, REFUTED, or INCONCLUSIVE.
272
- Print key statistics and p-values.""",
273
- )
274
-
275
- prompt = f"""Analyze this hypothesis:
276
- Drug: {drug}
277
- Condition: {condition}
278
-
279
- Evidence:
280
- {evidence_summary}
281
 
282
- Generate statistical analysis code."""
283
 
284
- try:
285
- code_result = await code_agent.run(prompt)
286
- generated_code = code_result.output
287
 
288
- # Execute in Modal sandbox
289
- executor = get_code_executor()
290
- import asyncio
291
- loop = asyncio.get_running_loop()
292
- from functools import partial
293
- execution = await loop.run_in_executor(
294
- None, partial(executor.execute, generated_code, timeout=60)
295
- )
296
 
297
- if not execution["success"]:
298
- return f"## Analysis Failed\n\nError: {execution['error']}"
299
-
300
- # Format output
301
- return f"""## Statistical Analysis: {drug} for {condition}
302
 
303
  ### Execution Output
304
  ```
305
- {execution['stdout']}
306
  ```
307
 
308
  ### Generated Code
309
  ```python
310
- {generated_code}
311
  ```
312
 
313
  **Executed in Modal Sandbox** - Isolated, secure, reproducible.
314
  """
315
-
316
- except CodeExecutionError as e:
317
- return f"## Analysis Error\n\n{e}"
318
- except Exception as e:
319
- return f"## Unexpected Error\n\n{e}"
320
  ```
321
 
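
The tool above follows a generate-then-execute flow: an LLM produces analysis code, and the blocking executor runs it through `run_in_executor` so the event loop stays free. A stubbed sketch of that flow — both `fake_*` callables are stand-ins for `pydantic_ai.Agent` and `get_code_executor()`:

```python
# Stubbed generate-then-execute flow from analyze_hypothesis: LLM-shaped
# callable emits code, a blocking executor runs it on the thread pool.
import asyncio
from functools import partial

async def fake_code_agent(prompt: str) -> str:
    return "print('Verdict: SUPPORTED')"  # the real agent returns generated Python

def fake_executor(code: str, timeout: int = 60) -> dict:
    return {"stdout": "Verdict: SUPPORTED\n", "success": True, "error": None}

async def analyze(drug: str, condition: str) -> str:
    generated = await fake_code_agent(f"Analyze {drug} for {condition}")
    loop = asyncio.get_running_loop()
    # execute() is blocking, so push it onto the default thread pool.
    execution = await loop.run_in_executor(None, partial(fake_executor, generated, timeout=60))
    if not execution["success"]:
        return f"## Analysis Failed\n\nError: {execution['error']}"
    return f"## Statistical Analysis: {drug} for {condition}\n\n{execution['stdout']}"

report = asyncio.run(analyze("metformin", "alzheimer"))
```
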
322
- ### 4.4 Demo Script (`examples/modal_demo/run_analysis.py`)
323
-
324
- ```python
325
- #!/usr/bin/env python3
326
- """Demo: Modal-powered statistical analysis of drug repurposing evidence.
327
-
328
- This script demonstrates:
329
- 1. Gathering evidence from PubMed
330
- 2. Generating analysis code with LLM
331
- 3. Executing in Modal sandbox
332
- 4. Returning statistical insights
333
-
334
- Usage:
335
- export OPENAI_API_KEY=...
336
- export MODAL_TOKEN_ID=...
337
- export MODAL_TOKEN_SECRET=...
338
- uv run python examples/modal_demo/run_analysis.py "metformin alzheimer"
339
- """
340
-
341
- import argparse
342
- import asyncio
343
- import os
344
- import sys
345
-
346
- from src.agents.analysis_agent import AnalysisAgent
347
- from src.agents.hypothesis_agent import HypothesisAgent
348
- from src.tools.pubmed import PubMedTool
349
- from src.utils.config import settings
350
 
351
-
352
- async def main() -> None:
353
- """Run the Modal analysis demo."""
354
- parser = argparse.ArgumentParser(description="Modal Analysis Demo")
355
- parser.add_argument("query", help="Research query (e.g., 'metformin alzheimer')")
356
- args = parser.parse_args()
357
-
358
- # Check credentials
359
- if not settings.modal_available:
360
- print("Error: Modal credentials not configured.")
361
- print("Set MODAL_TOKEN_ID and MODAL_TOKEN_SECRET in .env")
362
- sys.exit(1)
363
-
364
- if not (os.getenv("OPENAI_API_KEY") or os.getenv("ANTHROPIC_API_KEY")):
365
- print("Error: No LLM API key found.")
366
- sys.exit(1)
367
-
368
- print(f"\n{'='*60}")
369
- print("DeepCritical Modal Analysis Demo")
370
- print(f"Query: {args.query}")
371
- print(f"{'='*60}\n")
372
-
373
- # Step 1: Gather Evidence
374
- print("Step 1: Gathering evidence from PubMed...")
375
- pubmed = PubMedTool()
376
- evidence = await pubmed.search(args.query, max_results=5)
377
- print(f" Found {len(evidence)} papers\n")
378
-
379
- # Step 2: Generate Hypotheses
380
- print("Step 2: Generating mechanistic hypotheses...")
381
- evidence_store: dict = {"current": evidence, "hypotheses": []}
382
- hypothesis_agent = HypothesisAgent(evidence_store=evidence_store)
383
- await hypothesis_agent.run(args.query)
384
- hypotheses = evidence_store.get("hypotheses", [])
385
- print(f" Generated {len(hypotheses)} hypotheses\n")
386
-
387
- if hypotheses:
388
- print(f" Primary: {hypotheses[0].drug} → {hypotheses[0].target}")
389
-
390
- # Step 3: Run Modal Analysis
391
- print("\nStep 3: Running statistical analysis in Modal sandbox...")
392
- print(" (This executes LLM-generated code in an isolated container)\n")
393
-
394
- analysis_agent = AnalysisAgent(evidence_store=evidence_store)
395
- result = await analysis_agent.run(args.query)
396
-
397
- # Step 4: Display Results
398
- print("\n" + "="*60)
399
- print("ANALYSIS RESULTS")
400
- print("="*60)
401
-
402
- if result.messages:
403
- print(result.messages[0].text)
404
-
405
- analysis = evidence_store.get("analysis", {})
406
- if analysis:
407
- print(f"\nVerdict: {analysis.get('verdict', 'N/A')}")
408
- print(f"Confidence: {analysis.get('confidence', 0):.0%}")
409
-
410
- print("\n[Demo Complete - Code was executed in Modal, not locally]")
411
-
412
-
413
- if __name__ == "__main__":
414
- asyncio.run(main())
415
- ```
416
-
417
- ### 4.5 Verification Script (`examples/modal_demo/verify_sandbox.py`)
418
 
419
  ```python
420
  #!/usr/bin/env python3
421
  """Verify that Modal sandbox is properly isolated.
422
 
423
  This script proves to judges that code runs in Modal, not locally.
424
- It attempts operations that would succeed locally but fail in sandbox.
425
 
426
  Usage:
427
  uv run python examples/modal_demo/verify_sandbox.py
@@ -438,26 +767,23 @@ async def main() -> None:
438
  """Verify Modal sandbox isolation."""
439
  if not settings.modal_available:
440
  print("Error: Modal credentials not configured.")
 
441
  return
442
 
443
  executor = get_code_executor()
444
  loop = asyncio.get_running_loop()
445
 
446
- print("="*60)
447
  print("Modal Sandbox Isolation Verification")
448
- print("="*60 + "\n")
449
 
450
- # Test 1: Prove it's not running locally
451
  print("Test 1: Check hostname (should NOT be your machine)")
452
- code1 = """
453
- import socket
454
- print(f"Hostname: {socket.gethostname()}")
455
- """
456
  result1 = await loop.run_in_executor(None, partial(executor.execute, code1))
457
- print(f" Result: {result1['stdout'].strip()}")
458
- print(f" (Your local hostname would be different)\n")
459
 
460
- # Test 2: Verify scientific libraries available
461
  print("Test 2: Verify scientific libraries")
462
  code2 = """
463
  import pandas as pd
@@ -470,45 +796,108 @@ print(f"scipy: {scipy.__version__}")
470
  result2 = await loop.run_in_executor(None, partial(executor.execute, code2))
471
  print(f" {result2['stdout'].strip()}\n")
472
 
473
- # Test 3: Verify network is blocked (security)
474
- print("Test 3: Verify network isolation (should fail)")
475
  code3 = """
476
  import urllib.request
477
  try:
478
  urllib.request.urlopen("https://google.com", timeout=2)
479
- print("Network: ALLOWED (unexpected)")
480
- except Exception as e:
481
- print(f"Network: BLOCKED (as expected)")
482
  """
483
  result3 = await loop.run_in_executor(None, partial(executor.execute, code3))
484
  print(f" {result3['stdout'].strip()}\n")
485
 
486
- # Test 4: Run actual statistical analysis
487
- print("Test 4: Execute real statistical analysis")
488
  code4 = """
489
  import pandas as pd
490
  import scipy.stats as stats
491
 
492
- data = pd.DataFrame({
493
- 'drug': ['Metformin'] * 3,
494
- 'effect': [0.42, 0.38, 0.51],
495
- 'n': [100, 150, 80]
496
- })
497
-
498
- mean_effect = data['effect'].mean()
499
- sem = data['effect'].sem()
500
  t_stat, p_val = stats.ttest_1samp(data['effect'], 0)
501
 
502
- print(f"Mean Effect: {mean_effect:.3f} (SE: {sem:.3f})")
503
- print(f"t-statistic: {t_stat:.2f}, p-value: {p_val:.4f}")
504
  print(f"Verdict: {'SUPPORTED' if p_val < 0.05 else 'INCONCLUSIVE'}")
505
  """
506
  result4 = await loop.run_in_executor(None, partial(executor.execute, code4))
507
  print(f" {result4['stdout'].strip()}\n")
508
 
509
- print("="*60)
510
  print("All tests complete - Modal sandbox verified!")
511
- print("="*60)
512
 
513
 
514
  if __name__ == "__main__":
@@ -517,18 +906,23 @@ if __name__ == "__main__":
517
 
518
  ---
519
 
520
- ## 5. TDD Test Suite
521
 
522
- ### 5.1 Unit Tests (`tests/unit/tools/test_modal_integration.py`)
523
 
524
  ```python
525
- """Unit tests for Modal pipeline integration."""
526
 
527
  from unittest.mock import AsyncMock, MagicMock, patch
528
 
529
  import pytest
530
 
531
- from src.utils.models import Evidence, Citation
532
 
533
 
534
  @pytest.fixture
@@ -536,7 +930,7 @@ def sample_evidence() -> list[Evidence]:
536
  """Sample evidence for testing."""
537
  return [
538
  Evidence(
539
- content="Metformin shows effect size of 0.45 in Alzheimer's model.",
540
  citation=Citation(
541
  source="pubmed",
542
  title="Metformin Study",
@@ -549,128 +943,83 @@ def sample_evidence() -> list[Evidence]:
549
  ]
550
 
551
 
552
- class TestAnalysisAgentIntegration:
553
- """Tests for AnalysisAgent integration."""
554
 
555
  @pytest.mark.asyncio
556
- async def test_analysis_agent_generates_code(
557
  self, sample_evidence: list[Evidence]
558
  ) -> None:
559
- """AnalysisAgent should generate Python code for analysis."""
560
- from src.agents.analysis_agent import AnalysisAgent
561
-
562
- evidence_store = {
563
- "current": sample_evidence,
564
- "hypotheses": [
565
- MagicMock(
566
- drug="metformin",
567
- target="AMPK",
568
- pathway="autophagy",
569
- effect="neuroprotection",
570
- confidence=0.8,
571
- )
572
- ],
573
- }
574
 
575
- with patch("src.agents.analysis_agent.get_code_executor") as mock_executor, \
576
- patch("src.agents.analysis_agent.get_model") as mock_model:
577
 
578
- # Mock LLM to return code
579
- mock_agent = AsyncMock()
580
- mock_agent.run = AsyncMock(return_value=MagicMock(
581
- output="import pandas as pd\nresult = 'SUPPORTED'"
582
- ))
583
 
584
- # Mock Modal execution
585
  mock_executor.return_value.execute.return_value = {
586
- "stdout": "SUPPORTED",
587
  "stderr": "",
588
  "success": True,
589
- "error": None,
590
  }
591
 
592
- agent = AnalysisAgent(evidence_store=evidence_store)
593
- agent._agent = mock_agent
594
-
595
- result = await agent.run("metformin alzheimer")
596
-
597
- assert result.messages[0].text is not None
598
- assert "analysis" in evidence_store
599
-
600
 
601
- class TestModalExecutorUnit:
602
- """Unit tests for ModalCodeExecutor."""
603
 
604
- def test_executor_checks_credentials(self) -> None:
605
- """Executor should warn if credentials missing."""
606
- import os
607
- from unittest.mock import patch
 
608
 
609
- with patch.dict(os.environ, {}, clear=True):
610
- from src.tools.code_execution import ModalCodeExecutor
611
 
612
- # Should not raise, but should log warning
613
- executor = ModalCodeExecutor()
614
- assert executor.modal_token_id is None
615
 
616
- def test_get_sandbox_library_list(self) -> None:
617
- """Should return list of library==version strings."""
618
- from src.tools.code_execution import get_sandbox_library_list
619
-
620
- libs = get_sandbox_library_list()
621
-
622
- assert isinstance(libs, list)
623
- assert "pandas==2.2.0" in libs
624
- assert "numpy==1.26.4" in libs
625
-
626
-
627
- class TestOrchestratorWithAnalysis:
628
- """Tests for orchestrator with Modal analysis enabled."""
629
-
630
- @pytest.mark.asyncio
631
- async def test_orchestrator_calls_analysis_when_enabled(self) -> None:
632
- """Orchestrator should call AnalysisAgent when enabled and Modal available."""
633
- from src.orchestrator import Orchestrator
634
- from src.utils.models import OrchestratorConfig
635
-
636
- with patch("src.orchestrator.settings") as mock_settings:
637
- mock_settings.modal_available = True
638
-
639
- mock_search = AsyncMock()
640
- mock_search.search.return_value = MagicMock(
641
- evidence=[],
642
- errors=[],
643
- )
644
-
645
- mock_judge = AsyncMock()
646
- mock_judge.assess.return_value = MagicMock(
647
- sufficient=True,
648
- recommendation="synthesize",
649
- next_search_queries=[],
650
  )
651
-
652
- config = OrchestratorConfig(max_iterations=1)
653
- orchestrator = Orchestrator(
654
- search_handler=mock_search,
655
- judge_handler=mock_judge,
656
- config=config,
657
- enable_analysis=True,
658
  )
659
-
660
- # Collect events
661
- events = []
662
- async for event in orchestrator.run("test query"):
663
- events.append(event)
664
-
665
- # Should have analyzing event if Modal enabled
666
- event_types = [e.type for e in events]
667
- # Note: This test verifies the flow, actual Modal call is mocked
668
  ```
669
 
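
The unit tests above patch `get_code_executor` and the model factory so nothing touches Modal or an LLM. The same idea shown with explicit dependency injection instead of `patch()` — `run_analysis` here is a hypothetical stand-in for the pipeline step under test:

```python
# Mocking pattern behind the unit tests above: the code under test only sees
# an executor-shaped object, so a MagicMock with the expected result dict
# replaces Modal entirely. run_analysis is an invented stand-in step.
from unittest.mock import MagicMock

def run_analysis(executor, code: str) -> str:
    """Execute code, then read the verdict from captured stdout."""
    result = executor.execute(code)
    if not result["success"]:
        return "ERROR"
    return "SUPPORTED" if "SUPPORTED" in result["stdout"] else "INCONCLUSIVE"

# In the real suite get_code_executor() is patched; here the mock is injected.
mock_executor = MagicMock()
mock_executor.execute.return_value = {
    "stdout": "SUPPORTED", "stderr": "", "success": True, "error": None,
}
verdict = run_analysis(mock_executor, "import pandas as pd")
```
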
670
- ### 5.2 Integration Test (`tests/integration/test_modal.py`)
671
 
672
  ```python
673
- """Integration tests for Modal code execution (requires Modal credentials)."""
674
 
675
  import pytest
676
 
@@ -678,27 +1027,20 @@ from src.utils.config import settings
678
 
679
 
680
  @pytest.mark.integration
681
- @pytest.mark.skipif(
682
- not settings.modal_available,
683
- reason="Modal credentials not configured"
684
- )
685
  class TestModalIntegration:
686
- """Integration tests for Modal (requires credentials)."""
687
 
688
  @pytest.mark.asyncio
689
- async def test_modal_executes_real_code(self) -> None:
690
- """Test actual code execution in Modal sandbox."""
691
  import asyncio
692
  from functools import partial
693
 
694
  from src.tools.code_execution import get_code_executor
695
 
696
  executor = get_code_executor()
697
- code = """
698
- import pandas as pd
699
- result = pd.DataFrame({'a': [1,2,3]})['a'].sum()
700
- print(f"Sum: {result}")
701
- """
702
 
703
  loop = asyncio.get_running_loop()
704
  result = await loop.run_in_executor(
@@ -706,174 +1048,148 @@ print(f"Sum: {result}")
706
  )
707
 
708
  assert result["success"]
709
- assert "Sum: 6" in result["stdout"]
710
 
711
  @pytest.mark.asyncio
712
- async def test_modal_blocks_network(self) -> None:
713
- """Verify network is blocked in sandbox."""
714
- import asyncio
715
- from functools import partial
716
 
717
- from src.tools.code_execution import get_code_executor
 
718
 
719
- executor = get_code_executor()
720
- code = """
721
- import urllib.request
722
- try:
723
- urllib.request.urlopen("https://google.com", timeout=2)
724
- print("NETWORK_ALLOWED")
725
- except Exception:
726
- print("NETWORK_BLOCKED")
727
- """
728
-
729
- loop = asyncio.get_running_loop()
730
- result = await loop.run_in_executor(
731
- None, partial(executor.execute, code, timeout=30)
732
- )
733
-
734
- assert "NETWORK_BLOCKED" in result["stdout"]
735
  ```
736
 
737
  ---
738
 
739
- ## 6. Verification Commands
740
 
741
  ```bash
742
- # 1. Set Modal credentials
743
- export MODAL_TOKEN_ID=your-token-id
744
- export MODAL_TOKEN_SECRET=your-token-secret
745
-
746
- # Or via modal CLI
747
- modal setup
748
 
749
- # 2. Run unit tests
750
- uv run pytest tests/unit/tools/test_modal_integration.py -v
751
 
752
- # 3. Run verification script (proves sandbox works)
753
  uv run python examples/modal_demo/verify_sandbox.py
754
 
755
- # 4. Run full demo
756
  uv run python examples/modal_demo/run_analysis.py "metformin alzheimer"
757
 
758
- # 5. Run integration tests (requires Modal creds)
759
  uv run pytest tests/integration/test_modal.py -v -m integration
760
 
761
- # 6. Run full test suite
762
  make check
763
  ```
764
 
765
  ---
766
 
767
- ## 7. Definition of Done
768
 
769
  Phase 13 is **COMPLETE** when:
770
 
771
- - [ ] `src/utils/config.py` updated with `enable_modal_analysis` setting
772
- - [ ] `src/orchestrator.py` optionally calls `AnalysisAgent`
773
- - [ ] `src/mcp_tools.py` has `analyze_hypothesis` MCP tool
774
- - [ ] `examples/modal_demo/run_analysis.py` working demo
775
- - [ ] `examples/modal_demo/verify_sandbox.py` verification script
776
- - [ ] Unit tests in `tests/unit/tools/test_modal_integration.py`
777
- - [ ] Integration tests in `tests/integration/test_modal.py`
778
- - [ ] Verification script proves sandbox isolation
779
- - [ ] All unit tests pass
780
- - [ ] Lints pass
781
-
782
- ---
783
-
784
- ## 8. Demo Script for Judges
785
-
786
- ### Show Modal Innovation
787
-
788
- 1. **Run verification script** (proves sandbox):
789
- ```bash
790
- uv run python examples/modal_demo/verify_sandbox.py
791
- ```
792
- - Shows hostname is NOT local machine
793
- - Shows scientific libraries available
794
- - Shows network is BLOCKED (security)
795
- - Shows real statistics execution
796
-
797
- 2. **Run analysis demo**:
798
- ```bash
799
- uv run python examples/modal_demo/run_analysis.py "metformin cancer"
800
- ```
801
- - Shows evidence gathering
802
- - Shows hypothesis generation
803
- - Shows code execution in Modal
804
- - Shows statistical verdict
805
-
806
- 3. **Show the key differentiator**:
807
- > "LLM-generated code executes in an isolated Modal container. This is enterprise-grade safety for AI-powered scientific computing."
808
 
809
  ---
810
 
811
- ## 9. Value Delivered
812
-
813
- | Before | After |
814
- |--------|-------|
815
- | Code execution exists but unused | Integrated into pipeline |
816
- | No demo of sandbox isolation | Verification script proves it |
817
- | No MCP tool for analysis | `analyze_hypothesis` MCP tool |
818
- | No judge-friendly demo | Clear demo script |
819
 
820
- **Prize Impact**:
821
- - With Modal Integration: **Eligible for $2,500 Modal Innovation Award**
822
 
823
  ---
824
 
825
- ## 10. Files to Create/Modify
826
 
827
  | File | Action | Purpose |
828
  |------|--------|---------|
 
829
  | `src/utils/config.py` | MODIFY | Add `enable_modal_analysis` |
830
- | `src/orchestrator.py` | MODIFY | Add optional AnalysisAgent call |
831
- | `src/mcp_tools.py` | MODIFY | Add `analyze_hypothesis` MCP tool |
 
 
832
  | `examples/modal_demo/run_analysis.py` | CREATE | Demo script |
833
- | `examples/modal_demo/verify_sandbox.py` | CREATE | Verification script |
834
- | `tests/unit/tools/test_modal_integration.py` | CREATE | Unit tests |
835
  | `tests/integration/test_modal.py` | CREATE | Integration tests |
836
 
837
- ---
838
-
839
- ## 11. Architecture After Phase 13
840
-
841
- ```
842
- User Query
843
- ↓
844
- Orchestrator
845
- ↓
846
- ┌──────────────────────────────────────────────────────────────┐
847
- │                         Search Phase                         │
848
- │  PubMedTool → ClinicalTrialsTool → BioRxivTool               │
849
- └──────────────────────────────────────────────────────────────┘
850
- ↓
851
- ┌──────────────────────────────────────────────────────────────┐
852
- │                          Judge Phase                         │
853
- │  JudgeHandler → "sufficient" → continue to synthesis         │
854
- └──────────────────────────────────────────────────────────────┘
855
- ↓ (if enable_modal_analysis=True)
856
- ┌──────────────────────────────────────────────────────────────┐
857
- │                     Analysis Phase (NEW)                     │
858
- │  HypothesisAgent → Generate mechanistic hypotheses           │
859
- │                              ↓                               │
860
- │  AnalysisAgent → Generate Python code                        │
861
- │                              ↓                               │
862
- │      ┌────────────────────────────────────────────────┐      │
863
- │      │            Modal Sandbox Container             │      │
864
- │      │  - pandas, numpy, scipy, sklearn               │      │
865
- │      │  - Network BLOCKED                             │      │
866
- │      │  - Filesystem ISOLATED                         │      │
867
- │      │  - Execute → Return stdout                     │      │
868
- │      └────────────────────────────────────────────────┘      │
869
- │                              ↓                               │
870
- │  AnalysisResult → SUPPORTED/REFUTED/INCONCLUSIVE             │
871
- └──────────────────────────────────────────────────────────────┘
872
- ↓
873
- ┌──────────────────────────────────────────────────────────────┐
874
- │                         Report Phase                         │
875
- │  ReportAgent → Structured scientific report                  │
876
- └──────────────────────────────────────────────────────────────┘
877
- ```
878
-
879
- **This is the Modal-powered analytics stack.**
 
25
 
26
  ### What's Missing
27
 
28
+ ```text
29
  Current Flow:
30
  User Query → Orchestrator → Search → Judge → [Report] → Done
31
 
32
  With Modal:
33
+ User Query → Orchestrator → Search → Judge → [Analysis*] → Report → Done
34
+ ↓
35
+ Modal Sandbox Execution
36
  ```
37
 
38
  *The AnalysisAgent exists but is NOT called by either orchestrator.
39
 
40
  ---
41
 
42
+ ## 2. Critical Dependency Analysis
43
+
44
+ ### The Problem (Senior Feedback)
45
+
46
+ ```python
47
+ # src/agents/analysis_agent.py - Line 8
48
+ from agent_framework import (
49
+ AgentRunResponse,
50
+ BaseAgent,
51
+ ...
52
+ )
53
+ ```
54
+
55
+ ```toml
56
+ # pyproject.toml - agent-framework is OPTIONAL
57
+ [project.optional-dependencies]
58
+ magentic = [
59
+ "agent-framework-core",
60
+ ]
61
+ ```
62
+
63
+ **If we import `AnalysisAgent` in the simple orchestrator without the `magentic` extra installed, the app CRASHES on startup.**
64
+
65
+ ### The SOLID Solution
66
+
67
+ **Single Responsibility Principle**: Decouple Modal execution logic from `agent_framework`.
68
+
69
+ ```text
70
+ BEFORE (Coupled):
71
+ AnalysisAgent (requires agent_framework)
72
+ ↓
73
+ ModalCodeExecutor
74
+
75
+ AFTER (Decoupled):
76
+ StatisticalAnalyzer (no agent_framework dependency) ← Simple mode uses this
77
+ ↓
78
+ ModalCodeExecutor
79
+ ↑
80
+ AnalysisAgent (wraps StatisticalAnalyzer) ← Magentic mode uses this
81
+ ```
82
+
83
+ **Key insight**: Create `src/services/statistical_analyzer.py` with ZERO agent_framework imports.
84
+
85
+ ---
86
+
87
+ ## 3. Prize Opportunity
88
 
89
  ### Modal Innovation Award: $2,500
90
 
 
102
  import pandas as pd
103
  import scipy.stats as stats
104
 
 
105
  data = pd.DataFrame({
106
  'study': ['Study1', 'Study2', 'Study3'],
107
  'effect_size': [0.45, 0.52, 0.38],
108
  'sample_size': [120, 85, 200]
109
  })
110
 
 
111
  weighted_mean = (data['effect_size'] * data['sample_size']).sum() / data['sample_size'].sum()
112
  t_stat, p_value = stats.ttest_1samp(data['effect_size'], 0)
113
 
114
  print(f"Weighted Effect Size: {weighted_mean:.3f}")
115
  print(f"P-value: {p_value:.4f}")
116
 
117
+ result = "SUPPORTED" if p_value < 0.05 else "INCONCLUSIVE"
118
  """
119
 
120
  # Executed SAFELY in Modal sandbox
 
124
 
125
  ---
126
 
127
+ ## 4. Technical Specification
128
 
129
+ ### 4.1 Dependencies
130
 
131
  ```toml
132
+ # pyproject.toml - NO CHANGES to dependencies
133
+ # StatisticalAnalyzer uses only:
134
+ # - pydantic-ai (already in main deps)
135
+ # - modal (already in main deps)
136
+ # - src.tools.code_execution (no agent_framework)
137
  ```
138
 
139
+ ### 4.2 Environment Variables
140
 
141
  ```bash
142
  # .env
 
144
  MODAL_TOKEN_SECRET=your-token-secret
145
  ```
146
 
147
+ ### 4.3 Integration Points
148
 
149
  | Integration Point | File | Change Required |
150
  |-------------------|------|-----------------|
151
+ | New Service | `src/services/statistical_analyzer.py` | CREATE (no agent_framework) |
152
+ | Simple Orchestrator | `src/orchestrator.py` | Use `StatisticalAnalyzer` |
 
153
  | Config | `src/utils/config.py` | Add `enable_modal_analysis` setting |
154
+ | AnalysisAgent | `src/agents/analysis_agent.py` | Refactor to wrap `StatisticalAnalyzer` |
155
+ | MCP Tool | `src/mcp_tools.py` | Add `analyze_hypothesis` tool |
156
 
157
  ---
158
 
159
+ ## 5. Implementation
160
 
161
+ ### 5.1 Configuration Update (`src/utils/config.py`)
162
 
163
  ```python
164
  class Settings(BaseSettings):
 
175
  return bool(self.modal_token_id and self.modal_token_secret)
176
  ```
177
 
178
+ ### 5.2 StatisticalAnalyzer Service (`src/services/statistical_analyzer.py`)
179
+
180
+ **This is the key fix - NO agent_framework imports.**
181
+
182
+ ```python
183
+ """Statistical analysis service using Modal code execution.
184
+
185
+ This module provides Modal-based statistical analysis WITHOUT depending on
186
+ agent_framework. This allows it to be used in the simple orchestrator mode
187
+ without requiring the magentic optional dependency.
188
+
189
+ The AnalysisAgent (in src/agents/) wraps this service for magentic mode.
190
+ """
191
+
192
+ import asyncio
193
+ import re
194
+ from functools import partial
195
+ from typing import Any
196
+
197
+ from pydantic import BaseModel, Field
198
+ from pydantic_ai import Agent
199
+
200
+ from src.agent_factory.judges import get_model
201
+ from src.tools.code_execution import (
202
+ CodeExecutionError,
203
+ get_code_executor,
204
+ get_sandbox_library_prompt,
205
+ )
206
+ from src.utils.models import Evidence
207
+
208
+
209
+ class AnalysisResult(BaseModel):
210
+ """Result of statistical analysis."""
211
+
212
+ verdict: str = Field(
213
+ description="SUPPORTED, REFUTED, or INCONCLUSIVE",
214
+ )
215
+ confidence: float = Field(ge=0.0, le=1.0, description="Confidence in verdict (0-1)")
216
+ statistical_evidence: str = Field(
217
+ description="Summary of statistical findings from code execution"
218
+ )
219
+ code_generated: str = Field(description="Python code that was executed")
220
+ execution_output: str = Field(description="Output from code execution")
221
+ key_findings: list[str] = Field(default_factory=list, description="Key takeaways")
222
+ limitations: list[str] = Field(default_factory=list, description="Limitations")
223
+
224
+
225
+ class StatisticalAnalyzer:
226
+ """Performs statistical analysis using Modal code execution.
227
+
228
+ This service:
229
+ 1. Generates Python code for statistical analysis using LLM
230
+ 2. Executes code in Modal sandbox
231
+ 3. Interprets results
232
+ 4. Returns verdict (SUPPORTED/REFUTED/INCONCLUSIVE)
233
+
234
+ Note: This class has NO agent_framework dependency, making it safe
235
+ to use in the simple orchestrator without the magentic extra.
236
+ """
237
+
238
+ def __init__(self) -> None:
239
+ """Initialize the analyzer."""
240
+ self._code_executor: Any = None
241
+ self._agent: Agent[None, str] | None = None
242
+
243
+ def _get_code_executor(self) -> Any:
244
+ """Lazy initialization of code executor."""
245
+ if self._code_executor is None:
246
+ self._code_executor = get_code_executor()
247
+ return self._code_executor
248
+
249
+ def _get_agent(self) -> Agent[None, str]:
250
+ """Lazy initialization of LLM agent for code generation."""
251
+ if self._agent is None:
252
+ library_versions = get_sandbox_library_prompt()
253
+ self._agent = Agent(
254
+ model=get_model(),
255
+ output_type=str,
256
+ system_prompt=f"""You are a biomedical data scientist.
257
+
258
+ Generate Python code to analyze research evidence and test hypotheses.
259
+
260
+ Guidelines:
261
+ 1. Use pandas, numpy, scipy.stats for analysis
262
+ 2. Print clear, interpretable results
263
+ 3. Include statistical tests (t-tests, chi-square, etc.)
264
+ 4. Calculate effect sizes and confidence intervals
265
+ 5. Keep code concise (<50 lines)
266
+ 6. Set 'result' variable to SUPPORTED, REFUTED, or INCONCLUSIVE
267
+
268
+ Available libraries:
269
+ {library_versions}
270
+
271
+ Output format: Return ONLY executable Python code, no explanations.""",
272
+ )
273
+ return self._agent
274
+
275
+ async def analyze(
276
+ self,
277
+ query: str,
278
+ evidence: list[Evidence],
279
+ hypothesis: dict[str, Any] | None = None,
280
+ ) -> AnalysisResult:
281
+ """Run statistical analysis on evidence.
282
+
283
+ Args:
284
+ query: The research question
285
+ evidence: List of Evidence objects to analyze
286
+ hypothesis: Optional hypothesis dict with drug, target, pathway, effect
287
+
288
+ Returns:
289
+ AnalysisResult with verdict and statistics
290
+ """
291
+ # Build analysis prompt
292
+ evidence_summary = self._summarize_evidence(evidence[:10])
293
+ hypothesis_text = ""
294
+ if hypothesis:
295
+ hypothesis_text = f"""
296
+ Hypothesis: {hypothesis.get('drug', 'Unknown')} β†’ {hypothesis.get('target', '?')} β†’ {hypothesis.get('pathway', '?')} β†’ {hypothesis.get('effect', '?')}
297
+ Confidence: {hypothesis.get('confidence', 0.5):.0%}
298
+ """
299
+
300
+ prompt = f"""Generate Python code to statistically analyze:
301
+
302
+ **Research Question**: {query}
303
+ {hypothesis_text}
304
+
305
+ **Evidence Summary**:
306
+ {evidence_summary}
307
+
308
+ Generate executable Python code to analyze this evidence."""
309
+
310
+ try:
311
+ # Generate code
312
+ agent = self._get_agent()
313
+ code_result = await agent.run(prompt)
314
+ generated_code = code_result.output
315
+
316
+ # Execute in Modal sandbox
317
+ loop = asyncio.get_running_loop()
318
+ executor = self._get_code_executor()
319
+ execution = await loop.run_in_executor(
320
+ None, partial(executor.execute, generated_code, timeout=120)
321
+ )
322
+
323
+ if not execution["success"]:
324
+ return AnalysisResult(
325
+ verdict="INCONCLUSIVE",
326
+ confidence=0.0,
327
+ statistical_evidence=f"Execution failed: {execution['error']}",
328
+ code_generated=generated_code,
329
+ execution_output=execution.get("stderr", ""),
330
+ key_findings=[],
331
+ limitations=["Code execution failed"],
332
+ )
333
+
334
+ # Interpret results
335
+ return self._interpret_results(generated_code, execution)
336
+
337
+ except CodeExecutionError as e:
338
+ return AnalysisResult(
339
+ verdict="INCONCLUSIVE",
340
+ confidence=0.0,
341
+ statistical_evidence=str(e),
342
+ code_generated="",
343
+ execution_output="",
344
+ key_findings=[],
345
+ limitations=[f"Analysis error: {e}"],
346
+ )
347
+
348
+ def _summarize_evidence(self, evidence: list[Evidence]) -> str:
349
+ """Summarize evidence for code generation prompt."""
350
+ if not evidence:
351
+ return "No evidence available."
352
+
353
+ lines = []
354
+ for i, ev in enumerate(evidence[:5], 1):
355
+ lines.append(f"{i}. {ev.content[:200]}...")
356
+ lines.append(f" Source: {ev.citation.title}")
357
+ lines.append(f" Relevance: {ev.relevance:.0%}\n")
358
+
359
+ return "\n".join(lines)
360
+
361
+ def _interpret_results(
362
+ self,
363
+ code: str,
364
+ execution: dict[str, Any],
365
+ ) -> AnalysisResult:
366
+ """Interpret code execution results."""
367
+ stdout = execution["stdout"]
368
+ stdout_upper = stdout.upper()
369
+
370
+ # Extract verdict with robust word-boundary matching
371
+ verdict = "INCONCLUSIVE"
372
+ if re.search(r"\bSUPPORTED\b", stdout_upper) and not re.search(
373
+ r"\b(?:NOT|UN)SUPPORTED\b", stdout_upper
374
+ ):
375
+ verdict = "SUPPORTED"
376
+ elif re.search(r"\bREFUTED\b", stdout_upper):
377
+ verdict = "REFUTED"
378
+
379
+ # Extract key findings
380
+ key_findings = []
381
+ for line in stdout.split("\n"):
382
+ line_lower = line.lower()
383
+ if any(kw in line_lower for kw in ["p-value", "significant", "effect", "mean"]):
384
+ key_findings.append(line.strip())
385
+
386
+ # Calculate confidence from p-values
387
+ confidence = self._calculate_confidence(stdout)
388
+
389
+ return AnalysisResult(
390
+ verdict=verdict,
391
+ confidence=confidence,
392
+ statistical_evidence=stdout.strip(),
393
+ code_generated=code,
394
+ execution_output=stdout,
395
+ key_findings=key_findings[:5],
396
+ limitations=[
397
+ "Analysis based on summary data only",
398
+ "Limited to available evidence",
399
+ "Statistical tests assume data independence",
400
+ ],
401
+ )
402
+
403
+ def _calculate_confidence(self, output: str) -> float:
404
+ """Calculate confidence based on statistical results."""
405
+ p_values = re.findall(r"p[-\s]?value[:\s]+(\d+\.?\d*)", output.lower())
406
+
407
+ if p_values:
408
+ try:
409
+ min_p = min(float(p) for p in p_values)
410
+ if min_p < 0.001:
411
+ return 0.95
412
+ elif min_p < 0.01:
413
+ return 0.90
414
+ elif min_p < 0.05:
415
+ return 0.80
416
+ else:
417
+ return 0.60
418
+ except ValueError:
419
+ pass
420
+
421
+ return 0.70 # Default
422
+
423
+
424
+ # Singleton for reuse
425
+ _analyzer: StatisticalAnalyzer | None = None
426
+
427
+
428
+ def get_statistical_analyzer() -> StatisticalAnalyzer:
429
+ """Get or create singleton StatisticalAnalyzer instance."""
430
+ global _analyzer
431
+ if _analyzer is None:
432
+ _analyzer = StatisticalAnalyzer()
433
+ return _analyzer
434
+ ```
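One detail worth noting in `_interpret_results`: the `\b` word boundaries keep the substring `SUPPORTED` inside `UNSUPPORTED` from producing a false positive, which a naive `in` check would. A standalone illustration:

```python
import re

# Word-boundary matching vs. a naive substring check for verdict extraction
def has_supported(text: str) -> bool:
    return bool(re.search(r"\bSUPPORTED\b", text.upper()))

assert has_supported("Verdict: supported")
assert not has_supported("Verdict: UNSUPPORTED")  # no boundary inside the word
assert "SUPPORTED" in "UNSUPPORTED"               # the naive check gets this wrong
```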
435
+
436
+ ### 5.3 Simple Orchestrator Update (`src/orchestrator.py`)
437
+
438
+ **Uses `StatisticalAnalyzer` directly - NO agent_framework import.**
439
 
440
  ```python
441
  """Main orchestrator with optional Modal analysis."""
 
461
  self.history: list[dict[str, Any]] = []
462
  self._enable_analysis = enable_analysis and settings.modal_available
463
 
464
+ # Lazy-load analysis (NO agent_framework dependency!)
465
+ self._analyzer: Any = None
 
466
 
467
+ def _get_analyzer(self) -> Any:
468
+ """Lazy initialization of StatisticalAnalyzer.
 
 
469
 
470
+ Note: This imports from src.services, NOT src.agents,
471
+ so it works without the magentic optional dependency.
472
+ """
473
+ if self._analyzer is None:
474
+ from src.services.statistical_analyzer import get_statistical_analyzer
475
 
476
+ self._analyzer = get_statistical_analyzer()
477
+ return self._analyzer
 
 
 
 
 
 
 
478
 
479
  async def run(self, query: str) -> AsyncGenerator[AgentEvent, None]:
480
  """Main orchestration loop with optional Modal analysis."""
 
490
  )
491
 
492
  try:
493
+ analyzer = self._get_analyzer()
 
 
 
 
 
 
 
 
 
 
494
 
495
+ # Run Modal analysis (no agent_framework needed!)
496
+ analysis_result = await analyzer.analyze(
497
+ query=query,
498
+ evidence=all_evidence,
499
+ hypothesis=None, # Could add hypothesis generation later
500
+ )
501
 
502
  yield AgentEvent(
503
  type="analysis_complete",
504
+ message=f"Analysis verdict: {analysis_result.verdict}",
505
+ data=analysis_result.model_dump(),
506
  iteration=iteration,
507
  )
508
 
 
517
  # Continue to synthesis...
518
  ```
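The deferred import in `_get_analyzer` is the load-bearing trick: nothing heavy is imported until the first analysis runs. A generic sketch of the pattern (`json` stands in for the deferred module; names here are illustrative):

```python
from typing import Any

class LazyHolder:
    """Defers an import until first use, then caches the result."""

    def __init__(self) -> None:
        self._obj: Any = None

    def get(self) -> Any:
        if self._obj is None:
            import json  # stand-in for src.services.statistical_analyzer
            self._obj = json
        return self._obj

holder = LazyHolder()
assert holder._obj is None           # nothing resolved at construction time
assert holder.get() is holder.get()  # same cached object on every call
```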
519
 
520
+ ### 5.4 Refactor AnalysisAgent (`src/agents/analysis_agent.py`)
521
+
522
+ **Wrap `StatisticalAnalyzer` for magentic mode.**
523
+
524
+ ```python
525
+ """Analysis agent for statistical analysis using Modal code execution.
526
+
527
+ This agent wraps StatisticalAnalyzer for use in magentic multi-agent mode.
528
+ The core logic is in src/services/statistical_analyzer.py to avoid
529
+ coupling agent_framework to the simple orchestrator.
530
+ """
531
+
532
+ from collections.abc import AsyncIterable
533
+ from typing import TYPE_CHECKING, Any
534
+
535
+ from agent_framework import (
536
+ AgentRunResponse,
537
+ AgentRunResponseUpdate,
538
+ AgentThread,
539
+ BaseAgent,
540
+ ChatMessage,
541
+ Role,
542
+ )
543
+
544
+ from src.services.statistical_analyzer import (
545
+ AnalysisResult,
546
+ get_statistical_analyzer,
547
+ )
548
+ from src.utils.models import Evidence
549
+
550
+ if TYPE_CHECKING:
551
+ from src.services.embeddings import EmbeddingService
552
+
553
+
554
+ class AnalysisAgent(BaseAgent): # type: ignore[misc]
555
+ """Wraps StatisticalAnalyzer for magentic multi-agent mode."""
556
+
557
+ def __init__(
558
+ self,
559
+ evidence_store: dict[str, Any],
560
+ embedding_service: "EmbeddingService | None" = None,
561
+ ) -> None:
562
+ super().__init__(
563
+ name="AnalysisAgent",
564
+ description="Performs statistical analysis using Modal sandbox",
565
+ )
566
+ self._evidence_store = evidence_store
567
+ self._embeddings = embedding_service
568
+ self._analyzer = get_statistical_analyzer()
569
+
570
+ async def run(
571
+ self,
572
+ messages: str | ChatMessage | list[str] | list[ChatMessage] | None = None,
573
+ *,
574
+ thread: AgentThread | None = None,
575
+ **kwargs: Any,
576
+ ) -> AgentRunResponse:
577
+ """Analyze evidence and return verdict."""
578
+ query = self._extract_query(messages)
579
+ hypotheses = self._evidence_store.get("hypotheses", [])
580
+ evidence = self._evidence_store.get("current", [])
581
+
582
+ if not evidence:
583
+ return self._error_response("No evidence available.")
584
+
585
+ # Get primary hypothesis if available
586
+ hypothesis_dict = None
587
+ if hypotheses:
588
+ h = hypotheses[0]
589
+ hypothesis_dict = {
590
+ "drug": getattr(h, "drug", "Unknown"),
591
+ "target": getattr(h, "target", "?"),
592
+ "pathway": getattr(h, "pathway", "?"),
593
+ "effect": getattr(h, "effect", "?"),
594
+ "confidence": getattr(h, "confidence", 0.5),
595
+ }
596
+
597
+ # Delegate to StatisticalAnalyzer
598
+ result = await self._analyzer.analyze(
599
+ query=query,
600
+ evidence=evidence,
601
+ hypothesis=hypothesis_dict,
602
+ )
603
+
604
+ # Store in shared context
605
+ self._evidence_store["analysis"] = result.model_dump()
606
+
607
+ # Format response
608
+ response_text = self._format_response(result)
609
+
610
+ return AgentRunResponse(
611
+ messages=[ChatMessage(role=Role.ASSISTANT, text=response_text)],
612
+ response_id=f"analysis-{result.verdict.lower()}",
613
+ additional_properties={"analysis": result.model_dump()},
614
+ )
615
+
616
+ def _format_response(self, result: AnalysisResult) -> str:
617
+ """Format analysis result as markdown."""
618
+ lines = [
619
+ "## Statistical Analysis Complete\n",
620
+ f"### Verdict: **{result.verdict}**",
621
+ f"**Confidence**: {result.confidence:.0%}\n",
622
+ "### Key Findings",
623
+ ]
624
+ for finding in result.key_findings:
625
+ lines.append(f"- {finding}")
626
+
627
+ lines.extend([
628
+ "\n### Statistical Evidence",
629
+ "```",
630
+ result.statistical_evidence,
631
+ "```",
632
+ ])
633
+ return "\n".join(lines)
634
+
635
+ def _error_response(self, message: str) -> AgentRunResponse:
636
+ """Create error response."""
637
+ return AgentRunResponse(
638
+ messages=[ChatMessage(role=Role.ASSISTANT, text=f"**Error**: {message}")],
639
+ response_id="analysis-error",
640
+ )
641
+
642
+ def _extract_query(
643
+ self, messages: str | ChatMessage | list[str] | list[ChatMessage] | None
644
+ ) -> str:
645
+ """Extract query from messages."""
646
+ if isinstance(messages, str):
647
+ return messages
648
+ elif isinstance(messages, ChatMessage):
649
+ return messages.text or ""
650
+ elif isinstance(messages, list):
651
+ for msg in reversed(messages):
652
+ if isinstance(msg, ChatMessage) and msg.role == Role.USER:
653
+ return msg.text or ""
654
+ elif isinstance(msg, str):
655
+ return msg
656
+ return ""
657
+
658
+ async def run_stream(
659
+ self,
660
+ messages: str | ChatMessage | list[str] | list[ChatMessage] | None = None,
661
+ *,
662
+ thread: AgentThread | None = None,
663
+ **kwargs: Any,
664
+ ) -> AsyncIterable[AgentRunResponseUpdate]:
665
+ """Streaming wrapper."""
666
+ result = await self.run(messages, thread=thread, **kwargs)
667
+ yield AgentRunResponseUpdate(messages=result.messages, response_id=result.response_id)
668
+ ```
669
+
670
+ ### 5.5 MCP Tool for Modal Analysis (`src/mcp_tools.py`)
671
 
672
+ Add to existing MCP tools:
673
 
674
  ```python
675
  async def analyze_hypothesis(
 
690
  Returns:
691
  Analysis result with verdict (SUPPORTED/REFUTED/INCONCLUSIVE) and statistics
692
  """
693
+ from src.services.statistical_analyzer import get_statistical_analyzer
 
 
 
 
694
  from src.utils.config import settings
695
+ from src.utils.models import Citation, Evidence
696
+
697
  if not settings.modal_available:
698
  return "Error: Modal credentials not configured. Set MODAL_TOKEN_ID and MODAL_TOKEN_SECRET."
699
 
700
+ # Create evidence from summary
701
+ evidence = [
702
+ Evidence(
703
+ content=evidence_summary,
704
+ citation=Citation(
705
+ source="pubmed",
706
+ title=f"Evidence for {drug} in {condition}",
707
+ url="https://example.com",
708
+ date="2024-01-01",
709
+ authors=["User Provided"],
710
+ ),
711
+ relevance=0.9,
712
+ )
713
+ ]
 
 
714
 
715
+ analyzer = get_statistical_analyzer()
716
+ result = await analyzer.analyze(
717
+ query=f"Can {drug} treat {condition}?",
718
+ evidence=evidence,
719
+ hypothesis={"drug": drug, "target": "unknown", "pathway": "unknown", "effect": condition},
720
+ )
721
 
722
+ return f"""## Statistical Analysis: {drug} for {condition}
 
 
723
 
724
+ ### Verdict: **{result.verdict}**
725
+ **Confidence**: {result.confidence:.0%}
 
 
 
 
 
 
726
 
727
+ ### Key Findings
728
+ {chr(10).join(f"- {f}" for f in result.key_findings) or "- No specific findings extracted"}
 
 
 
729
 
730
  ### Execution Output
731
  ```
732
+ {result.execution_output}
733
  ```
734
 
735
  ### Generated Code
736
  ```python
737
+ {result.code_generated}
738
  ```
739
 
740
  **Executed in Modal Sandbox** - Isolated, secure, reproducible.
741
  """
 
 
 
 
 
742
  ```
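A side note on the `chr(10)` in the key-findings line: f-string expressions could not contain backslashes before Python 3.12, so `chr(10)` substitutes for a newline. Minimal demonstration:

```python
# chr(10) is "\n"; the `or` fallback covers the empty-findings case,
# since joining an empty sequence yields "" which is falsy.
findings: list[str] = []
body = chr(10).join(f"- {f}" for f in findings) or "- No specific findings extracted"
assert body == "- No specific findings extracted"

findings = ["p-value: 0.01", "effect size: 0.45"]
body = chr(10).join(f"- {f}" for f in findings)
assert body == "- p-value: 0.01\n- effect size: 0.45"
```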
743
 
744
+ ### 5.6 Demo Scripts
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
745
 
746
+ #### `examples/modal_demo/verify_sandbox.py`
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
747
 
748
  ```python
749
  #!/usr/bin/env python3
750
  """Verify that Modal sandbox is properly isolated.
751
 
752
  This script proves to judges that code runs in Modal, not locally.
753
+ NO agent_framework dependency - uses only src.tools.code_execution.
754
 
755
  Usage:
756
  uv run python examples/modal_demo/verify_sandbox.py
 
767
  """Verify Modal sandbox isolation."""
768
  if not settings.modal_available:
769
  print("Error: Modal credentials not configured.")
770
+ print("Set MODAL_TOKEN_ID and MODAL_TOKEN_SECRET in .env")
771
  return
772
 
773
  executor = get_code_executor()
774
  loop = asyncio.get_running_loop()
775
 
776
+ print("=" * 60)
777
  print("Modal Sandbox Isolation Verification")
778
+ print("=" * 60 + "\n")
779
 
780
+ # Test 1: Hostname
781
  print("Test 1: Check hostname (should NOT be your machine)")
782
+ code1 = "import socket; print(f'Hostname: {socket.gethostname()}')"
 
 
 
783
  result1 = await loop.run_in_executor(None, partial(executor.execute, code1))
784
+ print(f" {result1['stdout'].strip()}\n")
 
785
 
786
+ # Test 2: Scientific libraries
787
  print("Test 2: Verify scientific libraries")
788
  code2 = """
789
  import pandas as pd
 
796
  result2 = await loop.run_in_executor(None, partial(executor.execute, code2))
797
  print(f" {result2['stdout'].strip()}\n")
798
 
799
+ # Test 3: Network blocked
800
+ print("Test 3: Verify network isolation")
801
  code3 = """
802
  import urllib.request
803
  try:
804
  urllib.request.urlopen("https://google.com", timeout=2)
805
+ print("Network: ALLOWED (unexpected!)")
806
+ except Exception:
807
+ print("Network: BLOCKED (as expected)")
808
  """
809
  result3 = await loop.run_in_executor(None, partial(executor.execute, code3))
810
  print(f" {result3['stdout'].strip()}\n")
811
 
812
+ # Test 4: Real statistics
813
+ print("Test 4: Execute statistical analysis")
814
  code4 = """
815
  import pandas as pd
816
  import scipy.stats as stats
817
 
818
+ data = pd.DataFrame({'effect': [0.42, 0.38, 0.51]})
819
+ mean = data['effect'].mean()
 
 
 
 
 
 
820
  t_stat, p_val = stats.ttest_1samp(data['effect'], 0)
821
 
822
+ print(f"Mean Effect: {mean:.3f}")
823
+ print(f"P-value: {p_val:.4f}")
824
  print(f"Verdict: {'SUPPORTED' if p_val < 0.05 else 'INCONCLUSIVE'}")
825
  """
826
  result4 = await loop.run_in_executor(None, partial(executor.execute, code4))
827
  print(f" {result4['stdout'].strip()}\n")
828
 
829
+ print("=" * 60)
830
  print("All tests complete - Modal sandbox verified!")
831
+ print("=" * 60)
832
+
833
+
834
+ if __name__ == "__main__":
835
+ asyncio.run(main())
836
+ ```
837
+
838
+ #### `examples/modal_demo/run_analysis.py`
839
+
840
+ ```python
841
+ #!/usr/bin/env python3
842
+ """Demo: Modal-powered statistical analysis.
843
+
844
+ This script uses StatisticalAnalyzer directly (NO agent_framework dependency).
845
+
846
+ Usage:
847
+ uv run python examples/modal_demo/run_analysis.py "metformin alzheimer"
848
+ """
849
+
850
+ import argparse
851
+ import asyncio
852
+ import os
853
+ import sys
854
+
855
+ from src.services.statistical_analyzer import get_statistical_analyzer
856
+ from src.tools.pubmed import PubMedTool
857
+ from src.utils.config import settings
858
+
859
+
860
+ async def main() -> None:
861
+ """Run the Modal analysis demo."""
862
+ parser = argparse.ArgumentParser(description="Modal Analysis Demo")
863
+ parser.add_argument("query", help="Research query")
864
+ args = parser.parse_args()
865
+
866
+ if not settings.modal_available:
867
+ print("Error: Modal credentials not configured.")
868
+ sys.exit(1)
869
+
870
+ if not (os.getenv("OPENAI_API_KEY") or os.getenv("ANTHROPIC_API_KEY")):
871
+ print("Error: No LLM API key found.")
872
+ sys.exit(1)
873
+
874
+ print(f"\n{'=' * 60}")
875
+ print("DeepCritical Modal Analysis Demo")
876
+ print(f"Query: {args.query}")
877
+ print(f"{'=' * 60}\n")
878
+
879
+ # Step 1: Gather Evidence
880
+ print("Step 1: Gathering evidence from PubMed...")
881
+ pubmed = PubMedTool()
882
+ evidence = await pubmed.search(args.query, max_results=5)
883
+ print(f" Found {len(evidence)} papers\n")
884
+
885
+ # Step 2: Run Modal Analysis
886
+ print("Step 2: Running statistical analysis in Modal sandbox...")
887
+ analyzer = get_statistical_analyzer()
888
+ result = await analyzer.analyze(query=args.query, evidence=evidence)
889
+
890
+ # Step 3: Display Results
891
+ print("\n" + "=" * 60)
892
+ print("ANALYSIS RESULTS")
893
+ print("=" * 60)
894
+ print(f"\nVerdict: {result.verdict}")
895
+ print(f"Confidence: {result.confidence:.0%}")
896
+ print("\nKey Findings:")
897
+ for finding in result.key_findings:
898
+ print(f" - {finding}")
899
+
900
+ print("\n[Demo Complete - Code executed in Modal, not locally]")
901
 
902
 
903
  if __name__ == "__main__":
 
906
 
907
  ---
908
 
909
+ ## 6. TDD Test Suite
910
 
911
+ ### 6.1 Unit Tests (`tests/unit/services/test_statistical_analyzer.py`)
912
 
913
  ```python
914
+ """Unit tests for StatisticalAnalyzer service."""
915
 
916
  from unittest.mock import AsyncMock, MagicMock, patch
917
 
918
  import pytest
919
 
920
+ from src.services.statistical_analyzer import (
921
+ AnalysisResult,
922
+ StatisticalAnalyzer,
923
+ get_statistical_analyzer,
924
+ )
925
+ from src.utils.models import Citation, Evidence
926
 
927
 
928
  @pytest.fixture
 
930
  """Sample evidence for testing."""
931
  return [
932
  Evidence(
933
+ content="Metformin shows effect size of 0.45.",
934
  citation=Citation(
935
  source="pubmed",
936
  title="Metformin Study",
 
943
  ]
944
 
945
 
946
+ class TestStatisticalAnalyzer:
947
+ """Tests for StatisticalAnalyzer (no agent_framework dependency)."""
948
+
949
+ def test_no_agent_framework_import(self) -> None:
950
+ """StatisticalAnalyzer must NOT import agent_framework."""
951
+ import src.services.statistical_analyzer as module
952
+
953
+ # Inspect import statements only; the module docstring legitimately
+ # mentions agent_framework by name
954
+ from pathlib import Path
955
+ source = Path(module.__file__).read_text(encoding="utf-8")
+ import_lines = [
+ line for line in source.splitlines()
+ if line.lstrip().startswith(("import ", "from "))
+ ]
956
+ assert not any("agent_framework" in line for line in import_lines)
+ assert not any("BaseAgent" in line for line in import_lines)
957
 
958
  @pytest.mark.asyncio
959
+ async def test_analyze_returns_result(
960
  self, sample_evidence: list[Evidence]
961
  ) -> None:
962
+ """analyze() should return AnalysisResult."""
963
+ analyzer = StatisticalAnalyzer()
 
 
 
 
 
 
 
 
 
 
 
 
 
964
 
965
+ with patch.object(analyzer, "_get_agent") as mock_agent, \
966
+ patch.object(analyzer, "_get_code_executor") as mock_executor:
967
 
968
+ # Mock LLM
969
+ mock_agent.return_value.run = AsyncMock(
970
+ return_value=MagicMock(output="print('SUPPORTED')")
971
+ )
 
972
 
973
+ # Mock Modal
974
  mock_executor.return_value.execute.return_value = {
975
+ "stdout": "SUPPORTED\np-value: 0.01",
976
  "stderr": "",
977
  "success": True,
 
978
  }
979
 
980
+ result = await analyzer.analyze("test query", sample_evidence)
 
 
 
 
 
 
 
981
 
982
+ assert isinstance(result, AnalysisResult)
983
+ assert result.verdict == "SUPPORTED"
984
 
985
+ def test_singleton(self) -> None:
986
+ """get_statistical_analyzer should return singleton."""
987
+ a1 = get_statistical_analyzer()
988
+ a2 = get_statistical_analyzer()
989
+ assert a1 is a2
990
 
 
 
991
 
992
+ class TestAnalysisResult:
993
+ """Tests for AnalysisResult model."""
 
994
 
995
+ def test_verdict_values(self) -> None:
996
+ """Verdict should be one of the expected values."""
997
+ for verdict in ["SUPPORTED", "REFUTED", "INCONCLUSIVE"]:
998
+ result = AnalysisResult(
999
+ verdict=verdict,
1000
+ confidence=0.8,
1001
+ statistical_evidence="test",
1002
+ code_generated="print('test')",
1003
+ execution_output="test",
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1004
  )
1005
+ assert result.verdict == verdict
1006
+
1007
+ def test_confidence_bounds(self) -> None:
1008
+ """Confidence must be 0.0-1.0."""
1009
+ with pytest.raises(ValueError):
1010
+ AnalysisResult(
1011
+ verdict="SUPPORTED",
1012
+ confidence=1.5, # Invalid
1013
+ statistical_evidence="test",
1014
+ code_generated="test",
1015
+ execution_output="test",
1016
  )
 
 
 
 
 
 
 
 
 
1017
  ```
1018
 
1019
+ ### 6.2 Integration Test (`tests/integration/test_modal.py`)
1020
 
1021
  ```python
1022
+ """Integration tests for Modal (requires credentials)."""
1023
 
1024
  import pytest
1025
 
 
1027
 
1028
 
1029
  @pytest.mark.integration
1030
+ @pytest.mark.skipif(not settings.modal_available, reason="Modal not configured")
 
 
 
1031
  class TestModalIntegration:
1032
+ """Integration tests requiring Modal credentials."""
1033
 
1034
  @pytest.mark.asyncio
1035
+ async def test_sandbox_executes_code(self) -> None:
1036
+ """Modal sandbox should execute Python code."""
1037
  import asyncio
1038
  from functools import partial
1039
 
1040
  from src.tools.code_execution import get_code_executor
1041
 
1042
  executor = get_code_executor()
1043
+ code = "import pandas as pd; print(pd.DataFrame({'a': [1,2,3]})['a'].sum())"
 
 
 
 
1044
 
1045
  loop = asyncio.get_running_loop()
1046
  result = await loop.run_in_executor(
 
1048
  )
1049
 
1050
  assert result["success"]
1051
+ assert "6" in result["stdout"]
1052
 
1053
  @pytest.mark.asyncio
1054
+ async def test_statistical_analyzer_works(self) -> None:
1055
+ """StatisticalAnalyzer should work end-to-end."""
1056
+ from src.services.statistical_analyzer import get_statistical_analyzer
1057
+ from src.utils.models import Citation, Evidence
1058
+
1059
+ evidence = [
1060
+ Evidence(
1061
+ content="Drug shows 40% improvement in trial.",
1062
+ citation=Citation(
1063
+ source="pubmed",
1064
+ title="Test",
1065
+ url="https://test.com",
1066
+ date="2024-01-01",
1067
+ authors=["Test"],
1068
+ ),
1069
+ relevance=0.9,
1070
+ )
1071
+ ]
1072
 
1073
+ analyzer = get_statistical_analyzer()
1074
+ result = await analyzer.analyze("test drug efficacy", evidence)
1075
 
1076
+ assert result.verdict in ["SUPPORTED", "REFUTED", "INCONCLUSIVE"]
1077
+ assert 0.0 <= result.confidence <= 1.0
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1078
  ```
1079
 
1080
  ---
1081
 
1082
+ ## 7. Verification Commands
1083
 
1084
  ```bash
1085
+ # 1. Verify NO agent_framework in StatisticalAnalyzer
1086
+ grep -E "^(from|import) agent_framework" src/services/statistical_analyzer.py
1087
+ # Should print nothing (docstrings may mention the name; imports must not)
 
 
 
1088
 
1089
+ # 2. Run unit tests (no Modal needed)
1090
+ uv run pytest tests/unit/services/test_statistical_analyzer.py -v
1091
 
1092
+ # 3. Run verification script (requires Modal)
1093
  uv run python examples/modal_demo/verify_sandbox.py
1094
 
1095
+ # 4. Run analysis demo (requires Modal + LLM)
1096
  uv run python examples/modal_demo/run_analysis.py "metformin alzheimer"
1097
 
1098
+ # 5. Run integration tests
1099
  uv run pytest tests/integration/test_modal.py -v -m integration
1100
 
1101
+ # 6. Full test suite
1102
  make check
1103
  ```
1104
 
1105
  ---
1106
 
1107
+ ## 8. Definition of Done
1108
 
1109
  Phase 13 is **COMPLETE** when:
1110
 
1111
+ - [ ] `src/services/statistical_analyzer.py` created (NO agent_framework)
1112
+ - [ ] `src/utils/config.py` has `enable_modal_analysis` setting
1113
+ - [ ] `src/orchestrator.py` uses `StatisticalAnalyzer` directly
1114
+ - [ ] `src/agents/analysis_agent.py` refactored to wrap `StatisticalAnalyzer`
1115
+ - [ ] `src/mcp_tools.py` has `analyze_hypothesis` tool
1116
+ - [ ] `examples/modal_demo/verify_sandbox.py` working
1117
+ - [ ] `examples/modal_demo/run_analysis.py` working
1118
+ - [ ] Unit tests pass WITHOUT magentic extra installed
1119
+ - [ ] Integration tests pass WITH Modal credentials
1120
+ - [ ] All lints pass
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1121
 
1122
  ---
1123
 
1124
+ ## 9. Architecture After Phase 13
1125
+
1126
+ ```text
1127
+ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
1128
+ β”‚ MCP Clients β”‚
1129
+ β”‚ (Claude Desktop, Cursor, etc.) β”‚
1130
+ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
1131
+ β”‚ MCP Protocol
1132
+ β–Ό
1133
+ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
1134
+ β”‚ Gradio App + MCP Server β”‚
1135
+ β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚
1136
+ β”‚ β”‚ MCP Tools: search_pubmed, search_trials, search_biorxiv β”‚ β”‚
1137
+ β”‚ β”‚ search_all, analyze_hypothesis β”‚ β”‚
1138
+ β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚
1139
+ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
1140
+ β”‚
1141
+ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
1142
+ β”‚ β”‚
1143
+ β–Ό β–Ό
1144
+ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
1145
+ β”‚ Simple Orchestrator β”‚ β”‚ Magentic Orchestrator β”‚
1146
+ β”‚ (no agent_framework) β”‚ β”‚ (with agent_framework) β”‚
1147
+ β”‚ β”‚ β”‚ β”‚
1148
+ β”‚ SearchHandler β”‚ β”‚ SearchAgent β”‚
1149
+ β”‚ JudgeHandler β”‚ β”‚ JudgeAgent β”‚
1150
+ β”‚ StatisticalAnalyzer ─┼────────────┼→ AnalysisAgent β”‚
1151
+ β”‚ β”‚ β”‚ (wraps StatisticalAnalyzer) β”‚
1152
+ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
1153
+ β”‚
1154
+ β–Ό
1155
+ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
1156
+ β”‚ StatisticalAnalyzer β”‚
1157
+ β”‚ (src/services/statistical_analyzer.py) β”‚
1158
+ β”‚ NO agent_framework dependency β”‚
1159
+ β”‚ β”‚
1160
+ β”‚ 1. Generate code with pydantic-ai β”‚
1161
+ β”‚ 2. Execute in Modal sandbox β”‚
1162
+ β”‚ 3. Return AnalysisResult β”‚
1163
+ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
1164
+ β”‚
1165
+ β–Ό
1166
+ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
1167
+ β”‚ Modal Sandbox β”‚
1168
+ β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚
1169
+ β”‚ β”‚ - pandas, numpy, scipy, sklearn, statsmodels β”‚ β”‚
1170
+ β”‚ β”‚ - Network: BLOCKED β”‚ β”‚
1171
+ β”‚ β”‚ - Filesystem: ISOLATED β”‚ β”‚
1172
+ β”‚ β”‚ - Timeout: ENFORCED β”‚ β”‚
1173
+ β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚
1174
+ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
1175
+ ```
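The sandbox image in the diagram ships pandas, numpy, scipy, sklearn, and statsmodels, so the generated scripts it runs are plain statistical code with no network or filesystem access. A minimal sketch of the kind of script the analyzer might submit (the data values are purely illustrative, not from real evidence extraction):

```python
# Illustrative sandbox payload: pure pandas/scipy, nothing outside the image.
import pandas as pd
from scipy import stats

# Hypothetical effect sizes extracted from treatment vs. control papers.
treatment = pd.Series([0.42, 0.51, 0.38, 0.47, 0.44])
control = pd.Series([0.30, 0.28, 0.35, 0.31, 0.29])

# Two-sample t-test on the extracted metrics.
t_stat, p_value = stats.ttest_ind(treatment, control)
print(f"t={t_stat:.2f} p={p_value:.4f}")
```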

+ **This is the dependency-safe Modal stack.**
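The decoupling the diagram promises can be sketched in a few lines: the service depends only on an injected code-executor callable, never on `agent_framework`. The class and result names mirror the doc; the executor signature and method bodies are illustrative assumptions, not the real implementation:

```python
# Hypothetical sketch of src/services/statistical_analyzer.py's shape.
# Any sandbox runner can be injected: Modal in production, a stub in tests.
from dataclasses import dataclass
from typing import Callable


@dataclass
class AnalysisResult:
    code: str      # generated analysis code
    output: str    # stdout captured from the executor
    success: bool


class StatisticalAnalyzer:
    """Core analysis service with zero agent_framework imports."""

    def __init__(self, execute_code: Callable[[str], str]):
        self._execute_code = execute_code

    def generate_code(self, hypothesis: str) -> str:
        # The real service would call pydantic-ai here; this placeholder
        # only shows the shape of step 1 in the diagram.
        return f"print('testing hypothesis: {hypothesis}')"

    def analyze(self, hypothesis: str) -> AnalysisResult:
        code = self.generate_code(hypothesis)
        try:
            output = self._execute_code(code)           # step 2: sandbox run
            return AnalysisResult(code, output, True)   # step 3: result
        except Exception as exc:  # timeout, bad code, sandbox failure
            return AnalysisResult(code, str(exc), False)


# Unit-test style usage with a local stub instead of Modal:
stub = lambda code: "p-value: 0.03"
result = StatisticalAnalyzer(execute_code=stub).analyze("drug X lowers LDL")
print(result.success, result.output)
```

Because the executor is injected, the simple orchestrator can construct this class directly, and the Magentic `AnalysisAgent` can wrap the same instance without the service ever importing agent code.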
 

  ---

+ ## 10. Files Summary

  | File | Action | Purpose |
  |------|--------|---------|
+ | `src/services/statistical_analyzer.py` | **CREATE** | Core analysis (no agent_framework) |
  | `src/utils/config.py` | MODIFY | Add `enable_modal_analysis` |
+ | `src/orchestrator.py` | MODIFY | Use `StatisticalAnalyzer` |
+ | `src/agents/analysis_agent.py` | MODIFY | Wrap `StatisticalAnalyzer` |
+ | `src/mcp_tools.py` | MODIFY | Add `analyze_hypothesis` |
+ | `examples/modal_demo/verify_sandbox.py` | CREATE | Sandbox verification |
  | `examples/modal_demo/run_analysis.py` | CREATE | Demo script |
+ | `tests/unit/services/test_statistical_analyzer.py` | CREATE | Unit tests |
  | `tests/integration/test_modal.py` | CREATE | Integration tests |

+ **Key Fix**: `StatisticalAnalyzer` has ZERO agent_framework imports, making it safe for the simple orchestrator.
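The two MODIFY entries for `src/utils/config.py` and `src/orchestrator.py` can be sketched together: a feature flag that defaults off, and an analysis phase the orchestrator only enters when the flag is set. Everything except the `enable_modal_analysis` name is an illustrative assumption:

```python
# Hedged sketch of the config flag and orchestrator gating; not the real code.
import os
from dataclasses import dataclass, field


@dataclass
class Settings:
    # Off by default so the simple orchestrator runs without Modal credentials.
    enable_modal_analysis: bool = field(
        default_factory=lambda: os.getenv(
            "ENABLE_MODAL_ANALYSIS", "false"
        ).lower() == "true"
    )


def run_pipeline(query: str, settings: Settings) -> list[str]:
    phases = ["search", "judge"]
    if settings.enable_modal_analysis:
        # Only here would the orchestrator touch StatisticalAnalyzer;
        # with the flag off, Modal is never imported or called.
        phases.append("analysis")
    phases.append("report")
    return phases


print(run_pipeline("GLP-1 cardio outcomes", Settings(enable_modal_analysis=True)))
# -> ['search', 'judge', 'analysis', 'report']
```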