VibecoderMcSwaggins committed on
Commit
24a5878
·
2 Parent(s): ecbc47b ec3d7dc

Merge main: Phase 5 + Phase 6-8 doc revisions

docs/architecture/overview.md CHANGED
@@ -63,53 +63,58 @@ Using existing approved drugs to treat NEW diseases they weren't originally desi
63
 
64
  ## System Architecture
65
 
66
- ### High-Level Design
67
 
68
  ```
69
- User Question
70
  ↓
71
- Research Agent (Orchestrator)
72
  ↓
73
- Search Loop:
74
- 1. Query Tools (PubMed, Web, Clinical Trials)
75
- 2. Gather Evidence
76
- 3. Judge Quality ("Do we have enough?")
77
- 4. If NO β†’ Refine query, search more
78
- 5. If YES β†’ Synthesize findings
79
  ↓
80
- Research Report with Citations
81
  ```
82
 
83
  ### Key Components
84
 
85
- 1. **Research Agent (Orchestrator)**
86
- - Manages the research process
87
- - Plans search strategies
88
- - Coordinates tools
89
- - Tracks token budget and iterations
90
-
91
- 2. **Tools**
92
- - PubMed Search (biomedical papers)
93
- - Web Search (general medical info)
94
- - Clinical Trials Database
95
- - Drug Information APIs
96
- - (Future: Protein databases, pathways)
97
-
98
- 3. **Judge System**
99
- - LLM-based quality assessment
100
- - Evaluates: "Do we have enough evidence?"
101
- - Criteria: Coverage, reliability, citation quality
102
-
103
- 4. **Break Conditions**
104
- - Token budget cap (cost control)
105
- - Max iterations (time control)
106
- - Judge says "sufficient evidence" (quality control)
107
-
108
- 5. **Gradio UI**
109
- - Simple text input for questions
110
- - Real-time progress display
111
- - Formatted research report output
112
- - Source citations and links
113
 
114
  ---
115
 
@@ -275,37 +280,31 @@ httpx = "^0.27"
275
 
276
  ## Success Criteria
277
 
278
- ### Minimum Viable Product (MVP) - Days 1-3
279
- **MUST HAVE for working demo:**
280
  - [x] User can ask drug repurposing question
281
- - [ ] Agent searches PubMed (async)
282
- - [ ] Agent searches web (Brave/DuckDuckGo)
283
- - [ ] LLM judge evaluates evidence quality
284
- - [ ] System respects token budget (50K tokens max)
285
- - [ ] Output includes drug candidates + citations
286
- - [ ] Works end-to-end for demo query: "Long COVID fatigue"
287
- - [ ] Gradio UI with streaming progress
288
-
289
- ### Hackathon Submission - Days 4-5
290
- **Required for all tracks:**
291
- - [ ] Gradio UI deployed on HuggingFace Spaces
292
- - [ ] 3 example queries working and tested
293
- - [ ] This architecture documentation
294
- - [ ] Demo video (2-3 min) showing workflow
295
- - [ ] README with setup instructions
296
-
297
- **Track-Specific:**
298
- - [ ] **Gradio Track**: Streaming UI, progress indicators, modern design
299
- - [ ] **MCP Track**: PubMed tool as MCP server (reusable by others)
300
- - [ ] **Modal Track**: GPU inference option (stretch)
301
-
302
- ### Stretch Goals - Day 6+
303
- **Nice-to-have if time permits:**
304
- - [ ] Modal integration for local LLM fallback
305
- - [ ] Clinical trials database search
306
- - [ ] Checkpoint/resume functionality
307
- - [ ] OpenFDA drug safety lookup
308
- - [ ] PDF export of research reports
309
 
310
  ### What's EXPLICITLY Out of Scope
311
  **NOT building (to stay focused):**
 
63
 
64
  ## System Architecture
65
 
66
+ ### High-Level Design (Phases 1-8)
67
 
68
  ```
69
+ User Query
70
  ↓
71
+ Gradio UI (Phase 4)
72
  ↓
73
+ Magentic Manager (Phase 5) ← LLM-powered coordinator
74
+ ├── SearchAgent (Phase 2+5) ←→ PubMed + Web + VectorDB (Phase 6)
75
+ ├── HypothesisAgent (Phase 7) ←→ Mechanistic Reasoning
76
+ ├── JudgeAgent (Phase 3+5) ←→ Evidence Assessment
77
+ └── ReportAgent (Phase 8) ←→ Final Synthesis
 
78
  ↓
79
+ Structured Research Report
80
  ```
81
 
82
  ### Key Components
83
 
84
+ 1. **Magentic Manager (Orchestrator)**
85
+ - LLM-powered multi-agent coordinator
86
+ - Dynamic planning and agent selection
87
+ - Built-in stall detection and replanning
88
+ - Microsoft Agent Framework integration
89
+
90
+ 2. **SearchAgent (Phase 2+5+6)**
91
+ - PubMed E-utilities search
92
+ - DuckDuckGo web search
93
+ - Semantic search via ChromaDB (Phase 6)
94
+ - Evidence deduplication
95
+
96
+ 3. **HypothesisAgent (Phase 7)**
97
+ - Generates Drug → Target → Pathway → Effect hypotheses
98
+ - Guides targeted searches
99
+ - Scientific reasoning about mechanisms
100
+
101
+ 4. **JudgeAgent (Phase 3+5)**
102
+ - LLM-based evidence assessment
103
+ - Mechanism score + Clinical score
104
+ - Recommends continue/synthesize
105
+ - Generates refined search queries
106
+
107
+ 5. **ReportAgent (Phase 8)**
108
+ - Structured scientific reports
109
+ - Executive summary, methodology
110
+ - Hypotheses tested with evidence counts
111
+ - Proper citations and limitations
112
+
113
+ 6. **Gradio UI (Phase 4)**
114
+ - Chat interface for questions
115
+ - Real-time progress via events
116
+ - Mode toggle (Simple/Magentic)
117
+ - Formatted markdown output
118
 
119
  ---
120
 
 
280
 
281
  ## Success Criteria
282
 
283
+ ### Phase 1-5 (MVP) ✅ COMPLETE
284
+ **Completed in ONE DAY:**
285
  - [x] User can ask drug repurposing question
286
+ - [x] Agent searches PubMed (async)
287
+ - [x] Agent searches web (DuckDuckGo)
288
+ - [x] LLM judge evaluates evidence quality
289
+ - [x] System respects token budget and iterations
290
+ - [x] Output includes drug candidates + citations
291
+ - [x] Works end-to-end for demo query
292
+ - [x] Gradio UI with streaming progress
293
+ - [x] Magentic multi-agent orchestration
294
+ - [x] 38 unit tests passing
295
+ - [x] CI/CD pipeline green
296
+
297
+ ### Hackathon Submission ✅ COMPLETE
298
+ - [x] Gradio UI deployed on HuggingFace Spaces
299
+ - [x] Example queries working and tested
300
+ - [x] Architecture documentation
301
+ - [x] README with setup instructions
302
+
303
+ ### Phase 6-8 (Enhanced)
304
+ **Specs ready for implementation:**
305
+ - [ ] Embeddings & Semantic Search (Phase 6)
306
+ - [ ] Hypothesis Agent (Phase 7)
307
+ - [ ] Report Agent (Phase 8)
 
308
 
309
  ### What's EXPLICITLY Out of Scope
310
  **NOT building (to stay focused):**
docs/implementation/06_phase_embeddings.md ADDED
@@ -0,0 +1,409 @@
1
+ # Phase 6 Implementation Spec: Embeddings & Semantic Search
2
+
3
+ **Goal**: Add vector search for semantic evidence retrieval.
4
+ **Philosophy**: "Find what you mean, not just what you type."
5
+ **Prerequisite**: Phase 5 complete (Magentic working)
6
+
7
+ ---
8
+
9
+ ## 1. Why Embeddings?
10
+
11
+ Current limitation: **Keyword-only search misses semantically related papers.**
12
+
13
+ Example problem:
14
+ - User searches: "metformin alzheimer"
15
+ - PubMed returns: Papers with exact keywords
16
+ - MISSED: Papers about "AMPK activation neuroprotection" (same mechanism, different words)
17
+
18
+ With embeddings:
19
+ - Embed the query AND all evidence
20
+ - Find semantically similar papers even without keyword match
21
+ - Deduplicate by meaning, not just URL
22
+
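+ A minimal sketch of the idea (using the same `all-MiniLM-L6-v2` model adopted below; scores are illustrative):
+
+ ```python
+ from sentence_transformers import SentenceTransformer, util
+
+ model = SentenceTransformer("all-MiniLM-L6-v2")
+ query = model.encode("metformin alzheimer")
+ related = model.encode("AMPK activation neuroprotection")
+ unrelated = model.encode("the weather is sunny today")
+
+ # The mechanistically related text scores far higher than the unrelated one,
+ # despite sharing zero keywords with the query.
+ print(util.cos_sim(query, related), util.cos_sim(query, unrelated))
+ ```
+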
23
+ ---
24
+
25
+ ## 2. Architecture
26
+
27
+ ### Current (Phase 5)
28
+ ```
29
+ Query → SearchAgent → PubMed/Web (keyword) → Evidence
30
+ ```
31
+
32
+ ### Phase 6
33
+ ```
34
+ Query → Embed(Query) → SearchAgent
35
+ ├── PubMed/Web (keyword) → Evidence
36
+ └── VectorDB (semantic) → Related Evidence
37
+ ↑
38
+ Evidence → Embed → Store
39
+ ```
40
+
41
+ ### Shared Context Enhancement
42
+ ```python
43
+ # Current
44
+ evidence_store = {"current": []}
45
+
46
+ # Phase 6
47
+ evidence_store = {
48
+ "current": [], # Raw evidence
49
+ "embeddings": {}, # URL -> embedding vector
50
+ "vector_index": None, # ChromaDB collection
51
+ }
52
+ ```
53
+
54
+ ---
55
+
56
+ ## 3. Technology Choice
57
+
58
+ ### ChromaDB (Recommended)
59
+ - **Free**, open-source, local-first
60
+ - No API keys, no cloud dependency
61
+ - Supports sentence-transformers out of the box
62
+ - Perfect for hackathon (no infra setup)
63
+
64
+ ### Embedding Model
65
+ - `sentence-transformers/all-MiniLM-L6-v2` (fast, good quality)
66
+ - Or `BAAI/bge-small-en-v1.5` (better quality, still fast)
67
+
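+ Either model can be passed to the `EmbeddingService` defined below via its `model_name` argument:
+
+ ```python
+ service = EmbeddingService()                          # default: all-MiniLM-L6-v2
+ service = EmbeddingService("BAAI/bge-small-en-v1.5")  # higher quality, still fast
+ ```
+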
68
+ ---
69
+
70
+ ## 4. Implementation
71
+
72
+ ### 4.1 Dependencies
73
+
74
+ Add to `pyproject.toml`:
75
+ ```toml
76
+ [project.optional-dependencies]
77
+ embeddings = [
78
+ "chromadb>=0.4.0",
79
+ "sentence-transformers>=2.2.0",
80
+ ]
81
+ ```
82
+
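+ Install the extra locally with `pip install -e ".[embeddings]"` (or your package manager's equivalent) before running the code below.
+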
83
+ ### 4.2 Embedding Service (`src/services/embeddings.py`)
84
+
85
+ > **CRITICAL: Async Pattern Required**
86
+ >
87
+ > `sentence-transformers` is synchronous and CPU-bound. Running it directly in async code
88
+ > will **block the event loop**, freezing the UI and halting all concurrent operations.
89
+ >
90
+ > **Solution**: Use `asyncio.run_in_executor()` to offload to thread pool.
91
+ > This pattern already exists in `src/tools/websearch.py:28-34`.
92
+
93
+ ```python
94
+ """Embedding service for semantic search.
95
+
96
+ IMPORTANT: All public methods are async to avoid blocking the event loop.
97
+ The sentence-transformers model is CPU-bound, so we use run_in_executor().
98
+ """
99
+ import asyncio
100
+ from typing import List
101
+
102
+ import chromadb
103
+ from sentence_transformers import SentenceTransformer
104
+
105
+
106
+ class EmbeddingService:
107
+ """Handles text embedding and vector storage.
108
+
109
+ All embedding operations run in a thread pool to avoid blocking
110
+ the async event loop. See src/tools/websearch.py for the pattern.
111
+ """
112
+
113
+ def __init__(self, model_name: str = "all-MiniLM-L6-v2"):
114
+ self._model = SentenceTransformer(model_name)
115
+ self._client = chromadb.Client() # In-memory for hackathon
116
+ self._collection = self._client.create_collection(
117
+ name="evidence",
118
+ metadata={"hnsw:space": "cosine"}
119
+ )
120
+
121
+ # ─────────────────────────────────────────────────────────────────
122
+ # Sync internal methods (run in thread pool)
123
+ # ─────────────────────────────────────────────────────────────────
124
+
125
+ def _sync_embed(self, text: str) -> List[float]:
126
+ """Synchronous embedding - DO NOT call directly from async code."""
127
+ return self._model.encode(text).tolist()
128
+
129
+ def _sync_batch_embed(self, texts: List[str]) -> List[List[float]]:
130
+ """Batch embedding for efficiency - DO NOT call directly from async code."""
131
+ return [e.tolist() for e in self._model.encode(texts)]
132
+
133
+ # ─────────────────────────────────────────────────────────────────
134
+ # Async public methods (safe for event loop)
135
+ # ─────────────────────────────────────────────────────────────────
136
+
137
+ async def embed(self, text: str) -> List[float]:
138
+ """Embed a single text (async-safe).
139
+
140
+ Uses run_in_executor to avoid blocking the event loop.
141
+ """
142
+ loop = asyncio.get_running_loop()
143
+ return await loop.run_in_executor(None, self._sync_embed, text)
144
+
145
+ async def embed_batch(self, texts: List[str]) -> List[List[float]]:
146
+ """Batch embed multiple texts (async-safe, more efficient)."""
147
+ loop = asyncio.get_running_loop()
148
+ return await loop.run_in_executor(None, self._sync_batch_embed, texts)
149
+
150
+ async def add_evidence(self, evidence_id: str, content: str, metadata: dict) -> None:
151
+ """Add evidence to vector store (async-safe)."""
152
+ embedding = await self.embed(content)
153
+ # ChromaDB operations are fast, but wrap for consistency
154
+ loop = asyncio.get_running_loop()
155
+ await loop.run_in_executor(
156
+ None,
157
+ lambda: self._collection.add(
158
+ ids=[evidence_id],
159
+ embeddings=[embedding],
160
+ metadatas=[metadata],
161
+ documents=[content]
162
+ )
163
+ )
164
+
165
+ async def search_similar(self, query: str, n_results: int = 5) -> List[dict]:
166
+ """Find semantically similar evidence (async-safe)."""
167
+ query_embedding = await self.embed(query)
168
+
169
+ loop = asyncio.get_running_loop()
170
+ results = await loop.run_in_executor(
171
+ None,
172
+ lambda: self._collection.query(
173
+ query_embeddings=[query_embedding],
174
+ n_results=n_results
175
+ )
176
+ )
177
+
178
+ # Handle empty results gracefully
179
+ if not results["ids"] or not results["ids"][0]:
180
+ return []
181
+
182
+ return [
183
+ {"id": id, "content": doc, "metadata": meta, "distance": dist}
184
+ for id, doc, meta, dist in zip(
185
+ results["ids"][0],
186
+ results["documents"][0],
187
+ results["metadatas"][0],
188
+ results["distances"][0]
189
+ )
190
+ ]
191
+
192
+ async def deduplicate(self, new_evidence: List, threshold: float = 0.9) -> List:
193
+ """Remove semantically duplicate evidence (async-safe)."""
194
+ unique = []
195
+ for evidence in new_evidence:
196
+ similar = await self.search_similar(evidence.content, n_results=1)
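+ # Keep only items whose nearest stored neighbor is farther than (1 - threshold):
+ # for threshold=0.9, a cosine distance <= 0.1 marks a semantic duplicate.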
197
+ if not similar or similar[0]["distance"] > (1 - threshold):
198
+ unique.append(evidence)
199
+ await self.add_evidence(
200
+ evidence_id=evidence.citation.url,
201
+ content=evidence.content,
202
+ metadata={"source": evidence.citation.source}
203
+ )
204
+ return unique
205
+ ```
206
+
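+ A short usage sketch of the service in an async context (IDs and text illustrative):
+
+ ```python
+ import asyncio
+
+ async def main() -> None:
+     service = EmbeddingService()
+     await service.add_evidence(
+         evidence_id="pmid:12345",
+         content="Metformin activates AMPK in hepatocytes.",
+         metadata={"source": "pubmed"},
+     )
+     hits = await service.search_similar("AMPK activators", n_results=1)
+     print(hits[0]["id"], round(hits[0]["distance"], 3))
+
+ asyncio.run(main())
+ ```
+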
207
+ ### 4.3 Enhanced SearchAgent (`src/agents/search_agent.py`)
208
+
209
+ Update SearchAgent to use embeddings. **Note**: All embedding calls are `await`ed:
210
+
211
+ ```python
212
+ class SearchAgent(BaseAgent):
213
+ def __init__(
214
+ self,
215
+ search_handler: SearchHandlerProtocol,
216
+ evidence_store: dict,
217
+ embedding_service: EmbeddingService | None = None, # NEW
218
+ ):
219
+ # ... existing init ...
220
+ self._embeddings = embedding_service
221
+
222
+ async def run(self, messages, *, thread=None, **kwargs) -> AgentRunResponse:
223
+ # ... extract query ...
224
+
225
+ # Execute keyword search
226
+ result = await self._handler.execute(query, max_results_per_tool=10)
227
+
228
+ # Semantic deduplication (NEW) - ALL CALLS ARE AWAITED
229
+ if self._embeddings:
230
+ # Deduplicate by semantic similarity (async-safe)
231
+ unique_evidence = await self._embeddings.deduplicate(result.evidence)
232
+
233
+ # Also search for semantically related evidence (async-safe)
234
+ related = await self._embeddings.search_similar(query, n_results=5)
235
+
236
+ # Merge related evidence not already in results
237
+ existing_urls = {e.citation.url for e in unique_evidence}
238
+ for item in related:
239
+ if item["id"] not in existing_urls:
240
+ # Reconstruct Evidence from stored data
241
+ # ... merge logic ...
242
+
243
+ # ... rest of method ...
244
+ ```
245
+
246
+ ### 4.4 Semantic Expansion in Orchestrator
247
+
248
+ The MagenticOrchestrator can use embeddings to expand queries:
249
+
250
+ ```python
251
+ # In task instruction
252
+ task = f"""Research drug repurposing opportunities for: {query}
253
+
254
+ The system has semantic search enabled. When evidence is found:
255
+ 1. Related concepts will be automatically surfaced
256
+ 2. Duplicates are removed by meaning, not just URL
257
+ 3. Use the surfaced related concepts to refine searches
258
+ """
259
+ ```
260
+
261
+ ### 4.5 HuggingFace Spaces Deployment
262
+
263
+ > **⚠️ Important for HF Spaces**
264
+ >
265
+ > `sentence-transformers` downloads models (~500MB) to `~/.cache` on first use.
266
+ > HuggingFace Spaces have **ephemeral storage** - the cache is wiped on restart.
267
+ > This causes slow cold starts and bandwidth usage.
268
+
269
+ **Solution**: Pre-download the model in your Dockerfile:
270
+
271
+ ```dockerfile
272
+ # In Dockerfile
273
+ FROM python:3.11-slim
274
+
275
+ # Set cache directory
276
+ ENV HF_HOME=/app/.cache
277
+ ENV TRANSFORMERS_CACHE=/app/.cache
278
+
279
+ # Pre-download the embedding model during build
280
+ RUN pip install sentence-transformers && \
281
+ python -c "from sentence_transformers import SentenceTransformer; SentenceTransformer('all-MiniLM-L6-v2')"
282
+
283
+ # ... rest of Dockerfile
284
+ ```
285
+
286
+ **Alternative**: Use environment variable to specify persistent path:
287
+
288
+ ```yaml
289
+ # In HF Spaces settings or app.yaml
290
+ env:
291
+ - name: HF_HOME
292
+ value: /data/.cache # Persistent volume
293
+ ```
294
+
295
+ ---
296
+
297
+ ## 5. Directory Structure After Phase 6
298
+
299
+ ```
300
+ src/
301
+ ├── services/ # NEW
302
+ │ ├── __init__.py
303
+ │ └── embeddings.py
304
+ ├── agents/
305
+ │ ├── search_agent.py
306
+ │ └── judge_agent.py
307
+ └── ...
308
+ ```
309
+
310
+ ---
311
+
312
+ ## 6. Tests
313
+
314
+ ### 6.1 Unit Tests (`tests/unit/services/test_embeddings.py`)
315
+
316
+ > **Note**: All tests are async since the EmbeddingService methods are async.
317
+
318
+ ```python
319
+ """Unit tests for EmbeddingService."""
320
+ import pytest
321
+ from src.services.embeddings import EmbeddingService
322
+
323
+
324
+ class TestEmbeddingService:
325
+ @pytest.mark.asyncio
326
+ async def test_embed_returns_vector(self):
327
+ """Embedding should return a float vector."""
328
+ service = EmbeddingService()
329
+ embedding = await service.embed("metformin diabetes")
330
+ assert isinstance(embedding, list)
331
+ assert len(embedding) > 0
332
+ assert all(isinstance(x, float) for x in embedding)
333
+
334
+ @pytest.mark.asyncio
335
+ async def test_similar_texts_have_close_embeddings(self):
336
+ """Semantically similar texts should have similar embeddings."""
337
+ service = EmbeddingService()
338
+ e1 = await service.embed("metformin treats diabetes")
339
+ e2 = await service.embed("metformin is used for diabetes treatment")
340
+ e3 = await service.embed("the weather is sunny today")
341
+
342
+ # Cosine similarity helper
343
+ from numpy import dot
344
+ from numpy.linalg import norm
345
+ cosine = lambda a, b: dot(a, b) / (norm(a) * norm(b))
346
+
347
+ # Similar texts should be closer
348
+ assert cosine(e1, e2) > cosine(e1, e3)
349
+
350
+ @pytest.mark.asyncio
351
+ async def test_batch_embed_efficient(self):
352
+ """Batch embedding should be more efficient than individual calls."""
353
+ service = EmbeddingService()
354
+ texts = ["text one", "text two", "text three"]
355
+
356
+ # Batch embed
357
+ batch_results = await service.embed_batch(texts)
358
+ assert len(batch_results) == 3
359
+ assert all(isinstance(e, list) for e in batch_results)
360
+
361
+ @pytest.mark.asyncio
362
+ async def test_add_and_search(self):
363
+ """Should be able to add evidence and search for similar."""
364
+ service = EmbeddingService()
365
+ await service.add_evidence(
366
+ evidence_id="test1",
367
+ content="Metformin activates AMPK pathway",
368
+ metadata={"source": "pubmed"}
369
+ )
370
+
371
+ results = await service.search_similar("AMPK activation drugs", n_results=1)
372
+ assert len(results) == 1
373
+ assert "AMPK" in results[0]["content"]
374
+
375
+ @pytest.mark.asyncio
376
+ async def test_search_similar_empty_collection(self):
377
+ """Search on empty collection should return empty list, not error."""
378
+ service = EmbeddingService()
379
+ results = await service.search_similar("anything", n_results=5)
380
+ assert results == []
381
+ ```
382
+
383
+ ---
384
+
385
+ ## 7. Definition of Done
386
+
387
+ Phase 6 is **COMPLETE** when:
388
+
389
+ 1. `EmbeddingService` implemented with ChromaDB
390
+ 2. SearchAgent uses embeddings for deduplication
391
+ 3. Semantic search surfaces related evidence
392
+ 4. All unit tests pass
393
+ 5. Integration test shows improved recall (finds related papers)
394
+
395
+ ---
396
+
397
+ ## 8. Value Delivered
398
+
399
+ | Before (Phase 5) | After (Phase 6) |
400
+ |------------------|-----------------|
401
+ | Keyword-only search | Semantic + keyword search |
402
+ | URL-based deduplication | Meaning-based deduplication |
403
+ | Miss related papers | Surface related concepts |
404
+ | Exact match required | Fuzzy semantic matching |
405
+
406
+ **Real example improvement:**
407
+ - Query: "metformin alzheimer"
408
+ - Before: Only papers mentioning both words
409
+ - After: Also finds "AMPK neuroprotection", "biguanide cognitive", etc.
docs/implementation/07_phase_hypothesis.md ADDED
@@ -0,0 +1,630 @@
1
+ # Phase 7 Implementation Spec: Hypothesis Agent
2
+
3
+ **Goal**: Add an agent that generates scientific hypotheses to guide targeted searches.
4
+ **Philosophy**: "Don't just find evidenceβ€”understand the mechanisms."
5
+ **Prerequisite**: Phase 6 complete (Embeddings working)
6
+
7
+ ---
8
+
9
+ ## 1. Why Hypothesis Agent?
10
+
11
+ Current limitation: **Search is reactive, not hypothesis-driven.**
12
+
13
+ Current flow:
14
+ 1. User asks about "metformin alzheimer"
15
+ 2. Search finds papers
16
+ 3. Judge says "need more evidence"
17
+ 4. Search again with slightly different keywords
18
+
19
+ With Hypothesis Agent:
20
+ 1. User asks about "metformin alzheimer"
21
+ 2. Search finds initial papers
22
+ 3. **Hypothesis Agent analyzes**: "Evidence suggests metformin β†’ AMPK activation β†’ autophagy β†’ amyloid clearance"
23
+ 4. Search can now target: "metformin AMPK", "autophagy neurodegeneration", "amyloid clearance drugs"
24
+
25
+ **Key insight**: Scientific research is hypothesis-driven. The agent should think like a researcher.
26
+
27
+ ---
28
+
29
+ ## 2. Architecture
30
+
31
+ ### Current (Phase 6)
32
+ ```
33
+ User Query β†’ Magentic Manager
34
+ ├── SearchAgent → Evidence
35
+ └── JudgeAgent → Sufficient? → Synthesize/Continue
36
+ ```
37
+
38
+ ### Phase 7
39
+ ```
40
+ User Query β†’ Magentic Manager
41
+ ├── SearchAgent → Evidence
42
+ ├── HypothesisAgent → Mechanistic Hypotheses ← NEW
43
+ └── JudgeAgent → Sufficient? → Synthesize/Continue
44
+ ↑
45
+ Uses hypotheses to guide next search
46
+ ```
47
+
48
+ ### Shared Context Enhancement
49
+ ```python
50
+ evidence_store = {
51
+ "current": [],
52
+ "embeddings": {},
53
+ "vector_index": None,
54
+ "hypotheses": [], # NEW: Generated hypotheses
55
+ "tested_hypotheses": [], # NEW: Hypotheses with supporting/contradicting evidence
56
+ }
57
+ ```
58
+
59
+ ---
60
+
61
+ ## 3. Hypothesis Model
62
+
63
+ ### 3.1 Data Model (`src/utils/models.py`)
64
+
65
+ ```python
66
+ class MechanismHypothesis(BaseModel):
67
+ """A scientific hypothesis about drug mechanism."""
68
+
69
+ drug: str = Field(description="The drug being studied")
70
+ target: str = Field(description="Molecular target (e.g., AMPK, mTOR)")
71
+ pathway: str = Field(description="Biological pathway affected")
72
+ effect: str = Field(description="Downstream effect on disease")
73
+ confidence: float = Field(ge=0, le=1, description="Confidence in hypothesis")
74
+ supporting_evidence: list[str] = Field(
75
+ default_factory=list,
76
+ description="PMIDs or URLs supporting this hypothesis"
77
+ )
78
+ contradicting_evidence: list[str] = Field(
79
+ default_factory=list,
80
+ description="PMIDs or URLs contradicting this hypothesis"
81
+ )
82
+ search_suggestions: list[str] = Field(
83
+ default_factory=list,
84
+ description="Suggested searches to test this hypothesis"
85
+ )
86
+
87
+ def to_search_queries(self) -> list[str]:
88
+ """Generate search queries to test this hypothesis."""
89
+ return [
90
+ f"{self.drug} {self.target}",
91
+ f"{self.target} {self.pathway}",
92
+ f"{self.pathway} {self.effect}",
93
+ *self.search_suggestions
94
+ ]
95
+ ```
96
+
97
+ ### 3.2 Hypothesis Assessment
98
+
99
+ ```python
100
+ class HypothesisAssessment(BaseModel):
101
+ """Assessment of evidence against hypotheses."""
102
+
103
+ hypotheses: list[MechanismHypothesis]
104
+ primary_hypothesis: MechanismHypothesis | None = Field(
105
+ description="Most promising hypothesis based on current evidence"
106
+ )
107
+ knowledge_gaps: list[str] = Field(
108
+ description="What we don't know yet"
109
+ )
110
+ recommended_searches: list[str] = Field(
111
+ description="Searches to fill knowledge gaps"
112
+ )
113
+ ```
114
+
115
+ ---
116
+
117
+ ## 4. Implementation
118
+
119
+ ### 4.0 Text Utilities (`src/utils/text_utils.py`)
120
+
121
+ > **Why These Utilities?**
122
+ >
123
+ > The original spec used arbitrary truncation (`evidence[:10]` and `content[:300]`).
124
+ > This loses important information randomly. These utilities provide:
125
+ > 1. **Sentence-aware truncation** - cuts at sentence boundaries, not mid-word
126
+ > 2. **Diverse evidence selection** - uses embeddings to select varied evidence (MMR)
127
+
128
+ ```python
129
+ """Text processing utilities for evidence handling."""
130
+ from typing import TYPE_CHECKING
131
+
132
+ if TYPE_CHECKING:
133
+ from src.services.embeddings import EmbeddingService
134
+ from src.utils.models import Evidence
135
+
136
+
137
+ def truncate_at_sentence(text: str, max_chars: int = 300) -> str:
138
+ """Truncate text at sentence boundary, preserving meaning.
139
+
140
+ Args:
141
+ text: The text to truncate
142
+ max_chars: Maximum characters (default 300)
143
+
144
+ Returns:
145
+ Text truncated at last complete sentence within limit
146
+ """
147
+ if len(text) <= max_chars:
148
+ return text
149
+
150
+ # Find truncation point
151
+ truncated = text[:max_chars]
152
+
153
+ # Look for sentence endings: . ! ? followed by space or end
154
+ for sep in ['. ', '! ', '? ', '.\n', '!\n', '?\n']:
155
+ last_sep = truncated.rfind(sep)
156
+ if last_sep > max_chars // 2: # Don't truncate too aggressively
157
+ return text[:last_sep + 1].strip()
158
+
159
+ # Fallback: find last period
160
+ last_period = truncated.rfind('.')
161
+ if last_period > max_chars // 2:
162
+ return text[:last_period + 1].strip()
163
+
164
+ # Last resort: truncate at word boundary
165
+ last_space = truncated.rfind(' ')
166
+ if last_space > 0:
167
+ return text[:last_space].strip() + "..."
168
+
169
+ return truncated + "..."
170
+
171
+
172
+ async def select_diverse_evidence(
173
+ evidence: list["Evidence"],
174
+ n: int,
175
+ query: str,
176
+ embeddings: "EmbeddingService | None" = None
177
+ ) -> list["Evidence"]:
178
+ """Select n most diverse and relevant evidence items.
179
+
180
+ Uses Maximal Marginal Relevance (MMR) when embeddings available,
181
+ falls back to relevance_score sorting otherwise.
182
+
183
+ Args:
184
+ evidence: All available evidence
185
+ n: Number of items to select
186
+ query: Original query for relevance scoring
187
+ embeddings: Optional EmbeddingService for semantic diversity
188
+
189
+ Returns:
190
+ Selected evidence items, diverse and relevant
191
+ """
192
+ if not evidence:
193
+ return []
194
+
195
+ if n >= len(evidence):
196
+ return evidence
197
+
198
+ # Fallback: sort by relevance score if no embeddings
199
+ if embeddings is None:
200
+ return sorted(
201
+ evidence,
202
+ key=lambda e: e.relevance_score,
203
+ reverse=True
204
+ )[:n]
205
+
206
+ # MMR: Maximal Marginal Relevance for diverse selection
207
+ # Score = λ * relevance - (1-λ) * max_similarity_to_selected
208
+ lambda_param = 0.7 # Balance relevance vs diversity
209
+
210
+ # Get query embedding
211
+ query_emb = await embeddings.embed(query)
212
+
213
+ # Get all evidence embeddings
214
+ evidence_embs = await embeddings.embed_batch([e.content for e in evidence])
215
+
216
+ # Compute relevance scores (cosine similarity to query)
217
+ from numpy import dot
218
+ from numpy.linalg import norm
219
+ cosine = lambda a, b: float(dot(a, b) / (norm(a) * norm(b)))
220
+
221
+ relevance_scores = [cosine(query_emb, emb) for emb in evidence_embs]
222
+
223
+ # Greedy MMR selection
224
+ selected_indices: list[int] = []
225
+ remaining = set(range(len(evidence)))
226
+
227
+ for _ in range(n):
228
+ best_score = float('-inf')
229
+ best_idx = -1
230
+
231
+ for idx in remaining:
232
+ # Relevance component
233
+ relevance = relevance_scores[idx]
234
+
235
+ # Diversity component: max similarity to already selected
236
+ if selected_indices:
237
+ max_sim = max(
238
+ cosine(evidence_embs[idx], evidence_embs[sel])
239
+ for sel in selected_indices
240
+ )
241
+ else:
242
+ max_sim = 0
243
+
244
+ # MMR score
245
+ mmr_score = lambda_param * relevance - (1 - lambda_param) * max_sim
246
+
247
+ if mmr_score > best_score:
248
+ best_score = mmr_score
249
+ best_idx = idx
250
+
251
+ if best_idx >= 0:
252
+ selected_indices.append(best_idx)
253
+ remaining.remove(best_idx)
254
+
255
+ return [evidence[i] for i in selected_indices]
256
+ ```
257
+
258
+ ### 4.1 Hypothesis Prompts (`src/prompts/hypothesis.py`)
259
+
260
+ ```python
261
+ """Prompts for Hypothesis Agent."""
262
+ from src.utils.text_utils import truncate_at_sentence, select_diverse_evidence
263
+
264
+ SYSTEM_PROMPT = """You are a biomedical research scientist specializing in drug repurposing.
265
+
266
+ Your role is to generate mechanistic hypotheses based on evidence.
267
+
268
+ A good hypothesis:
269
+ 1. Proposes a MECHANISM: Drug → Target → Pathway → Effect
270
+ 2. Is TESTABLE: Can be supported or refuted by literature search
271
+ 3. Is SPECIFIC: Names actual molecular targets and pathways
272
+ 4. Generates SEARCH QUERIES: Helps find more evidence
273
+
274
+ Example hypothesis format:
275
+ - Drug: Metformin
276
+ - Target: AMPK (AMP-activated protein kinase)
277
+ - Pathway: mTOR inhibition → autophagy activation
278
+ - Effect: Enhanced clearance of amyloid-beta in Alzheimer's
279
+ - Confidence: 0.7
280
+ - Search suggestions: ["metformin AMPK brain", "autophagy amyloid clearance"]
281
+
282
+ Be specific. Use actual gene/protein names when possible."""
283
+
284
+
285
+ async def format_hypothesis_prompt(
286
+ query: str,
287
+ evidence: list,
288
+ embeddings=None
289
+ ) -> str:
290
+ """Format prompt for hypothesis generation.
291
+
292
+ Uses smart evidence selection instead of arbitrary truncation.
293
+
294
+ Args:
295
+ query: The research query
296
+ evidence: All collected evidence
297
+ embeddings: Optional EmbeddingService for diverse selection
298
+ """
299
+ # Select diverse, relevant evidence (not arbitrary first 10)
300
+ selected = await select_diverse_evidence(
301
+ evidence, n=10, query=query, embeddings=embeddings
302
+ )
303
+
304
+ # Format with sentence-aware truncation
305
+ evidence_text = "\n".join([
306
+ f"- **{e.citation.title}** ({e.citation.source}): {truncate_at_sentence(e.content, 300)}"
307
+ for e in selected
308
+ ])
309
+
310
+ return f"""Based on the following evidence about "{query}", generate mechanistic hypotheses.
311
+
312
+ ## Evidence ({len(selected)} papers selected for diversity)
313
+ {evidence_text}
314
+
315
+ ## Task
316
+ 1. Identify potential drug targets mentioned in the evidence
317
+ 2. Propose mechanism hypotheses (Drug β†’ Target β†’ Pathway β†’ Effect)
318
+ 3. Rate confidence based on evidence strength
319
+ 4. Suggest searches to test each hypothesis
320
+
321
+ Generate 2-4 hypotheses, prioritized by confidence."""
322
+ ```
323
+
324
+ ### 4.2 Hypothesis Agent (`src/agents/hypothesis_agent.py`)
325
+
326
+ ```python
327
+ """Hypothesis agent for mechanistic reasoning."""
328
+ from collections.abc import AsyncIterable
329
+ from typing import TYPE_CHECKING, Any
330
+
331
+ from agent_framework import (
332
+ AgentRunResponse,
333
+ AgentRunResponseUpdate,
334
+ AgentThread,
335
+ BaseAgent,
336
+ ChatMessage,
337
+ Role,
338
+ )
339
+ from pydantic_ai import Agent
340
+
341
+ from src.prompts.hypothesis import SYSTEM_PROMPT, format_hypothesis_prompt
342
+ from src.utils.config import settings
343
+ from src.utils.models import Evidence, HypothesisAssessment
344
+
345
+ if TYPE_CHECKING:
346
+ from src.services.embeddings import EmbeddingService
347
+
348
+
349
+ class HypothesisAgent(BaseAgent):
350
+ """Generates mechanistic hypotheses based on evidence."""
351
+
352
+ def __init__(
353
+ self,
354
+ evidence_store: dict[str, list[Evidence]],
355
+ embedding_service: "EmbeddingService | None" = None, # NEW: for diverse selection
356
+ ) -> None:
357
+ super().__init__(
358
+ name="HypothesisAgent",
359
+ description="Generates scientific hypotheses about drug mechanisms to guide research",
360
+ )
361
+ self._evidence_store = evidence_store
362
+ self._embeddings = embedding_service # Used for MMR evidence selection
363
+ self._agent = Agent(
364
+ model=settings.llm_provider, # Uses configured LLM
365
+ output_type=HypothesisAssessment,
366
+ system_prompt=SYSTEM_PROMPT,
367
+ )
368
+
369
+ async def run(
370
+ self,
371
+ messages: str | ChatMessage | list[str] | list[ChatMessage] | None = None,
372
+ *,
373
+ thread: AgentThread | None = None,
374
+ **kwargs: Any,
375
+ ) -> AgentRunResponse:
376
+ """Generate hypotheses based on current evidence."""
377
+ # Extract query
378
+ query = self._extract_query(messages)
379
+
380
+ # Get current evidence
381
+ evidence = self._evidence_store.get("current", [])
382
+
383
+ if not evidence:
384
+ return AgentRunResponse(
385
+ messages=[ChatMessage(
386
+ role=Role.ASSISTANT,
387
+ text="No evidence available yet. Search for evidence first."
388
+ )],
389
+ response_id="hypothesis-no-evidence",
390
+ )
391
+
392
+ # Generate hypotheses with diverse evidence selection
393
+ # NOTE: format_hypothesis_prompt is now async
394
+ prompt = await format_hypothesis_prompt(
395
+ query, evidence, embeddings=self._embeddings
396
+ )
397
+ result = await self._agent.run(prompt)
398
+ assessment = result.output
399
+
400
+ # Store hypotheses in shared context
401
+ existing = self._evidence_store.get("hypotheses", [])
402
+ self._evidence_store["hypotheses"] = existing + assessment.hypotheses
403
+
404
+ # Format response
405
+ response_text = self._format_response(assessment)
406
+
407
+ return AgentRunResponse(
408
+ messages=[ChatMessage(role=Role.ASSISTANT, text=response_text)],
409
+ response_id=f"hypothesis-{len(assessment.hypotheses)}",
410
+ additional_properties={"assessment": assessment.model_dump()},
411
+ )
412
+
413
+ def _format_response(self, assessment: HypothesisAssessment) -> str:
414
+ """Format hypothesis assessment as markdown."""
415
+ lines = ["## Generated Hypotheses\n"]
416
+
417
+ for i, h in enumerate(assessment.hypotheses, 1):
418
+ lines.append(f"### Hypothesis {i} (Confidence: {h.confidence:.0%})")
419
+ lines.append(f"**Mechanism**: {h.drug} β†’ {h.target} β†’ {h.pathway} β†’ {h.effect}")
420
+ lines.append(f"**Suggested searches**: {', '.join(h.search_suggestions)}\n")
421
+
422
+ if assessment.primary_hypothesis:
423
+ lines.append(f"### Primary Hypothesis")
424
+ h = assessment.primary_hypothesis
425
+ lines.append(f"{h.drug} β†’ {h.target} β†’ {h.pathway} β†’ {h.effect}\n")
426
+
427
+ if assessment.knowledge_gaps:
428
+ lines.append("### Knowledge Gaps")
429
+ for gap in assessment.knowledge_gaps:
430
+ lines.append(f"- {gap}")
431
+
432
+ if assessment.recommended_searches:
433
+ lines.append("\n### Recommended Next Searches")
434
+ for search in assessment.recommended_searches:
435
+ lines.append(f"- `{search}`")
436
+
437
+ return "\n".join(lines)
438
+
439
+ def _extract_query(self, messages) -> str:
440
+ """Extract query from messages."""
441
+ if isinstance(messages, str):
442
+ return messages
443
+ elif isinstance(messages, ChatMessage):
444
+ return messages.text or ""
445
+ elif isinstance(messages, list):
446
+ for msg in reversed(messages):
447
+ if isinstance(msg, ChatMessage) and msg.role == Role.USER:
448
+ return msg.text or ""
449
+ elif isinstance(msg, str):
450
+ return msg
451
+ return ""
452
+
453
+ async def run_stream(
454
+ self,
455
+ messages: str | ChatMessage | list[str] | list[ChatMessage] | None = None,
456
+ *,
457
+ thread: AgentThread | None = None,
458
+ **kwargs: Any,
459
+ ) -> AsyncIterable[AgentRunResponseUpdate]:
460
+ """Streaming wrapper."""
461
+ result = await self.run(messages, thread=thread, **kwargs)
462
+ yield AgentRunResponseUpdate(
463
+ messages=result.messages,
464
+ response_id=result.response_id
465
+ )
466
+ ```
467
+
468
+ ### 4.3 Update MagenticOrchestrator
469
+
470
+ Add HypothesisAgent to the workflow:
471
+
472
+ ```python
473
+ # In MagenticOrchestrator.__init__
474
+ self._hypothesis_agent = HypothesisAgent(self._evidence_store)
475
+
476
+ # In workflow building
477
+ workflow = (
478
+ MagenticBuilder()
479
+ .participants(
480
+ searcher=search_agent,
481
+ hypothesizer=self._hypothesis_agent, # NEW
482
+ judge=judge_agent,
483
+ )
484
+ .with_standard_manager(...)
485
+ .build()
486
+ )
487
+
488
+ # Update task instruction
489
+ task = f"""Research drug repurposing opportunities for: {query}
490
+
491
+ Workflow:
492
+ 1. SearchAgent: Find initial evidence from PubMed and web
493
+ 2. HypothesisAgent: Generate mechanistic hypotheses (Drug β†’ Target β†’ Pathway β†’ Effect)
494
+ 3. SearchAgent: Use hypothesis-suggested queries for targeted search
495
+ 4. JudgeAgent: Evaluate if evidence supports hypotheses
496
+ 5. Repeat until confident or max rounds
497
+
498
+ Focus on:
499
+ - Identifying specific molecular targets
500
+ - Understanding mechanism of action
501
+ - Finding supporting/contradicting evidence for hypotheses
502
+ """
503
+ ```
504
+
505
+ ---
506
+
507
+ ## 5. Directory Structure After Phase 7
508
+
509
+ ```
510
+ src/
511
+ β”œβ”€β”€ agents/
512
+ β”‚ β”œβ”€β”€ search_agent.py
513
+ β”‚ β”œβ”€β”€ judge_agent.py
514
+ β”‚ └── hypothesis_agent.py # NEW
515
+ β”œβ”€β”€ prompts/
516
+ β”‚ β”œβ”€β”€ judge.py
517
+ β”‚ └── hypothesis.py # NEW
518
+ β”œβ”€β”€ services/
519
+ β”‚ └── embeddings.py
520
+ └── utils/
521
+ └── models.py # Updated with hypothesis models
522
+ ```
523
+
524
+ ---
525
+
526
+ ## 6. Tests
527
+
528
+ ### 6.1 Unit Tests (`tests/unit/agents/test_hypothesis_agent.py`)
529
+
530
+ ```python
531
+ """Unit tests for HypothesisAgent."""
532
+ import pytest
533
+ from unittest.mock import AsyncMock, MagicMock, patch
534
+
535
+ from src.agents.hypothesis_agent import HypothesisAgent
536
+ from src.utils.models import Citation, Evidence, HypothesisAssessment, MechanismHypothesis
537
+
538
+
539
+ @pytest.fixture
540
+ def sample_evidence():
541
+ return [
542
+ Evidence(
543
+ content="Metformin activates AMPK, which inhibits mTOR signaling...",
544
+ citation=Citation(
545
+ source="pubmed",
546
+ title="Metformin and AMPK",
547
+ url="https://pubmed.ncbi.nlm.nih.gov/12345/",
548
+ date="2023"
549
+ )
550
+ )
551
+ ]
552
+
553
+
554
+ @pytest.fixture
555
+ def mock_assessment():
556
+ return HypothesisAssessment(
557
+ hypotheses=[
558
+ MechanismHypothesis(
559
+ drug="Metformin",
560
+ target="AMPK",
561
+ pathway="mTOR inhibition",
562
+ effect="Reduced cancer cell proliferation",
563
+ confidence=0.75,
564
+ search_suggestions=["metformin AMPK cancer", "mTOR cancer therapy"]
565
+ )
566
+ ],
567
+ primary_hypothesis=None,
568
+ knowledge_gaps=["Clinical trial data needed"],
569
+ recommended_searches=["metformin clinical trial cancer"]
570
+ )
571
+
572
+
573
+ @pytest.mark.asyncio
574
+ async def test_hypothesis_agent_generates_hypotheses(sample_evidence, mock_assessment):
575
+ """HypothesisAgent should generate mechanistic hypotheses."""
576
+ store = {"current": sample_evidence, "hypotheses": []}
577
+
578
+ with patch("src.agents.hypothesis_agent.Agent") as MockAgent:
579
+ mock_result = MagicMock()
580
+ mock_result.output = mock_assessment
581
+ MockAgent.return_value.run = AsyncMock(return_value=mock_result)
582
+
583
+ agent = HypothesisAgent(store)
584
+ response = await agent.run("metformin cancer")
585
+
586
+ assert "AMPK" in response.messages[0].text
587
+ assert len(store["hypotheses"]) == 1
588
+
589
+
590
+ @pytest.mark.asyncio
591
+ async def test_hypothesis_agent_no_evidence():
592
+ """HypothesisAgent should handle empty evidence gracefully."""
593
+ store = {"current": [], "hypotheses": []}
594
+ agent = HypothesisAgent(store)
595
+
596
+ response = await agent.run("test query")
597
+
598
+ assert "No evidence" in response.messages[0].text
599
+ ```
600
+
601
+ ---
602
+
603
+ ## 7. Definition of Done
604
+
605
+ Phase 7 is **COMPLETE** when:
606
+
607
+ 1. `MechanismHypothesis` and `HypothesisAssessment` models implemented
608
+ 2. `HypothesisAgent` generates hypotheses from evidence
609
+ 3. Hypotheses stored in shared context
610
+ 4. Search queries generated from hypotheses
611
+ 5. Magentic workflow includes HypothesisAgent
612
+ 6. All unit tests pass
613
+
614
+ ---
615
+
616
+ ## 8. Value Delivered
617
+
618
+ | Before (Phase 6) | After (Phase 7) |
619
+ |------------------|-----------------|
620
+ | Reactive search | Hypothesis-driven search |
621
+ | Generic queries | Mechanism-targeted queries |
622
+ | No scientific reasoning | Drug β†’ Target β†’ Pathway β†’ Effect |
623
+ | Judge says "need more" | Hypothesis says "search for X to test Y" |
624
+
625
+ **Real example improvement:**
626
+ - Query: "metformin alzheimer"
627
+ - Before: "metformin alzheimer mechanism", "metformin brain"
628
+ - After: "metformin AMPK activation", "AMPK autophagy neurodegeneration", "autophagy amyloid clearance"
629
+
630
+ The search becomes **scientifically targeted** rather than keyword variations.
docs/implementation/08_phase_report.md ADDED
@@ -0,0 +1,854 @@
1
+ # Phase 8 Implementation Spec: Report Agent
2
+
3
+ **Goal**: Generate structured scientific reports with proper citations and methodology.
4
+ **Philosophy**: "Research isn't complete until it's communicated clearly."
5
+ **Prerequisite**: Phase 7 complete (Hypothesis Agent working)
6
+
7
+ ---
8
+
9
+ ## 1. Why Report Agent?
10
+
11
+ Current limitation: **Synthesis is basic markdown, not a scientific report.**
12
+
13
+ Current output:
14
+ ```
15
+ ## Drug Repurposing Analysis
16
+ ### Drug Candidates
17
+ - Metformin
18
+ ### Key Findings
19
+ - Some findings
20
+ ### Citations
21
+ 1. [Paper 1](url)
22
+ ```
23
+
24
+ With Report Agent:
25
+ ```
26
+ ## Executive Summary
27
+ One-paragraph summary for busy readers...
28
+
29
+ ## Research Question
30
+ Clear statement of what was investigated...
31
+
32
+ ## Methodology
33
+ - Sources searched: PubMed, DuckDuckGo
34
+ - Date range: ...
35
+ - Inclusion criteria: ...
36
+
37
+ ## Hypotheses Tested
38
+ 1. Metformin → AMPK → neuroprotection (Supported: 7 papers, Contradicted: 2)
39
+
40
+ ## Findings
41
+ ### Mechanistic Evidence
42
+ ...
43
+ ### Clinical Evidence
44
+ ...
45
+
46
+ ## Limitations
47
+ - Only English language papers
48
+ - Abstract-level analysis only
49
+
50
+ ## Conclusion
51
+ ...
52
+
53
+ ## References
54
+ Properly formatted citations...
55
+ ```
56
+
57
+ ---
58
+
59
+ ## 2. Architecture
60
+
61
+ ### Phase 8 Addition
62
+ ```
63
+ Evidence + Hypotheses + Assessment
64
+ ↓
65
+ Report Agent
66
+ ↓
67
+ Structured Scientific Report
68
+ ```
69
+
70
+ ### Report Generation Flow
71
+ ```
72
+ 1. JudgeAgent says "synthesize"
73
+ 2. Magentic Manager selects ReportAgent
74
+ 3. ReportAgent gathers:
75
+ - All evidence from shared context
76
+ - All hypotheses (supported/contradicted)
77
+ - Assessment scores
78
+ 4. ReportAgent generates structured report
79
+ 5. Final output to user
80
+ ```
81
+
82
+ ---
83
+
84
+ ## 3. Report Model
85
+
86
+ ### 3.1 Data Model (`src/utils/models.py`)
87
+
88
+ ```python
89
+ class ReportSection(BaseModel):
90
+ """A section of the research report."""
91
+ title: str
92
+ content: str
93
+ citations: list[str] = Field(default_factory=list)
94
+
95
+
96
+ class ResearchReport(BaseModel):
97
+ """Structured scientific report."""
98
+
99
+ title: str = Field(description="Report title")
100
+ executive_summary: str = Field(
101
+ description="One-paragraph summary for quick reading",
102
+ min_length=100,
103
+ max_length=500
104
+ )
105
+ research_question: str = Field(description="Clear statement of what was investigated")
106
+
107
+ methodology: ReportSection = Field(description="How the research was conducted")
108
+ hypotheses_tested: list[dict] = Field(
109
+ description="Hypotheses with supporting/contradicting evidence counts"
110
+ )
111
+
112
+ mechanistic_findings: ReportSection = Field(
113
+ description="Findings about drug mechanisms"
114
+ )
115
+ clinical_findings: ReportSection = Field(
116
+ description="Findings from clinical/preclinical studies"
117
+ )
118
+
119
+ drug_candidates: list[str] = Field(description="Identified drug candidates")
120
+ limitations: list[str] = Field(description="Study limitations")
121
+ conclusion: str = Field(description="Overall conclusion")
122
+
123
+ references: list[dict] = Field(
124
+ description="Formatted references with title, authors, source, URL"
125
+ )
126
+
127
+ # Metadata
128
+ sources_searched: list[str] = Field(default_factory=list)
129
+ total_papers_reviewed: int = 0
130
+ search_iterations: int = 0
131
+ confidence_score: float = Field(ge=0, le=1)
132
+
133
+ def to_markdown(self) -> str:
134
+ """Render report as markdown."""
135
+ sections = [
136
+ f"# {self.title}\n",
137
+ f"## Executive Summary\n{self.executive_summary}\n",
138
+ f"## Research Question\n{self.research_question}\n",
139
+ f"## Methodology\n{self.methodology.content}\n",
140
+ ]
141
+
142
+ # Hypotheses
143
+ sections.append("## Hypotheses Tested\n")
144
+ for h in self.hypotheses_tested:
145
+ status = "βœ… Supported" if h.get("supported", 0) > h.get("contradicted", 0) else "⚠️ Mixed"
146
+ sections.append(
147
+ f"- **{h['mechanism']}** ({status}): "
148
+ f"{h.get('supported', 0)} supporting, {h.get('contradicted', 0)} contradicting\n"
149
+ )
150
+
151
+ # Findings
152
+ sections.append(f"## Mechanistic Findings\n{self.mechanistic_findings.content}\n")
153
+ sections.append(f"## Clinical Findings\n{self.clinical_findings.content}\n")
154
+
155
+ # Drug candidates
156
+ sections.append("## Drug Candidates\n")
157
+ for drug in self.drug_candidates:
158
+ sections.append(f"- **{drug}**\n")
159
+
160
+ # Limitations
161
+ sections.append("## Limitations\n")
162
+ for lim in self.limitations:
163
+ sections.append(f"- {lim}\n")
164
+
165
+ # Conclusion
166
+ sections.append(f"## Conclusion\n{self.conclusion}\n")
167
+
168
+ # References
169
+ sections.append("## References\n")
170
+ for i, ref in enumerate(self.references, 1):
171
+ sections.append(
172
+ f"{i}. {ref.get('authors', 'Unknown')}. "
173
+ f"*{ref.get('title', 'Untitled')}*. "
174
+ f"{ref.get('source', '')} ({ref.get('date', '')}). "
175
+ f"[Link]({ref.get('url', '#')})\n"
176
+ )
177
+
178
+ # Metadata footer
179
+ sections.append("\n---\n")
180
+ sections.append(
181
+ f"*Report generated from {self.total_papers_reviewed} papers "
182
+ f"across {self.search_iterations} search iterations. "
183
+ f"Confidence: {self.confidence_score:.0%}*"
184
+ )
185
+
186
+ return "\n".join(sections)
187
+ ```
188
+
189
+ ---
190
+
191
+ ## 4. Implementation
192
+
193
+ ### 4.0 Citation Validation (`src/utils/citation_validator.py`)
194
+
195
+ > **🚨 CRITICAL: Why Citation Validation?**
196
+ >
197
+ > LLMs frequently **hallucinate** citations - inventing paper titles, authors, and URLs
198
+ > that don't exist. For a medical research tool, fake citations are **dangerous**.
199
+ >
200
+ > This validation layer ensures every reference in the report actually exists
201
+ > in the collected evidence.
202
+
203
+ ```python
204
+ """Citation validation to prevent LLM hallucination.
205
+
206
+ CRITICAL: Medical research requires accurate citations.
207
+ This module validates that all references exist in collected evidence.
208
+ """
209
+ import logging
210
+ from typing import TYPE_CHECKING
211
+
212
+ if TYPE_CHECKING:
213
+ from src.utils.models import Evidence, ResearchReport
214
+
215
+ logger = logging.getLogger(__name__)
216
+
217
+
218
+ def validate_references(
219
+ report: "ResearchReport",
220
+ evidence: list["Evidence"]
221
+ ) -> "ResearchReport":
222
+ """Ensure all references actually exist in collected evidence.
223
+
224
+ CRITICAL: Prevents LLM hallucination of citations.
225
+
226
+ Args:
227
+ report: The generated research report
228
+ evidence: All evidence collected during research
229
+
230
+ Returns:
231
+ Report with only valid references (hallucinated ones removed)
232
+ """
233
+ # Build set of valid URLs from evidence
234
+ valid_urls = {e.citation.url for e in evidence}
235
+ valid_titles = {e.citation.title.lower() for e in evidence}
236
+
237
+ validated_refs = []
238
+ removed_count = 0
239
+
240
+ for ref in report.references:
241
+ ref_url = ref.get("url", "")
242
+ ref_title = ref.get("title", "").lower()
243
+
244
+ # Check if URL matches collected evidence
245
+ if ref_url in valid_urls:
246
+ validated_refs.append(ref)
247
+ # Fallback: check title match (URLs might differ slightly)
248
+ elif ref_title and any(ref_title in t or t in ref_title for t in valid_titles):
249
+ validated_refs.append(ref)
250
+ else:
251
+ removed_count += 1
252
+ logger.warning(
253
+ f"Removed hallucinated reference: '{ref.get('title', 'Unknown')}' "
254
+ f"(URL: {ref_url[:50]}...)"
255
+ )
256
+
257
+ if removed_count > 0:
258
+ logger.info(
259
+ f"Citation validation removed {removed_count} hallucinated references. "
260
+ f"{len(validated_refs)} valid references remain."
261
+ )
262
+
263
+ # Update report with validated references
264
+ report.references = validated_refs
265
+ return report
266
+
267
+
268
+ def build_reference_from_evidence(evidence: "Evidence") -> dict:
269
+ """Build a properly formatted reference from evidence.
270
+
271
+ Use this to ensure references match the original evidence exactly.
272
+ """
273
+ return {
274
+ "title": evidence.citation.title,
275
+ "authors": evidence.citation.authors or ["Unknown"],
276
+ "source": evidence.citation.source,
277
+ "date": evidence.citation.date or "n.d.",
278
+ "url": evidence.citation.url,
279
+ }
280
+ ```
281
+
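+ A minimal sketch of the validator catching a fabricated entry (the `report` and `evidence` objects are assumed from the code above):
+
+ ```python
+ report.references = [
+     build_reference_from_evidence(evidence[0]),                   # real, kept
+     {"title": "Invented Paper", "url": "https://example.com/x"},  # hallucinated, removed
+ ]
+ report = validate_references(report, evidence)
+ assert len(report.references) == 1
+ ```
+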
282
+ ### 4.1 Report Prompts (`src/prompts/report.py`)
283
+
284
+ ```python
285
+ """Prompts for Report Agent."""
286
+ from src.utils.text_utils import truncate_at_sentence, select_diverse_evidence
287
+
288
+ SYSTEM_PROMPT = """You are a scientific writer specializing in drug repurposing research reports.
289
+
290
+ Your role is to synthesize evidence and hypotheses into a clear, structured report.
291
+
292
+ A good report:
293
+ 1. Has a clear EXECUTIVE SUMMARY (one paragraph, key takeaways)
294
+ 2. States the RESEARCH QUESTION clearly
295
+ 3. Describes METHODOLOGY (what was searched, how)
296
+ 4. Evaluates HYPOTHESES with evidence counts
297
+ 5. Separates MECHANISTIC and CLINICAL findings
298
+ 6. Lists specific DRUG CANDIDATES
299
+ 7. Acknowledges LIMITATIONS honestly
300
+ 8. Provides a balanced CONCLUSION
301
+ 9. Includes properly formatted REFERENCES
302
+
303
+ Write in scientific but accessible language. Be specific about evidence strength.
304
+
305
+ ─────────────────────────────────────────────────────────────────────────────
306
+ 🚨 CRITICAL CITATION REQUIREMENTS 🚨
307
+ ─────────────────────────────────────────────────────────────────────────────
308
+
309
+ You MUST follow these rules for the References section:
310
+
311
+ 1. You may ONLY cite papers that appear in the Evidence section above
312
+ 2. Every reference URL must EXACTLY match a provided evidence URL
313
+ 3. Do NOT invent, fabricate, or hallucinate any references
314
+ 4. Do NOT modify paper titles, authors, dates, or URLs
315
+ 5. If unsure about a citation, OMIT it rather than guess
316
+ 6. Copy URLs exactly as provided - do not create similar-looking URLs
317
+
318
+ VIOLATION OF THESE RULES PRODUCES DANGEROUS MISINFORMATION.
319
+ ─────────────────────────────────────────────────────────────────────────────"""
320
+
321
+
322
+ async def format_report_prompt(
323
+ query: str,
324
+ evidence: list,
325
+ hypotheses: list,
326
+ assessment: dict,
327
+ metadata: dict,
328
+ embeddings=None
329
+ ) -> str:
330
+ """Format prompt for report generation.
331
+
332
+ Includes full evidence details for accurate citation.
333
+ """
334
+ # Select diverse evidence (not arbitrary truncation)
335
+ selected = await select_diverse_evidence(
336
+ evidence, n=20, query=query, embeddings=embeddings
337
+ )
338
+
339
+ # Include FULL citation details for each evidence item
340
+ # This helps the LLM create accurate references
341
+ evidence_summary = "\n".join([
342
+ f"- **Title**: {e.citation.title}\n"
343
+ f" **URL**: {e.citation.url}\n"
344
+ f" **Authors**: {', '.join(e.citation.authors or ['Unknown'])}\n"
345
+ f" **Date**: {e.citation.date or 'n.d.'}\n"
346
+ f" **Source**: {e.citation.source}\n"
347
+ f" **Content**: {truncate_at_sentence(e.content, 200)}\n"
348
+ for e in selected
349
+ ])
350
+
351
+ hypotheses_summary = "\n".join([
352
+ f"- {h.drug} β†’ {h.target} β†’ {h.pathway} β†’ {h.effect} (Confidence: {h.confidence:.0%})"
353
+ for h in hypotheses
354
+ ]) if hypotheses else "No hypotheses generated yet."
355
+
356
+ return f"""Generate a structured research report for the following query.
357
+
358
+ ## Original Query
359
+ {query}
360
+
361
+ ## Evidence Collected ({len(selected)} papers, selected for diversity)
362
+
363
+ {evidence_summary}
364
+
365
+ ## Hypotheses Generated
366
+ {hypotheses_summary}
367
+
368
+ ## Assessment Scores
369
+ - Mechanism Score: {assessment.get('mechanism_score', 'N/A')}/10
370
+ - Clinical Evidence Score: {assessment.get('clinical_score', 'N/A')}/10
371
+ - Overall Confidence: {assessment.get('confidence', 0):.0%}
372
+
373
+ ## Metadata
374
+ - Sources Searched: {', '.join(metadata.get('sources', []))}
375
+ - Search Iterations: {metadata.get('iterations', 0)}
376
+
377
+ Generate a complete ResearchReport with all sections filled in.
378
+
379
+ REMINDER: Only cite papers from the Evidence section above. Copy URLs exactly."""
380
+ ```
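+ The two helpers imported at the top of this module (`truncate_at_sentence`, `select_diverse_evidence`) live in `src/utils/text_utils.py` but are not specified in this phase. A minimal sketch of plausible implementations follows; the async `embeddings.embed(texts)` interface and the MMR-style selection are assumptions, not part of this spec:
+
+ ```python
+ """Sketch of src/utils/text_utils.py helpers (assumed, not prescribed)."""
+ import math
+ import re
+
+
+ def truncate_at_sentence(text: str, max_chars: int) -> str:
+     """Clip text to max_chars, preferring the last full sentence boundary."""
+     if len(text) <= max_chars:
+         return text
+     clipped = text[:max_chars]
+     match = re.match(r"(?s).*[.!?]", clipped)  # greedy: ends at last ., !, or ?
+     return match.group(0) if match else clipped.rsplit(" ", 1)[0] + "..."
+
+
+ def _cosine(a: list[float], b: list[float]) -> float:
+     dot = sum(x * y for x, y in zip(a, b))
+     norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
+     return dot / norm if norm else 0.0
+
+
+ async def select_diverse_evidence(evidence, n, query, embeddings=None):
+     """Greedy max-marginal-relevance selection; falls back to first-n."""
+     if embeddings is None or len(evidence) <= n:
+         return evidence[:n]
+     vectors = await embeddings.embed([e.content for e in evidence])  # assumed API
+     query_vec = (await embeddings.embed([query]))[0]
+     chosen: list[int] = []
+     remaining = list(range(len(evidence)))
+     while remaining and len(chosen) < n:
+
+         def mmr(i: int) -> float:
+             relevance = _cosine(vectors[i], query_vec)
+             redundancy = max((_cosine(vectors[i], vectors[j]) for j in chosen), default=0.0)
+             return relevance - 0.5 * redundancy  # trade relevance against novelty
+
+         best = max(remaining, key=mmr)
+         chosen.append(best)
+         remaining.remove(best)
+     return [evidence[i] for i in chosen]
+ ```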
381
+
382
+ ### 4.2 Report Agent (`src/agents/report_agent.py`)
383
+
384
+ ```python
385
+ """Report agent for generating structured research reports."""
386
+ from collections.abc import AsyncIterable
387
+ from typing import TYPE_CHECKING, Any
388
+
389
+ from agent_framework import (
390
+ AgentRunResponse,
391
+ AgentRunResponseUpdate,
392
+ AgentThread,
393
+ BaseAgent,
394
+ ChatMessage,
395
+ Role,
396
+ )
397
+ from pydantic_ai import Agent
398
+
399
+ from src.prompts.report import SYSTEM_PROMPT, format_report_prompt
400
+ from src.utils.citation_validator import validate_references # CRITICAL
401
+ from src.utils.config import settings
402
+ from src.utils.models import Evidence, MechanismHypothesis, ResearchReport
403
+
404
+ if TYPE_CHECKING:
405
+ from src.services.embeddings import EmbeddingService
406
+
407
+
408
+ class ReportAgent(BaseAgent):
409
+ """Generates structured scientific reports from evidence and hypotheses."""
410
+
411
+ def __init__(
412
+ self,
413
+ evidence_store: dict[str, list[Evidence]],
414
+ embedding_service: "EmbeddingService | None" = None, # For diverse selection
415
+ ) -> None:
416
+ super().__init__(
417
+ name="ReportAgent",
418
+ description="Generates structured scientific research reports with citations",
419
+ )
420
+ self._evidence_store = evidence_store
421
+ self._embeddings = embedding_service
422
+ self._agent = Agent(
423
+ model=settings.llm_provider,
424
+ output_type=ResearchReport,
425
+ system_prompt=SYSTEM_PROMPT,
426
+ )
427
+
428
+ async def run(
429
+ self,
430
+ messages: str | ChatMessage | list[str] | list[ChatMessage] | None = None,
431
+ *,
432
+ thread: AgentThread | None = None,
433
+ **kwargs: Any,
434
+ ) -> AgentRunResponse:
435
+ """Generate research report."""
436
+ query = self._extract_query(messages)
437
+
438
+ # Gather all context
439
+ evidence = self._evidence_store.get("current", [])
440
+ hypotheses = self._evidence_store.get("hypotheses", [])
441
+ assessment = self._evidence_store.get("last_assessment", {})
442
+
443
+ if not evidence:
444
+ return AgentRunResponse(
445
+ messages=[ChatMessage(
446
+ role=Role.ASSISTANT,
447
+ text="Cannot generate report: No evidence collected."
448
+ )],
449
+ response_id="report-no-evidence",
450
+ )
451
+
452
+ # Build metadata
453
+ metadata = {
454
+ "sources": list(set(e.citation.source for e in evidence)),
455
+ "iterations": self._evidence_store.get("iteration_count", 0),
456
+ }
457
+
458
+ # Generate report (format_report_prompt is now async)
459
+ prompt = await format_report_prompt(
460
+ query=query,
461
+ evidence=evidence,
462
+ hypotheses=hypotheses,
463
+ assessment=assessment,
464
+ metadata=metadata,
465
+ embeddings=self._embeddings,
466
+ )
467
+
468
+ result = await self._agent.run(prompt)
469
+ report = result.output
470
+
471
+ # ═══════════════════════════════════════════════════════════════════
472
+ # 🚨 CRITICAL: Validate citations to prevent hallucination
473
+ # ═══════════════════════════════════════════════════════════════════
474
+ report = validate_references(report, evidence)
475
+
476
+ # Store validated report
477
+ self._evidence_store["final_report"] = report
478
+
479
+ # Return markdown version
480
+ return AgentRunResponse(
481
+ messages=[ChatMessage(role=Role.ASSISTANT, text=report.to_markdown())],
482
+ response_id="report-complete",
483
+ additional_properties={"report": report.model_dump()},
484
+ )
485
+
486
+ def _extract_query(self, messages) -> str:
487
+ """Extract query from messages."""
488
+ if isinstance(messages, str):
489
+ return messages
490
+ elif isinstance(messages, ChatMessage):
491
+ return messages.text or ""
492
+ elif isinstance(messages, list):
493
+ for msg in reversed(messages):
494
+ if isinstance(msg, ChatMessage) and msg.role == Role.USER:
495
+ return msg.text or ""
496
+ elif isinstance(msg, str):
497
+ return msg
498
+ return ""
499
+
500
+ async def run_stream(
501
+ self,
502
+ messages: str | ChatMessage | list[str] | list[ChatMessage] | None = None,
503
+ *,
504
+ thread: AgentThread | None = None,
505
+ **kwargs: Any,
506
+ ) -> AsyncIterable[AgentRunResponseUpdate]:
507
+ """Streaming wrapper."""
508
+ result = await self.run(messages, thread=thread, **kwargs)
509
+ yield AgentRunResponseUpdate(
510
+ messages=result.messages,
511
+ response_id=result.response_id
512
+ )
513
+ ```
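+ `validate_references` (imported above from `src.utils.citation_validator`) is not listed in this spec, but its contract is pinned down by the unit tests in Β§6.1: a reference survives only if its URL exactly matches a collected evidence URL. A minimal sketch consistent with those tests, assuming references are plain dicts as in the test fixtures:
+
+ ```python
+ """Sketch of src/utils/citation_validator.py (behavior inferred from the tests)."""
+ from src.utils.models import Evidence, ResearchReport
+
+
+ def validate_references(report: ResearchReport, evidence: list[Evidence]) -> ResearchReport:
+     """Drop every reference whose URL does not exactly match an evidence URL."""
+     allowed_urls = {e.citation.url for e in evidence}
+     kept = [ref for ref in report.references if ref.get("url") in allowed_urls]
+     return report.model_copy(update={"references": kept})
+ ```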
514
+
515
+ ### 4.3 Update MagenticOrchestrator
516
+
517
+ Add ReportAgent as the final synthesis step:
518
+
519
+ ```python
520
+ # In MagenticOrchestrator.__init__
521
+ self._report_agent = ReportAgent(self._evidence_store)
522
+
523
+ # In workflow building
524
+ workflow = (
525
+ MagenticBuilder()
526
+ .participants(
527
+ searcher=search_agent,
528
+ hypothesizer=hypothesis_agent,
529
+ judge=judge_agent,
530
+ reporter=self._report_agent, # NEW
531
+ )
532
+ .with_standard_manager(...)
533
+ .build()
534
+ )
535
+
536
+ # Update task instruction
537
+ task = f"""Research drug repurposing opportunities for: {query}
538
+
539
+ Workflow:
540
+ 1. SearchAgent: Find evidence from PubMed and web
541
+ 2. HypothesisAgent: Generate mechanistic hypotheses
542
+ 3. SearchAgent: Targeted search based on hypotheses
543
+ 4. JudgeAgent: Evaluate evidence sufficiency
544
+ 5. If sufficient β†’ ReportAgent: Generate structured research report
545
+ 6. If not sufficient β†’ Repeat from step 1 with refined queries
546
+
547
+ The final output should be a complete research report with:
548
+ - Executive summary
549
+ - Methodology
550
+ - Hypotheses tested
551
+ - Mechanistic and clinical findings
552
+ - Drug candidates
553
+ - Limitations
554
+ - Conclusion with references
555
+ """
556
+ ```
557
+
558
+ ---
559
+
560
+ ## 5. Directory Structure After Phase 8
561
+
562
+ ```
563
+ src/
564
+ β”œβ”€β”€ agents/
565
+ β”‚ β”œβ”€β”€ search_agent.py
566
+ β”‚ β”œβ”€β”€ judge_agent.py
567
+ β”‚ β”œβ”€β”€ hypothesis_agent.py
568
+ β”‚ └── report_agent.py # NEW
569
+ β”œβ”€β”€ prompts/
570
+ β”‚ β”œβ”€β”€ judge.py
571
+ β”‚ β”œβ”€β”€ hypothesis.py
572
+ β”‚ └── report.py # NEW
573
+ β”œβ”€β”€ services/
574
+ β”‚ └── embeddings.py
575
+ └── utils/
576
+ β”œβ”€β”€ citation_validator.py # Validates references against evidence (Β§4.2)
+ β”œβ”€β”€ text_utils.py # truncate_at_sentence, select_diverse_evidence
+ └── models.py # Updated with report models
577
+ ```
578
+
579
+ ---
580
+
581
+ ## 6. Tests
582
+
583
+ ### 6.1 Unit Tests (`tests/unit/agents/test_report_agent.py`)
584
+
585
+ ```python
586
+ """Unit tests for ReportAgent."""
587
+ import pytest
588
+ from unittest.mock import AsyncMock, MagicMock, patch
589
+
590
+ from src.agents.report_agent import ReportAgent
591
+ from src.utils.models import (
592
+ Citation, Evidence, MechanismHypothesis,
593
+ ResearchReport, ReportSection
594
+ )
595
+
596
+
597
+ @pytest.fixture
598
+ def sample_evidence():
599
+ return [
600
+ Evidence(
601
+ content="Metformin activates AMPK...",
602
+ citation=Citation(
603
+ source="pubmed",
604
+ title="Metformin mechanisms",
605
+ url="https://pubmed.ncbi.nlm.nih.gov/12345/",
606
+ date="2023",
607
+ authors=["Smith J", "Jones A"]
608
+ )
609
+ )
610
+ ]
611
+
612
+
613
+ @pytest.fixture
614
+ def sample_hypotheses():
615
+ return [
616
+ MechanismHypothesis(
617
+ drug="Metformin",
618
+ target="AMPK",
619
+ pathway="mTOR inhibition",
620
+ effect="Neuroprotection",
621
+ confidence=0.8,
622
+ search_suggestions=[]
623
+ )
624
+ ]
625
+
626
+
627
+ @pytest.fixture
628
+ def mock_report():
629
+ return ResearchReport(
630
+ title="Drug Repurposing Analysis: Metformin for Alzheimer's",
631
+ executive_summary="This report analyzes metformin as a potential...",
632
+ research_question="Can metformin be repurposed for Alzheimer's disease?",
633
+ methodology=ReportSection(
634
+ title="Methodology",
635
+ content="Searched PubMed and web sources..."
636
+ ),
637
+ hypotheses_tested=[
638
+ {"mechanism": "Metformin β†’ AMPK β†’ neuroprotection", "supported": 5, "contradicted": 1}
639
+ ],
640
+ mechanistic_findings=ReportSection(
641
+ title="Mechanistic Findings",
642
+ content="Evidence suggests AMPK activation..."
643
+ ),
644
+ clinical_findings=ReportSection(
645
+ title="Clinical Findings",
646
+ content="Limited clinical data available..."
647
+ ),
648
+ drug_candidates=["Metformin"],
649
+ limitations=["Abstract-level analysis only"],
650
+ conclusion="Metformin shows promise...",
651
+ references=[],
652
+ sources_searched=["pubmed", "web"],
653
+ total_papers_reviewed=10,
654
+ search_iterations=3,
655
+ confidence_score=0.75
656
+ )
657
+
658
+
659
+ @pytest.mark.asyncio
660
+ async def test_report_agent_generates_report(
661
+ sample_evidence, sample_hypotheses, mock_report
662
+ ):
663
+ """ReportAgent should generate structured report."""
664
+ store = {
665
+ "current": sample_evidence,
666
+ "hypotheses": sample_hypotheses,
667
+ "last_assessment": {"mechanism_score": 8, "clinical_score": 6}
668
+ }
669
+
670
+ with patch("src.agents.report_agent.Agent") as MockAgent:
671
+ mock_result = MagicMock()
672
+ mock_result.output = mock_report
673
+ MockAgent.return_value.run = AsyncMock(return_value=mock_result)
674
+
675
+ agent = ReportAgent(store)
676
+ response = await agent.run("metformin alzheimer")
677
+
678
+ assert "Executive Summary" in response.messages[0].text
679
+ assert "Methodology" in response.messages[0].text
680
+ assert "References" in response.messages[0].text
681
+
682
+
683
+ @pytest.mark.asyncio
684
+ async def test_report_agent_no_evidence():
685
+ """ReportAgent should handle empty evidence gracefully."""
686
+ store = {"current": [], "hypotheses": []}
687
+ agent = ReportAgent(store)
688
+
689
+ response = await agent.run("test query")
690
+
691
+ assert "Cannot generate report" in response.messages[0].text
692
+
693
+
694
+ # ═══════════════════════════════════════════════════════════════════════════
695
+ # 🚨 CRITICAL: Citation Validation Tests
696
+ # ═══════════════════════════════════════════════════════════════════════════
697
+
698
+ def test_report_agent_removes_hallucinated_citations(sample_evidence):
+ """validate_references should remove citations whose URLs are not in the evidence."""
701
+ from src.utils.citation_validator import validate_references
702
+
703
+ # Create report with mix of valid and hallucinated references
704
+ report_with_hallucinations = ResearchReport(
705
+ title="Test Report",
706
+ executive_summary="This is a test report for citation validation...",
707
+ research_question="Testing citation validation",
708
+ methodology=ReportSection(title="Methodology", content="Test"),
709
+ hypotheses_tested=[],
710
+ mechanistic_findings=ReportSection(title="Mechanistic", content="Test"),
711
+ clinical_findings=ReportSection(title="Clinical", content="Test"),
712
+ drug_candidates=["TestDrug"],
713
+ limitations=["Test limitation"],
714
+ conclusion="Test conclusion",
715
+ references=[
716
+ # Valid reference (matches sample_evidence)
717
+ {
718
+ "title": "Metformin mechanisms",
719
+ "url": "https://pubmed.ncbi.nlm.nih.gov/12345/",
720
+ "authors": ["Smith J", "Jones A"],
721
+ "date": "2023",
722
+ "source": "pubmed"
723
+ },
724
+ # HALLUCINATED reference (URL doesn't exist in evidence)
725
+ {
726
+ "title": "Fake Paper That Doesn't Exist",
727
+ "url": "https://fake-journal.com/made-up-paper",
728
+ "authors": ["Hallucinated A"],
729
+ "date": "2024",
730
+ "source": "fake"
731
+ },
732
+ # Another HALLUCINATED reference
733
+ {
734
+ "title": "Invented Research",
735
+ "url": "https://pubmed.ncbi.nlm.nih.gov/99999999/",
736
+ "authors": ["NotReal B"],
737
+ "date": "2025",
738
+ "source": "pubmed"
739
+ }
740
+ ],
741
+ sources_searched=["pubmed"],
742
+ total_papers_reviewed=1,
743
+ search_iterations=1,
744
+ confidence_score=0.5
745
+ )
746
+
747
+ # Validate - should remove hallucinated references
748
+ validated_report = validate_references(report_with_hallucinations, sample_evidence)
749
+
750
+ # Only the valid reference should remain
751
+ assert len(validated_report.references) == 1
752
+ assert validated_report.references[0]["title"] == "Metformin mechanisms"
753
+ assert "Fake Paper" not in str(validated_report.references)
754
+
755
+
756
+ def test_citation_validator_handles_empty_references():
757
+ """Citation validator should handle reports with no references."""
758
+ from src.utils.citation_validator import validate_references
759
+
760
+ report = ResearchReport(
761
+ title="Empty Refs Report",
762
+ executive_summary="This report has no references...",
763
+ research_question="Testing empty refs",
764
+ methodology=ReportSection(title="Methodology", content="Test"),
765
+ hypotheses_tested=[],
766
+ mechanistic_findings=ReportSection(title="Mechanistic", content="Test"),
767
+ clinical_findings=ReportSection(title="Clinical", content="Test"),
768
+ drug_candidates=[],
769
+ limitations=[],
770
+ conclusion="Test",
771
+ references=[], # Empty!
772
+ sources_searched=[],
773
+ total_papers_reviewed=0,
774
+ search_iterations=0,
775
+ confidence_score=0.0
776
+ )
777
+
778
+ validated = validate_references(report, [])
779
+ assert validated.references == []
780
+ ```
781
+
782
+ ---
783
+
784
+ ## 7. Definition of Done
785
+
786
+ Phase 8 is **COMPLETE** when:
787
+
788
+ 1. `ResearchReport` model implemented with all sections
789
+ 2. `ReportAgent` generates structured reports
790
+ 3. Reports include methodology and citations that survive `validate_references` (no fabricated URLs)
791
+ 4. Magentic workflow uses ReportAgent for final synthesis
792
+ 5. Report renders as clean markdown
793
+ 6. All unit tests pass
794
+
795
+ ---
796
+
797
+ ## 8. Value Delivered
798
+
799
+ | Before (Phase 7) | After (Phase 8) |
800
+ |------------------|-----------------|
801
+ | Basic synthesis | Structured scientific report |
802
+ | Simple bullet points | Executive summary + methodology |
803
+ | List of citations | Formatted references |
804
+ | No methodology | Clear research process |
805
+ | No limitations | Honest limitations section |
806
+
807
+ **Sample output comparison:**
808
+
809
+ Before:
810
+ ```
811
+ ## Analysis
812
+ - Metformin might help
813
+ - Found 5 papers
814
+ [Link 1] [Link 2]
815
+ ```
816
+
817
+ After:
818
+ ```
819
+ # Drug Repurposing Analysis: Metformin for Alzheimer's Disease
820
+
821
+ ## Executive Summary
822
+ Analysis of 15 papers suggests metformin may provide neuroprotection
823
+ through AMPK activation. Mechanistic evidence is strong (8/10),
824
+ while clinical evidence is moderate (6/10)...
825
+
826
+ ## Methodology
827
+ Systematic search of PubMed and web sources using queries...
828
+
829
+ ## Hypotheses Tested
830
+ - βœ… Metformin β†’ AMPK β†’ neuroprotection (7 supporting, 2 contradicting)
831
+
832
+ ## References
833
+ 1. Smith J, Jones A. *Metformin mechanisms*. Nature (2023). [Link](...)
834
+ ```
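+ The rendering above is produced by `ResearchReport.to_markdown()`, which this spec calls but never defines. A minimal sketch over the fields used throughout this document (the exact formatting and the `_bullets` helper are assumptions):
+
+ ```python
+ def _bullets(items) -> str:
+     return "\n".join(f"- {item}" for item in items) if items else "None."
+
+
+ def to_markdown(self) -> str:  # method on ResearchReport
+     refs = "\n".join(
+         f"{i}. {', '.join(r.get('authors', []))}. *{r.get('title')}* "
+         f"({r.get('date', 'n.d.')}). [Link]({r.get('url')})"
+         for i, r in enumerate(self.references, 1)
+     ) or "No verifiable references."
+     sections = [
+         f"# {self.title}",
+         f"## Executive Summary\n{self.executive_summary}",
+         f"## {self.methodology.title}\n{self.methodology.content}",
+         f"## Hypotheses Tested\n{_bullets(self.hypotheses_tested)}",
+         f"## {self.mechanistic_findings.title}\n{self.mechanistic_findings.content}",
+         f"## {self.clinical_findings.title}\n{self.clinical_findings.content}",
+         f"## Drug Candidates\n{_bullets(self.drug_candidates)}",
+         f"## Limitations\n{_bullets(self.limitations)}",
+         f"## Conclusion\n{self.conclusion}",
+         f"## References\n{refs}",
+     ]
+     return "\n\n".join(sections)
+ ```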
835
+
836
+ ---
837
+
838
+ ## 9. Complete Magentic Architecture (Phases 5-8)
839
+
840
+ ```
841
+ User Query
842
+ ↓
843
+ Gradio UI
844
+ ↓
845
+ Magentic Manager (LLM Coordinator)
846
+ β”œβ”€β”€ SearchAgent ←→ PubMed + Web + VectorDB
847
+ β”œβ”€β”€ HypothesisAgent ←→ Mechanistic Reasoning
848
+ β”œβ”€β”€ JudgeAgent ←→ Evidence Assessment
849
+ └── ReportAgent ←→ Final Synthesis
850
+ ↓
851
+ Structured Research Report
852
+ ```
853
+
854
+ **This matches Mario's diagram**, with practical agents that add real value to drug repurposing research.
docs/implementation/roadmap.md CHANGED
@@ -115,26 +115,96 @@ tests/
115
 
116
  ---
117
 
118
- ### **Phase 5: Magentic Integration (OPTIONAL - Post-MVP)**
119
 
120
  *Goal: Upgrade orchestrator to use Microsoft Agent Framework patterns.*
121
 
122
- - [ ] Wrap SearchHandler as `AgentProtocol` (SearchAgent) with strict protocol compliance.
123
- - [ ] Wrap JudgeHandler as `AgentProtocol` (JudgeAgent) with strict protocol compliance.
124
- - [ ] Implement `MagenticOrchestrator` using `MagenticBuilder`.
125
- - [ ] Create factory pattern for switching implementations.
126
  - **Deliverable**: Same API, better multi-agent orchestration engine.
127
 
128
- **NOTE**: Only implement Phase 5 if time permits after MVP is shipped.

129
 
130
  ---
131
 
132
  ## Spec Documents
133
 
134
- 1. **[Phase 1 Spec: Foundation](01_phase_foundation.md)**
135
- 2. **[Phase 2 Spec: Search Slice](02_phase_search.md)**
136
- 3. **[Phase 3 Spec: Judge Slice](03_phase_judge.md)**
137
- 4. **[Phase 4 Spec: UI & Loop](04_phase_ui.md)**
138
- 5. **[Phase 5 Spec: Magentic Integration](05_phase_magentic.md)** *(Optional)*

139
 
140
- *Start by reading Phase 1 Spec to initialize the repo.*
 
115
 
116
  ---
117
 
118
+ ### **Phase 5: Magentic Integration** βœ… COMPLETE
119
 
120
  *Goal: Upgrade orchestrator to use Microsoft Agent Framework patterns.*
121
 
122
+ - [x] Wrap SearchHandler as `AgentProtocol` (SearchAgent) with strict protocol compliance.
123
+ - [x] Wrap JudgeHandler as `AgentProtocol` (JudgeAgent) with strict protocol compliance.
124
+ - [x] Implement `MagenticOrchestrator` using `MagenticBuilder`.
125
+ - [x] Create factory pattern for switching implementations.
126
  - **Deliverable**: Same API, better multi-agent orchestration engine.
127
 
128
+ ---
129
+
130
+ ### **Phase 6: Embeddings & Semantic Search**
131
+
132
+ *Goal: Add vector search for semantic evidence retrieval.*
133
+
134
+ - [ ] Implement `EmbeddingService` with ChromaDB (see the sketch after this list).
135
+ - [ ] Add semantic deduplication to SearchAgent.
136
+ - [ ] Enable semantic search for related evidence.
137
+ - [ ] Store embeddings in shared context.
138
+ - **Deliverable**: Find semantically related papers, not just keyword matches.
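+
+ A minimal sketch of what the `EmbeddingService` item could look like with ChromaDB (the class shape, method names, and default embedding function are assumptions, not the final API):
+
+ ```python
+ """Sketch of a ChromaDB-backed EmbeddingService (interface assumed)."""
+ import chromadb
+ from chromadb.utils import embedding_functions
+
+
+ class EmbeddingService:
+     def __init__(self, collection_name: str = "evidence") -> None:
+         self._client = chromadb.Client()  # in-memory; use PersistentClient to persist
+         self._embed_fn = embedding_functions.DefaultEmbeddingFunction()
+         self._collection = self._client.get_or_create_collection(collection_name)
+
+     async def embed(self, texts: list[str]) -> list[list[float]]:
+         """Embed raw texts (the interface Phase 8's diverse selection assumes)."""
+         return self._embed_fn(texts)
+
+     def add(self, doc_id: str, content: str, metadata: dict | None = None) -> None:
+         """Store evidence for semantic retrieval and deduplication."""
+         self._collection.add(
+             ids=[doc_id],
+             documents=[content],
+             metadatas=[metadata] if metadata else None,
+         )
+
+     def search(self, query: str, n_results: int = 5) -> list[str]:
+         """Return the most semantically similar stored documents."""
+         result = self._collection.query(query_texts=[query], n_results=n_results)
+         return result["documents"][0]
+ ```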
139
+
140
+ ---
141
+
142
+ ### **Phase 7: Hypothesis Agent**
143
+
144
+ *Goal: Generate scientific hypotheses to guide targeted searches.*
145
+
146
+ - [ ] Implement `MechanismHypothesis` and `HypothesisAssessment` models (hypothesis model sketched after this list).
147
+ - [ ] Implement `HypothesisAgent` for mechanistic reasoning.
148
+ - [ ] Add hypothesis-driven search queries.
149
+ - [ ] Integrate into Magentic workflow.
150
+ - **Deliverable**: Drug β†’ Target β†’ Pathway β†’ Effect hypotheses that guide research.
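+
+ The hypothesis model's shape is already visible in the Phase 8 test fixtures; a sketch with that field set (the validation bounds are an assumption):
+
+ ```python
+ """Sketch of the Phase 7 hypothesis model (fields mirror Phase 8 fixtures)."""
+ from pydantic import BaseModel, Field
+
+
+ class MechanismHypothesis(BaseModel):
+     """Drug β†’ Target β†’ Pathway β†’ Effect chain with a confidence estimate."""
+
+     drug: str
+     target: str
+     pathway: str
+     effect: str
+     confidence: float = Field(ge=0.0, le=1.0)
+     search_suggestions: list[str] = Field(default_factory=list)
+ ```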
151
+
152
+ ---
153
+
154
+ ### **Phase 8: Report Agent**
155
+
156
+ *Goal: Generate structured scientific reports with proper citations.*
157
+
158
+ - [ ] Implement `ResearchReport` model with all sections.
159
+ - [ ] Implement `ReportAgent` for synthesis.
160
+ - [ ] Include methodology, limitations, formatted references.
161
+ - [ ] Integrate as final synthesis step in Magentic workflow.
162
+ - **Deliverable**: Publication-quality research reports.
163
+
164
+ ---
165
+
166
+ ## Complete Architecture (Phases 1-8)
167
+
168
+ ```
169
+ User Query
170
+ ↓
171
+ Gradio UI (Phase 4)
172
+ ↓
173
+ Magentic Manager (Phase 5)
174
+ β”œβ”€β”€ SearchAgent (Phase 2+5) ←→ PubMed + Web + VectorDB (Phase 6)
175
+ β”œβ”€β”€ HypothesisAgent (Phase 7) ←→ Mechanistic Reasoning
176
+ β”œβ”€β”€ JudgeAgent (Phase 3+5) ←→ Evidence Assessment
177
+ └── ReportAgent (Phase 8) ←→ Final Synthesis
178
+ ↓
179
+ Structured Research Report
180
+ ```
181
 
182
  ---
183
 
184
  ## Spec Documents
185
 
186
+ 1. **[Phase 1 Spec: Foundation](01_phase_foundation.md)** βœ…
187
+ 2. **[Phase 2 Spec: Search Slice](02_phase_search.md)** βœ…
188
+ 3. **[Phase 3 Spec: Judge Slice](03_phase_judge.md)** βœ…
189
+ 4. **[Phase 4 Spec: UI & Loop](04_phase_ui.md)** βœ…
190
+ 5. **[Phase 5 Spec: Magentic Integration](05_phase_magentic.md)** βœ…
191
+ 6. **[Phase 6 Spec: Embeddings & Semantic Search](06_phase_embeddings.md)**
192
+ 7. **[Phase 7 Spec: Hypothesis Agent](07_phase_hypothesis.md)**
193
+ 8. **[Phase 8 Spec: Report Agent](08_phase_report.md)**
194
+
195
+ ---
196
+
197
+ ## Progress Summary
198
+
199
+ | Phase | Status | Deliverable |
200
+ |-------|--------|-------------|
201
+ | Phase 1: Foundation | βœ… COMPLETE | CI-ready repo with uv/pytest |
202
+ | Phase 2: Search | βœ… COMPLETE | PubMed + Web search |
203
+ | Phase 3: Judge | βœ… COMPLETE | LLM evidence assessment |
204
+ | Phase 4: UI & Loop | βœ… COMPLETE | Working Gradio app |
205
+ | Phase 5: Magentic | βœ… COMPLETE | Multi-agent orchestration |
206
+ | Phase 6: Embeddings | πŸ“ SPEC READY | Semantic search |
207
+ | Phase 7: Hypothesis | πŸ“ SPEC READY | Mechanistic reasoning |
208
+ | Phase 8: Report | πŸ“ SPEC READY | Structured reports |
209
 
210
+ *Phases 1-5 completed in ONE DAY. Phases 6-8 specs ready for implementation.*