VibecoderMcSwaggins committed · Commit 19e1e93 · 1 Parent(s): dbf888c

docs: Add SPEC_16 Unified Chat Client Architecture


Proposes eliminating the Simple/Advanced parallel universe by implementing
a pluggable ChatClient factory:

- HuggingFaceChatClient (~200 lines) for free tier
- Single multi-agent codebase for all providers
- Delete ~1,100 lines (simple.py, handlers)
- Config toggle for provider switching

Full stack analysis:
- 5 core files need changes (agents, orchestrator)
- Embeddings unchanged (already has free/premium tiers)
- Simple Mode files deleted after migration

Unblocks Issue #105 (Deprecate Simple Mode)

docs/specs/SPEC_16_UNIFIED_CHAT_CLIENT_ARCHITECTURE.md ADDED
@@ -0,0 +1,427 @@
# SPEC_16: Unified Chat Client Architecture

**Status**: Proposed
**Priority**: P1 (Architectural Simplification)
**Issue**: Updates [#105](https://github.com/The-Obstacle-Is-The-Way/DeepBoner/issues/105)
**Created**: 2025-12-01

## Summary

Eliminate the Simple Mode / Advanced Mode parallel universe by implementing a pluggable `ChatClient` architecture. This allows the multi-agent framework to work with ANY LLM provider (OpenAI, HuggingFace, Anthropic, etc.) through a single, unified codebase.

## Problem Statement

### Current Architecture: Two Parallel Universes

```
User Query
    │
    ├── Has API Key? ──Yes──→ Advanced Mode (400 lines)
    │                             └── Microsoft Agent Framework
    │                                     └── OpenAIChatClient (hardcoded)
    │
    └── No API Key? ──────────→ Simple Mode (761 lines)
                                    └── While-loop orchestration
                                            └── Pydantic AI + HuggingFace
```

**Problems:**
1. **Double Maintenance**: 1,161 lines across two systems
2. **Feature Drift**: New features must be implemented twice
3. **Bug Duplication**: The same bugs appear in both systems
4. **Testing Burden**: Two test suites, two CI paths
5. **Cognitive Load**: Developers must understand both patterns

### Root Cause Analysis

Issue #105 stated: "Microsoft Agent Framework's OpenAIChatClient only speaks OpenAI API format."

**This is FALSE.** Upon investigation:

```python
# Microsoft Agent Framework provides:
from agent_framework import BaseChatClient, ChatClientProtocol

# Abstract methods to implement:
frozenset({'_inner_get_response', '_inner_get_streaming_response'})
```

The framework IS designed for pluggable clients. We just never implemented alternatives.

## Proposed Solution: ChatClientFactory

### Architecture After Implementation

```
User Query
    │
    └──→ Advanced Mode (unified)
             └── Microsoft Agent Framework
                     └── ChatClientFactory:
                             ├── OpenAIChatClient (API key present)
                             ├── AnthropicChatClient (Anthropic key)
                             └── HuggingFaceChatClient (free fallback)
```

### New Files

```
src/
├── clients/
│   ├── __init__.py
│   ├── base.py          # Re-export BaseChatClient
│   ├── factory.py       # ChatClientFactory
│   ├── huggingface.py   # HuggingFaceChatClient (~200 lines)
│   └── anthropic.py     # AnthropicChatClient (~200 lines) [future]
```

### ChatClientFactory Implementation

```python
# src/clients/factory.py
from agent_framework import BaseChatClient
from agent_framework.openai import OpenAIChatClient

from src.utils.config import settings


def get_chat_client(
    provider: str | None = None,
    api_key: str | None = None,
) -> BaseChatClient:
    """
    Factory for creating chat clients.

    Auto-detection priority:
    1. Explicit provider parameter
    2. OpenAI key (highest quality)
    3. Anthropic key
    4. HuggingFace (free fallback)

    Args:
        provider: Force specific provider ("openai", "anthropic", "huggingface")
        api_key: Override API key for the provider

    Returns:
        Configured BaseChatClient instance
    """
    if provider == "openai" or (provider is None and settings.has_openai_key):
        return OpenAIChatClient(
            model_id=settings.openai_model,
            api_key=api_key or settings.openai_api_key,
        )

    if provider == "anthropic" or (provider is None and settings.has_anthropic_key):
        from src.clients.anthropic import AnthropicChatClient

        return AnthropicChatClient(
            model_id=settings.anthropic_model,
            api_key=api_key or settings.anthropic_api_key,
        )

    # Free fallback
    from src.clients.huggingface import HuggingFaceChatClient

    return HuggingFaceChatClient(
        model_id="meta-llama/Llama-3.1-70B-Instruct",
    )
```

### HuggingFaceChatClient Implementation

```python
# src/clients/huggingface.py
from collections.abc import AsyncIterable
from typing import Any

from agent_framework import (
    BaseChatClient,
    ChatMessage,
    ChatResponse,
    ChatResponseUpdate,
    FunctionCallContent,
    TextContent,
)
from huggingface_hub import AsyncInferenceClient


class HuggingFaceChatClient(BaseChatClient):
    """
    HuggingFace Inference adapter for Microsoft Agent Framework.

    Enables multi-agent orchestration using free HuggingFace models
    like Llama 3.1 70B Instruct (supports function calling).
    """

    def __init__(
        self,
        model_id: str = "meta-llama/Llama-3.1-70B-Instruct",
        api_key: str | None = None,
    ):
        self._model_id = model_id
        # AsyncInferenceClient, not InferenceClient: chat_completion must be
        # awaitable for the async _inner_* methods below.
        self._client = AsyncInferenceClient(model=model_id, token=api_key)

    def service_url(self) -> str:
        return "https://api-inference.huggingface.co"

    async def _inner_get_response(
        self,
        messages: list[ChatMessage],
        **kwargs: Any,
    ) -> ChatResponse:
        """Convert and call HuggingFace, return ChatResponse."""
        # Convert ChatMessage[] to HuggingFace format
        hf_messages = self._convert_messages_to_hf(messages)

        # Handle tools/function calling if present
        tools = kwargs.get("tools")
        hf_tools = self._convert_tools_to_hf(tools) if tools else None

        # Call HuggingFace API
        response = await self._client.chat_completion(
            messages=hf_messages,
            tools=hf_tools,
            max_tokens=kwargs.get("max_tokens", 4096),
            temperature=kwargs.get("temperature", 0.7),
        )

        # Convert response back to ChatResponse
        return self._convert_response_from_hf(response)

    async def _inner_get_streaming_response(
        self,
        messages: list[ChatMessage],
        **kwargs: Any,
    ) -> AsyncIterable[ChatResponseUpdate]:
        """Streaming version of response generation."""
        hf_messages = self._convert_messages_to_hf(messages)

        # With stream=True the awaited call returns an async iterable of chunks
        async for chunk in await self._client.chat_completion(
            messages=hf_messages,
            stream=True,
            **kwargs,
        ):
            yield self._convert_chunk_from_hf(chunk)

    def _convert_messages_to_hf(self, messages: list[ChatMessage]) -> list[dict]:
        """Convert Agent Framework messages to HuggingFace format."""
        result = []
        for msg in messages:
            hf_msg = {"role": msg.role.value}

            # Extract text content
            if msg.text:
                hf_msg["content"] = str(msg.text)
            elif msg.contents:
                # Handle multi-part content
                hf_msg["content"] = " ".join(
                    str(c.text) for c in msg.contents
                    if hasattr(c, "text")
                )

            # Handle function calls
            if any(isinstance(c, FunctionCallContent) for c in (msg.contents or [])):
                hf_msg["tool_calls"] = [
                    self._convert_function_call(c)
                    for c in msg.contents
                    if isinstance(c, FunctionCallContent)
                ]

            result.append(hf_msg)
        return result

    def _convert_tools_to_hf(self, tools) -> list[dict] | None:
        """Convert Agent Framework tools to HuggingFace format."""
        if not tools:
            return None

        hf_tools = []
        for tool in tools:
            if hasattr(tool, "to_dict"):
                # ToolProtocol objects
                hf_tools.append({
                    "type": "function",
                    "function": tool.to_dict(),
                })
            elif callable(tool):
                # ai_function decorated functions
                hf_tools.append({
                    "type": "function",
                    "function": {
                        "name": tool.__name__,
                        "description": tool.__doc__ or "",
                        "parameters": getattr(tool, "__schema__", {}),
                    },
                })
        return hf_tools or None

    def _convert_response_from_hf(self, response) -> ChatResponse:
        """Convert HuggingFace response to ChatResponse."""
        choice = response.choices[0]
        message = choice.message

        contents = []

        # Text content
        if message.content:
            contents.append(TextContent(text=message.content))

        # Function/tool calls
        if message.tool_calls:
            for tc in message.tool_calls:
                contents.append(FunctionCallContent(
                    call_id=tc.id,
                    name=tc.function.name,
                    arguments=tc.function.arguments,
                ))

        # Wrap the converted contents in an assistant message so tool calls
        # survive the round trip (returning only the text would drop them)
        return ChatResponse(
            messages=[ChatMessage(role="assistant", contents=contents)],
            model_id=self._model_id,
            finish_reason={"type": choice.finish_reason},
        )
```

### Changes to Advanced Orchestrator

```python
# src/orchestrators/advanced.py

# BEFORE (hardcoded):
from agent_framework.openai import OpenAIChatClient

class AdvancedOrchestrator:
    def __init__(self, ...):
        self._chat_client = OpenAIChatClient(...)

# AFTER (factory):
from src.clients.factory import get_chat_client

class AdvancedOrchestrator:
    def __init__(self, chat_client=None, provider=None, api_key=None, ...):
        self._chat_client = chat_client or get_chat_client(
            provider=provider,
            api_key=api_key,
        )
```

## Files to Delete After Implementation

| File | Lines | Reason |
|------|-------|--------|
| `src/orchestrators/simple.py` | 761 | Replaced by unified Advanced Mode |
| `src/tools/search_handler.py` | ~150 | Manager agent handles orchestration |
| `src/agent_factory/judges.py` (JudgeHandler) | ~200 | JudgeAgent replaces this |

**Total deletion: ~1,100 lines**
**Total addition: ~400 lines (new clients)**
**Net: -700 lines, single architecture**

## Migration Plan

### Phase 1: Implement HuggingFaceChatClient
- [ ] Create `src/clients/` package
- [ ] Implement `HuggingFaceChatClient` with function calling
- [ ] Write unit tests for message/tool conversion
- [ ] Test with simple queries (no multi-agent)

### Phase 2: Integrate into Advanced Mode
- [ ] Create `ChatClientFactory`
- [ ] Update `AdvancedOrchestrator` to use factory
- [ ] Update `magentic_agents.py` to accept any `BaseChatClient`
- [ ] Test full multi-agent flow with HuggingFace

### Phase 3: Deprecate Simple Mode
- [ ] Add deprecation warning to Simple Mode
- [ ] Update `factory.py` to only return AdvancedOrchestrator
- [ ] Update UI to remove mode selection (auto-detect only)
- [ ] Run full regression tests
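The first Phase 3 item could be sketched as below. The class name and message are illustrative stand-ins for whatever orchestrator lives in `simple.py`; `stacklevel=2` points the warning at the caller's code rather than this constructor.

```python
# Hypothetical sketch of the Phase 3 deprecation warning; the class name
# and wording are illustrative, not the final implementation.
import warnings


class SimpleOrchestrator:
    def __init__(self) -> None:
        warnings.warn(
            "Simple Mode is deprecated and will be removed; "
            "Advanced Mode now covers the free tier via HuggingFaceChatClient.",
            DeprecationWarning,
            stacklevel=2,
        )
```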

### Phase 4: Remove Simple Mode
- [ ] Delete `simple.py`
- [ ] Delete `search_handler.py`
- [ ] Remove JudgeHandler classes
- [ ] Archive to `docs/archive/` for reference
- [ ] Update all tests

## Risks and Mitigations

### Risk 1: HuggingFace Rate Limits
**Problem**: The free tier may throttle multi-agent flows (5-10 LLM calls per query)
**Mitigation**:
- Add exponential backoff with retries
- Cache manager decisions where possible
- Consider paid HF Pro ($9/month) for demo
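A minimal sketch of the backoff mitigation, assuming a generic awaitable call; `RuntimeError` is a stand-in for whatever rate-limit exception the HF client actually raises (e.g. an HTTP 429 error type), and the delay constants are illustrative.

```python
# Exponential backoff with jitter for rate-limited async calls.
# RuntimeError stands in for the client's real rate-limit exception.
import asyncio
import random


async def with_backoff(call, max_retries: int = 5, base_delay: float = 1.0):
    """Retry `call` with exponential backoff and jitter on rate-limit errors."""
    for attempt in range(max_retries):
        try:
            return await call()
        except RuntimeError:
            if attempt == max_retries - 1:
                raise  # out of retries, surface the error
            # Delays grow 1s, 2s, 4s, ...; jitter avoids synchronized retries
            await asyncio.sleep(base_delay * 2**attempt + random.uniform(0, 0.5))
```

An orchestrator would wrap each LLM call, e.g. `await with_backoff(lambda: client.chat_completion(...))`.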

### Risk 2: Function Calling Quality
**Problem**: Llama 3.1 70B function calling may be less reliable than GPT-5
**Mitigation**:
- Add validation/retry on malformed tool calls
- Fall back to text parsing if JSON fails
- Test extensively before removing Simple Mode

### Risk 3: Response Format Differences
**Problem**: HuggingFace responses may have subtle format differences
**Mitigation**:
- Comprehensive conversion functions
- Unit tests covering edge cases
- Integration tests with real API

## Success Criteria

1. **Single Codebase**: No more Simple/Advanced split
2. **Zero API Key Demo**: HuggingFace Spaces works without a user API key
3. **Quality Parity**: Free tier produces comparable research reports
4. **Maintainability**: One test suite, one bug tracker, one feature path

## Full Stack Analysis

### Files Requiring Changes (Category 1: Core)

| File | Refs | Change |
|------|------|--------|
| `src/orchestrators/advanced.py` | 8 | `OpenAIChatClient` → `get_chat_client()` |
| `src/agents/magentic_agents.py` | 12 | Type: `OpenAIChatClient` → `BaseChatClient` |
| `src/agents/retrieval_agent.py` | 4 | Same pattern |
| `src/agents/code_executor_agent.py` | 4 | Same pattern |
| `src/utils/llm_factory.py` | 8 | Merge into `clients/factory.py` |

### Files to Delete (Category 2: Simple Mode)

| File | Lines | Reason |
|------|-------|--------|
| `src/orchestrators/simple.py` | 761 | Replaced by unified system |
| `src/agent_factory/judges.py` (handlers) | ~200 | JudgeAgent replaces |
| `src/tools/search_handler.py` | ~150 | Manager agent replaces |

### Files Unchanged (Category 3: Embeddings)

Embedding services are a **separate concern**:
- `src/services/llamaindex_rag.py` - Premium tier (OpenAI embeddings)
- `src/services/embeddings.py` - Free tier (local sentence-transformers)

Both work today. No changes needed.

### Config Toggle (Future Enhancement)

After implementation, providers can be toggled via config:

```bash
# .env
CHAT_PROVIDER=huggingface  # "openai", "anthropic", "huggingface", "auto"
```

Or at runtime:

```python
orchestrator = AdvancedOrchestrator(provider="huggingface")
orchestrator = AdvancedOrchestrator(provider="openai", api_key="sk-...")
```

This enables:
1. **A/B testing** different providers
2. **Cost optimization** (switch to cheaper provider)
3. **Graceful degradation** (fallback chain)
4. **Kill switch** (disable specific provider)
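The graceful-degradation point could be realized with a wrapper like the following sketch; the function name, the factory-list shape, and the broad exception handling are all assumptions, not the final design.

```python
# Illustrative fallback chain: try provider factories in priority order
# and return the first client that constructs successfully.
def get_chat_client_with_fallback(factories: list):
    """Return the first working client from an ordered list of factories."""
    errors = []
    for make_client in factories:
        try:
            return make_client()
        except Exception as exc:  # e.g. missing key, provider outage
            errors.append(exc)
    raise RuntimeError(f"All providers failed: {errors}")
```

For example, `[openai_factory, anthropic_factory, huggingface_factory]` would degrade from paid tiers down to the free fallback.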

## References

- Microsoft Agent Framework: `agent_framework.BaseChatClient`
- HuggingFace Inference: `huggingface_hub.InferenceClient`
- Llama 3.1 Function Calling: [HuggingFace Docs](https://huggingface.co/docs/transformers/main/chat_templating#tool-use--function-calling)
- Issue #105: Deprecate Simple Mode