VibecoderMcSwaggins committed
Commit 7e1184a · 1 Parent(s): 65d200f

docs: P1 bug for uninterpretable chain-of-thought events (#106)

Documented issue where Advanced Mode exposes raw internal framework
events from agent-framework-core to users:
- Manager (user_task), (task_ledger), (instruction) are internal
- Hard truncation at 200 chars makes messages uninterpretable
- All mapped to "judging" type incorrectly

Root cause: _process_event() in advanced.py doesn't filter or
transform MagenticOrchestratorMessageEvent events.

Fixes: Filter internal events or transform to user-friendly messages.

docs/bugs/ACTIVE_BUGS.md CHANGED
@@ -1,6 +1,6 @@
  # Active Bugs
 
- > Last updated: 2025-12-01 (01:00 PST)
+ > Last updated: 2025-12-01 (02:50 PST)
  >
  > **Note:** Completed bug docs archived to `docs/bugs/archive/`
  > **See also:** [Code Quality Audit Findings (2025-11-30)](AUDIT_FINDINGS_2025_11_30.md)
@@ -13,6 +13,29 @@ _No active P0 bugs._
 
  ## P1 - Important
 
+ ### P1 - Advanced Mode Exposes Uninterpretable Chain-of-Thought
+ **Issue:** [#106](https://github.com/The-Obstacle-Is-The-Way/DeepBoner/issues/106)
+ **File:** [P1_ADVANCED_MODE_UNINTERPRETABLE_CHAIN_OF_THOUGHT.md](P1_ADVANCED_MODE_UNINTERPRETABLE_CHAIN_OF_THOUGHT.md)
+ **Found:** 2025-12-01 (Manual Testing)
+
+ **Problem:** Advanced orchestrator exposes raw internal framework events to users:
+ - `Manager (user_task): Research sexual health and wellness interventions for...`
+ - `Manager (task_ledger): We are working to address...`
+ - `Manager (instruction): Conduct targeted searches on PubMed...`
+
+ These are framework-internal bookkeeping truncated at 200 chars, making them uninterpretable.
+
+ **Root Cause:** `_process_event()` in `advanced.py` doesn't filter or transform `MagenticOrchestratorMessageEvent` events from `agent-framework-core`.
+
+ **Solution Options:**
+ 1. Filter internal events (`user_task`, `task_ledger`, `instruction`)
+ 2. Transform to user-friendly messages ("Manager assigning search task...")
+ 3. Add verbose mode for debugging
+
+ **Status:** Open
+
+ ---
+
  ### P1 - Memory Layer Not Integrated (Post-Hackathon)
  **Issue:** [#73](https://github.com/The-Obstacle-Is-The-Way/DeepBoner/issues/73)
  **Spec:** [SPEC_08_INTEGRATE_MEMORY_LAYER.md](../specs/SPEC_08_INTEGRATE_MEMORY_LAYER.md)
docs/bugs/P1_ADVANCED_MODE_UNINTERPRETABLE_CHAIN_OF_THOUGHT.md ADDED
@@ -0,0 +1,172 @@
+ # P1: Advanced Mode Exposes Uninterpretable Chain-of-Thought Events
+
+ **Priority**: P1 (UX Degradation)
+ **Component**: `src/orchestrators/advanced.py`
+ **Status**: Open
+ **Issue**: [#106](https://github.com/The-Obstacle-Is-The-Way/DeepBoner/issues/106)
+ **Created**: 2025-12-01
+
+ ## Summary
+
+ The Advanced orchestrator exposes raw internal framework events from `agent-framework-core` directly to users. These events carry internal manager bookkeeping (task assignments, ledgers, instructions) and are:
+
+ 1. Truncated mid-sentence at 200 characters
+ 2. Written in internal framework terminology (`user_task`, `task_ledger`, `instruction`)
+ 3. Labeled with a misleading "JUDGING" event type
+ 4. Not meaningful to end users
+
+ ## Example of Bad Output
+
+ ```
+ 🧠 **JUDGING**: Manager (user_task): Research sexual health and wellness interventions for: sildenafil mechanism ##...
+
+ 🧠 **JUDGING**: Manager (task_ledger): We are working to address the following user request: Research sexual healt...
+
+ 🧠 **JUDGING**: Manager (instruction): Conduct targeted searches on PubMed, ClinicalTrials.gov, and Europe PMC to ga...
+ ```
+
+ Users see:
+ - Raw internal prompts being passed between manager and agents
+ - Truncated text that cuts off mid-word ("healt...", "ga...")
+ - Technical jargon ("task_ledger") with no context
+ - All events labeled as "JUDGING" even when they're task assignments
+
+ ## Root Cause Analysis
+
+ ### The Chain of Issues
+
+ | Location | Issue |
+ |----------|-------|
+ | `src/orchestrators/advanced.py:363-370` | `MagenticOrchestratorMessageEvent` raw events exposed without filtering |
+ | `src/orchestrators/advanced.py:368` | `event.kind` values (`user_task`, `task_ledger`, `instruction`) are internal framework concepts |
+ | `src/orchestrators/advanced.py:368` | Hard truncation: `text[:200]...` breaks mid-sentence |
+ | `src/orchestrators/advanced.py:367` | All manager events mapped to `type="judging"` regardless of actual purpose |
+ | `src/orchestrators/advanced.py:380` | Agent messages also truncated at 200 chars |
+ | `src/utils/models.py:136` | `"judging": "🧠"` icon shown for all these internal events |
+ | `src/app.py:248` | Events displayed verbatim via `event.to_markdown()` |
+
+ ### Code Path
+
+ ```
+ agent-framework-core (Microsoft)
+ ↓
+ MagenticOrchestratorMessageEvent(kind="task_ledger", message="...")
+ ↓
+ advanced.py:_process_event() - NO FILTERING
+ ↓
+ AgentEvent(type="judging", message=f"Manager ({event.kind}): {text[:200]}...")
+ ↓
+ models.py:to_markdown() → "🧠 **JUDGING**: Manager (task_ledger): ..."
+ ↓
+ app.py → Displayed to user verbatim
+ ```
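
The formatting step in the code path above can be reproduced in isolation. The sketch below is a minimal illustration, not the project's code: `StubManagerEvent` and `format_manager_event` are hypothetical names standing in for the framework event and for the f-string shown in the diagram, and the 200-character limit is taken from this report.

```python
from dataclasses import dataclass


@dataclass
class StubManagerEvent:
    """Hypothetical stand-in for MagenticOrchestratorMessageEvent (not the real class)."""

    kind: str
    text: str


def format_manager_event(event: StubManagerEvent, limit: int = 200) -> str:
    """Mimic the reported formatting: hard character cut plus an unconditional ellipsis."""
    return f"Manager ({event.kind}): {event.text[:limit]}..."


# A long ledger-style message is cut wherever character 200 happens to fall.
long_event = StubManagerEvent(
    kind="task_ledger",
    text="We are working to address the following user request. " * 10,
)
long_message = format_manager_event(long_event)
print(long_message)

# A short message is not cut, yet it still receives the trailing ellipsis.
short_message = format_manager_event(StubManagerEvent("instruction", "Search PubMed"))
print(short_message)
```

Under these assumptions, the ellipsis is appended even when nothing was cut, so a reader cannot tell truncated messages from complete ones.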
+
+ ## Impact
+
+ 1. **User Confusion**: Users see internal framework bookkeeping, not meaningful progress
+ 2. **Truncated Gibberish**: 200-char limit cuts prompts mid-sentence, making them uninterpretable
+ 3. **Misleading Labels**: "JUDGING" event type is wrong - these are task assignments
+ 4. **No Actionable Info**: Users can't understand what the system is actually doing
+
+ ## Proposed Solutions
+
+ ### Option A: Filter Internal Events (Minimal)
+
+ Skip internal manager events entirely - they're framework bookkeeping:
+
+ ```python
+ # Excerpt: method on the Advanced orchestrator class
+ def _process_event(self, event: Any, iteration: int) -> AgentEvent | None:
+     if isinstance(event, MagenticOrchestratorMessageEvent):
+         # Skip internal framework bookkeeping events
+         if event.kind in ("user_task", "task_ledger", "instruction"):
+             return None  # Don't expose to users
+     # ... rest of handling
+ ```
+
+ **Pros**: Simple, removes noise
+ **Cons**: Users lose visibility into manager activity
+
+ ### Option B: Transform to User-Friendly Messages (Better UX)
+
+ Map internal events to meaningful user messages:
+
+ ```python
+ MANAGER_EVENT_MESSAGES = {
+     "user_task": "Manager received research task",
+     "task_ledger": "Manager tracking task progress",
+     "instruction": "Manager assigning work to agent",
+ }
+
+ # Excerpt: method on the Advanced orchestrator class
+ def _process_event(self, event: Any, iteration: int) -> AgentEvent | None:
+     if isinstance(event, MagenticOrchestratorMessageEvent):
+         if event.kind in MANAGER_EVENT_MESSAGES:
+             return AgentEvent(
+                 type="progress",  # Not "judging"!
+                 message=MANAGER_EVENT_MESSAGES[event.kind],
+                 iteration=iteration,
+             )
+ ```
+
+ **Pros**: Users see meaningful progress, correct event types
+ **Cons**: More code, loses raw detail for debugging
+
+ ### Option C: Smart Truncation + Verbose Mode
+
+ 1. Truncate at sentence boundaries, not hard character limit
+ 2. Add `verbose_mode` setting that shows full internal events for debugging
+ 3. Use appropriate event types based on `event.kind`
+
+ ```python
+ def _smart_truncate(self, text: str, max_len: int = 200) -> str:
+     """Truncate at sentence boundary."""
+     if len(text) <= max_len:
+         return text
+     # Find last sentence boundary before limit
+     truncated = text[:max_len]
+     last_period = truncated.rfind(". ")
+     if last_period > max_len // 2:
+         return truncated[:last_period + 1]
+     # No usable sentence boundary: fall back to the last word boundary
+     return truncated.rsplit(" ", 1)[0] + "..."
+ ```
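
As a rough sanity check of the helper above, the following standalone copy exercises both branches with short, made-up inputs and small `max_len` values. Writing it as a free function rather than a method is an assumption made only so the snippet runs on its own.

```python
def smart_truncate(text: str, max_len: int = 200) -> str:
    """Standalone copy of the proposed helper, for experimentation only."""
    if len(text) <= max_len:
        return text
    truncated = text[:max_len]
    # Prefer the last sentence boundary if it falls in the second half of the window
    last_period = truncated.rfind(". ")
    if last_period > max_len // 2:
        return truncated[: last_period + 1]
    # Otherwise drop the trailing partial word and mark the cut
    return truncated.rsplit(" ", 1)[0] + "..."


sentence_case = "Search PubMed for trials. Then summarize the findings for the user."
word_case = "Conduct targeted searches on PubMed and ClinicalTrials.gov"

sentence_result = smart_truncate(sentence_case, max_len=40)
word_result = smart_truncate(word_case, max_len=30)
print(sentence_result)  # ends at the sentence boundary, no ellipsis
print(word_result)      # ends at a word boundary, with ellipsis
```

Note that the word-boundary branch appends `...` after cutting, so its result can be up to three characters longer than `max_len`. Whether that matters depends on how strictly the UI enforces the limit.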
+
+ ### Recommended Approach
+
+ **Combine Option A + B**:
+
+ 1. **Default**: Filter out `task_ledger` and `instruction` events (pure bookkeeping)
+ 2. **Transform**: `user_task` → "Assigning research task to agents"
+ 3. **Proper Types**: Use `"progress"` not `"judging"` for manager events
+ 4. **Future**: Add verbose mode for debugging
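
The combined approach listed above could look roughly like the sketch below. It is an illustration under stated assumptions, not the project's implementation: `StubManagerEvent`, `StubAgentEvent`, `FILTERED_KINDS`, `FRIENDLY_MESSAGES`, and `process_manager_event` are hypothetical names, the stubs stand in for the real framework event and `AgentEvent` model, and passing unknown kinds through as `"progress"` is a choice made here because the report does not cover that case.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StubManagerEvent:
    """Hypothetical stand-in for MagenticOrchestratorMessageEvent."""

    kind: str
    text: str


@dataclass
class StubAgentEvent:
    """Hypothetical stand-in for the project's AgentEvent model."""

    type: str
    message: str
    iteration: int


# Kinds treated as pure bookkeeping and dropped (step 1 of the recommendation)
FILTERED_KINDS = frozenset({"task_ledger", "instruction"})
# Kinds rewritten into a user-facing message (step 2)
FRIENDLY_MESSAGES = {"user_task": "Assigning research task to agents"}


def process_manager_event(
    event: StubManagerEvent, iteration: int
) -> Optional[StubAgentEvent]:
    """Filter bookkeeping kinds, translate known kinds, pass others through."""
    if event.kind in FILTERED_KINDS:
        return None
    friendly = FRIENDLY_MESSAGES.get(event.kind)
    if friendly is not None:
        # Step 3: manager events are reported as "progress", not "judging"
        return StubAgentEvent(type="progress", message=friendly, iteration=iteration)
    # Assumption: unknown kinds are surfaced unchanged rather than hidden
    return StubAgentEvent(
        type="progress",
        message=f"Manager ({event.kind}): {event.text}",
        iteration=iteration,
    )


for kind in ("user_task", "task_ledger", "instruction"):
    print(kind, "->", process_manager_event(StubManagerEvent(kind, "raw text"), 1))
```

Keeping the filtered kinds and the friendly messages in two separate module-level tables means a later verbose mode could bypass the filter without touching the translation table.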
+
+ ## Files to Modify
+
+ 1. `src/orchestrators/advanced.py:361-410` - `_process_event()` method
+ 2. `src/utils/models.py:107-123` - Add new event types if needed
+ 3. `tests/unit/orchestrators/test_advanced_timeout.py` - Update assertions
+
+ ## Related Issues
+
+ - P0: Advanced Mode Timeout No Synthesis (FIXED in PR #104)
+ - This P1 was discovered while testing the P0 fix
+
+ ## Testing the Bug
+
+ ```python
+ import asyncio
+ from src.orchestrators.advanced import AdvancedOrchestrator
+
+ async def test():
+     orch = AdvancedOrchestrator(max_rounds=3)
+     async for event in orch.run("sildenafil mechanism"):
+         # Print only the manager events that leak framework internals
+         if "Manager" in event.message:
+             print(f"[{event.type}] {event.message}")
+             # You'll see uninterpretable output
+
+ asyncio.run(test())
+ ```
+
+ ## References
+
+ - Microsoft Agent Framework: https://github.com/microsoft/agent-framework
+ - AgentEvent model: `src/utils/models.py:104`
+ - Advanced orchestrator: `src/orchestrators/advanced.py`