Spaces:

DataQuests
/

DeepCritical

Running

App Files Files Community

DeepCritical / docs /bugs /FIX_PLAN_MAGENTIC_MODE.md

VibecoderMcSwaggins

docs: add TDD fix plan for Magentic mode report generation

17d34a8 16 days ago

preview code

raw

history blame

5.96 kB

Fix Plan: Magentic Mode Report Generation

Related Bug: P0_MAGENTIC_MODE_BROKEN.md Approach: Test-Driven Development (TDD) Estimated Scope: 4 tasks, ~2-3 hours

Problem Summary

Magentic mode runs but fails to produce readable reports due to:

Primary Bug: MagenticFinalResultEvent.message returns ChatMessage object, not text
Secondary Bug: Max rounds (3) reached before ReportAgent completes
Tertiary Issues: Stale "bioRxiv" references in prompts

Fix Order (TDD)

Phase 1: Write Failing Tests

Task 1.1: Create test for ChatMessage text extraction

# tests/unit/test_orchestrator_magentic.py

def test_process_event_extracts_text_from_chat_message():
    """Final result event should extract text from ChatMessage object."""
    # Arrange: Mock ChatMessage with .content attribute
    # Act: Call _process_event with MagenticFinalResultEvent
    # Assert: Returned AgentEvent.message is a string, not object repr

Task 1.2: Create test for max rounds configuration

def test_orchestrator_uses_configured_max_rounds():
    """MagenticOrchestrator should use max_rounds from constructor."""
    # Arrange: Create orchestrator with max_rounds=10
    # Act: Build workflow
    # Assert: Workflow has max_round_count=10

Task 1.3: Create test for bioRxiv reference removal

def test_task_prompt_references_europe_pmc():
    """Task prompt should reference Europe PMC, not bioRxiv."""
    # Arrange: Create orchestrator
    # Act: Check task string in run()
    # Assert: Contains "Europe PMC", not "bioRxiv"

Phase 2: Fix ChatMessage Text Extraction

File: src/orchestrator_magentic.py Lines: 192-199

Current Code:

elif isinstance(event, MagenticFinalResultEvent):
    text = event.message.text if event.message else "No result"

Fixed Code:

elif isinstance(event, MagenticFinalResultEvent):
    if event.message:
        # ChatMessage may have .content or .text depending on version
        if hasattr(event.message, 'content') and event.message.content:
            text = str(event.message.content)
        elif hasattr(event.message, 'text') and event.message.text:
            text = str(event.message.text)
        else:
            # Fallback: convert entire message to string
            text = str(event.message)
    else:
        text = "No result generated"

Why: The agent_framework.ChatMessage object structure may vary. We need defensive extraction.

Phase 3: Fix Max Rounds Configuration

File: src/orchestrator_magentic.py Lines: 97-99

Current Code:

.with_standard_manager(
    chat_client=manager_client,
    max_round_count=self._max_rounds,  # Already uses config
    max_stall_count=3,
    max_reset_count=2,
)

Issue: Default max_rounds in __init__ is 10, but workflow may need more for complex queries.

Fix: Verify the value flows through correctly. Add logging.

logger.info(
    "Building Magentic workflow",
    max_rounds=self._max_rounds,
    max_stall=3,
    max_reset=2,
)

Also check: src/orchestrator_factory.py passes config correctly:

return MagenticOrchestrator(
    max_rounds=config.max_iterations if config else 10,
)

Phase 4: Fix Stale bioRxiv References

Files to update:

File	Line	Change
`src/orchestrator_magentic.py`	131	"bioRxiv" → "Europe PMC"
`src/agents/magentic_agents.py`	32-33	"bioRxiv" → "Europe PMC"
`src/app.py`	202-203	"bioRxiv" → "Europe PMC"

Search command to verify:

grep -rn "bioRxiv\|biorxiv" src/

Implementation Checklist

[ ] Phase 1: Write failing tests
    [ ] 1.1 Test ChatMessage text extraction
    [ ] 1.2 Test max rounds configuration
    [ ] 1.3 Test Europe PMC references

[ ] Phase 2: Fix ChatMessage extraction
    [ ] Update _process_event() in orchestrator_magentic.py
    [ ] Run test 1.1 - should pass

[ ] Phase 3: Fix max rounds
    [ ] Add logging to _build_workflow()
    [ ] Verify factory passes config correctly
    [ ] Run test 1.2 - should pass

[ ] Phase 4: Fix bioRxiv references
    [ ] Update orchestrator_magentic.py task prompt
    [ ] Update magentic_agents.py descriptions
    [ ] Update app.py UI text
    [ ] Run test 1.3 - should pass
    [ ] Run grep to verify no remaining refs

[ ] Final Verification
    [ ] make check passes
    [ ] All tests pass (108+)
    [ ] Manual test: run_magentic.py produces readable report

Test Commands

# Run specific test file
uv run pytest tests/unit/test_orchestrator_magentic.py -v

# Run all tests
uv run pytest tests/unit/ -v

# Full check
make check

# Manual integration test
set -a && source .env && set +a
uv run python examples/orchestrator_demo/run_magentic.py "metformin alzheimer"

Success Criteria

run_magentic.py outputs a readable research report (not <ChatMessage object>)
Report includes: Executive Summary, Key Findings, Drug Candidates, References
No "Max round count reached" error with default settings
No "bioRxiv" references anywhere in codebase
All 108+ tests pass
make check passes

Files Modified

src/
├── orchestrator_magentic.py   # ChatMessage fix, logging
├── agents/magentic_agents.py  # bioRxiv → Europe PMC
└── app.py                     # bioRxiv → Europe PMC

tests/unit/
└── test_orchestrator_magentic.py  # NEW: 3 tests

Notes for AI Agent

When implementing this fix plan:

DO NOT create mock data or fake responses
DO write real tests that verify actual behavior
DO run make check after each phase
DO test with real OpenAI API key via .env
DO preserve existing functionality - simple mode must still work
DO NOT over-engineer - minimal changes to fix the specific bugs