VibecoderMcSwaggins committed
Commit bf5812d · 1 Parent(s): 69291fe

fix(P1): Switch from Llama-3.1-70B to Qwen2.5-72B for HuggingFace free tier


Root cause: HuggingFace now routes Llama-3.1-70B to Hyperbolic partner provider
which has unreliable "staging mode" authentication (401 errors even with valid tokens).

Fix: Switch to Qwen/Qwen2.5-72B-Instruct which:
- Works reliably via HuggingFace's native infrastructure
- Offers comparable 72B parameter quality
- Tested locally: SUCCESS

Files changed:
- src/utils/config.py: Default model changed
- src/clients/huggingface.py: Fallback default changed
- src/agent_factory/judges.py: Fallback default changed
- src/orchestrators/langgraph_orchestrator.py: Hardcoded model changed
- CLAUDE.md, AGENTS.md, GEMINI.md: Documentation updated
- docs/bugs/: Bug doc and ACTIVE_BUGS updated

AGENTS.md CHANGED
@@ -106,8 +106,9 @@ Given the rapid advancements, as of November 29, 2025, the DeepBoner project use
  - **Anthropic:** `claude-sonnet-4-5-20250929`
  - This is the mid-range Claude 4.5 model, released on September 29, 2025.
  - The flagship `Claude Opus 4.5` (released November 24, 2025) is also available and can be configured by advanced users for enhanced capabilities.
- - **HuggingFace (Free Tier):** `meta-llama/Llama-3.1-70B-Instruct`
- - This remains the default for the free tier, subject to quota limits.
+ - **HuggingFace (Free Tier):** `Qwen/Qwen2.5-72B-Instruct`
+ - Changed from Llama-3.1-70B (Dec 2025) due to HuggingFace routing Llama to Hyperbolic provider which has unreliable "staging mode" auth.
+ - Qwen 2.5 72B offers comparable quality and works reliably via HuggingFace's native infrastructure.

  It is crucial to keep these defaults updated as the LLM landscape evolves.

CLAUDE.md CHANGED
@@ -113,8 +113,9 @@ Given the rapid advancements, as of November 29, 2025, the DeepBoner project use
  - **Anthropic:** `claude-sonnet-4-5-20250929`
  - This is the mid-range Claude 4.5 model, released on September 29, 2025.
  - The flagship `Claude Opus 4.5` (released November 24, 2025) is also available and can be configured by advanced users for enhanced capabilities.
- - **HuggingFace (Free Tier):** `meta-llama/Llama-3.1-70B-Instruct`
- - This remains the default for the free tier, subject to quota limits.
+ - **HuggingFace (Free Tier):** `Qwen/Qwen2.5-72B-Instruct`
+ - Changed from Llama-3.1-70B (Dec 2025) due to HuggingFace routing Llama to Hyperbolic provider which has unreliable "staging mode" auth.
+ - Qwen 2.5 72B offers comparable quality and works reliably via HuggingFace's native infrastructure.

  It is crucial to keep these defaults updated as the LLM landscape evolves.

GEMINI.md CHANGED
@@ -88,8 +88,9 @@ Given the rapid advancements, as of November 29, 2025, the DeepBoner project use
  - **Anthropic:** `claude-sonnet-4-5-20250929`
  - This is the mid-range Claude 4.5 model, released on September 29, 2025.
  - The flagship `Claude Opus 4.5` (released November 24, 2025) is also available and can be configured by advanced users for enhanced capabilities.
- - **HuggingFace (Free Tier):** `meta-llama/Llama-3.1-70B-Instruct`
- - This remains the default for the free tier, subject to quota limits.
+ - **HuggingFace (Free Tier):** `Qwen/Qwen2.5-72B-Instruct`
+ - Changed from Llama-3.1-70B (Dec 2025) due to HuggingFace routing Llama to Hyperbolic provider which has unreliable "staging mode" auth.
+ - Qwen 2.5 72B offers comparable quality and works reliably via HuggingFace's native infrastructure.

  It is crucial to keep these defaults updated as the LLM landscape evolves.

docs/bugs/ACTIVE_BUGS.md CHANGED
@@ -1,32 +1,31 @@
  # Active Bugs

- > Last updated: 2025-12-01 (07:30 PST)
+ > Last updated: 2025-12-01 (14:30 PST)
  >
  > **Note:** Completed bug docs archived to `docs/bugs/archive/`
  > **See also:** [Code Quality Audit Findings (2025-11-30)](AUDIT_FINDINGS_2025_11_30.md)

- ## P0 - Blocker
+ ## P1 - Important

- ### P0 - Simple Mode Ignores Forced Synthesis (Issue #113)
- **File:** `docs/bugs/P0_SIMPLE_MODE_FORCED_SYNTHESIS_BYPASS.md`
- **Issue:** [#113](https://github.com/The-Obstacle-Is-The-Way/DeepBoner/issues/113)
- **Found:** 2025-12-01 (Free Tier Testing)
-
- **Problem:** When HuggingFace Inference fails 3 times, the Judge returns `recommendation="synthesize"` but Simple Mode's `_should_synthesize()` ignores it due to strict score thresholds (requires `combined_score >= 10` but forced synthesis has score 0).
-
- **Impact:** Free tier users see 10 iterations of "Gathering more evidence" despite Judge saying "synthesize".
-
- **Root Cause:** Coordination bug between two fixes:
- - **PR #71 (SPEC_06):** Added `_should_synthesize()` with strict thresholds
- - **Commit 5e761eb:** Added `_create_forced_synthesis_assessment()` with `score=0, confidence=0.1`
- - These don't work together - forced synthesis bypasses nothing.
-
- **Strategic Fix:** [SPEC_16: Unified Chat Client Architecture](../specs/SPEC_16_UNIFIED_CHAT_CLIENT_ARCHITECTURE.md) - **INTEGRATION, NOT DELETION**
- - Create `HuggingFaceChatClient` adapter for Microsoft Agent Framework
- - **INTEGRATE** Simple Mode's free-tier capability into Advanced Mode
- - Users without API keys → Advanced Mode with HuggingFace backend (capability PRESERVED)
- - Retire Simple Mode's redundant orchestration CODE (not the capability!)
- - Bug disappears because Advanced Mode handles termination correctly (Manager agent signals)
+ ### P1 - HuggingFace Router 401 Unauthorized (Hyperbolic)
+ **File:** `docs/bugs/P1_HUGGINGFACE_ROUTER_401_HYPERBOLIC.md`
+ **Found:** 2025-12-01 (HuggingFace Spaces)
+
+ **Problem:** HuggingFace changed their Inference API infrastructure. Large models like `meta-llama/Llama-3.1-70B-Instruct` are now routed to partner provider "Hyperbolic" which requires authentication even for previously "free" models.
+
+ **Error:**
+ ```
+ 401 Client Error: Unauthorized for url:
+ https://router.huggingface.co/hyperbolic/v1/chat/completions
+ ```
+
+ **Impact:** Free Tier (no API key) is **COMPLETELY BROKEN**.
+
+ **Root Cause:** NOT our code. HuggingFace infrastructure change:
+ - Old: `api-inference.huggingface.co` (deprecated)
+ - New: `router.huggingface.co/{provider}/...` (routes to partners)
+
+ **Proposed Fix:** Add `HF_TOKEN` as a secret in HuggingFace Spaces settings, OR switch to a smaller model that's still on HF native infrastructure.

  ---

@@ -68,6 +67,17 @@

  ## Resolved Bugs

+ ### ~~P0 - Simple Mode Ignores Forced Synthesis~~ FIXED
+ **File:** `docs/bugs/P0_SIMPLE_MODE_FORCED_SYNTHESIS_BYPASS.md`
+ **Issue:** [#113](https://github.com/The-Obstacle-Is-The-Way/DeepBoner/issues/113)
+ **PR:** [#115](https://github.com/The-Obstacle-Is-The-Way/DeepBoner/pull/115) (SPEC-16)
+ **Found:** 2025-12-01
+ **Resolved:** 2025-12-01
+
+ - Problem: Simple Mode ignored forced synthesis signals from Judge.
+ - Fix: SPEC-16 unified architecture - removed Simple Mode entirely, integrated HuggingFace into Advanced Mode.
+ - Simple Mode code deleted, capability preserved via `HuggingFaceChatClient` adapter.
+
  ### ~~P1 - Advanced Mode Exposes Uninterpretable Chain-of-Thought~~ FIXED
  **File:** `docs/bugs/P1_ADVANCED_MODE_UNINTERPRETABLE_CHAIN_OF_THOUGHT.md`
  **PR:** [#107](https://github.com/The-Obstacle-Is-The-Way/DeepBoner/pull/107)

docs/bugs/P1_HUGGINGFACE_ROUTER_401_HYPERBOLIC.md ADDED
@@ -0,0 +1,162 @@
+ # P1 Bug: HuggingFace Router 401 Unauthorized (Hyperbolic Provider)
+
+ **Severity**: P1 (High) - Free Tier completely broken
+ **Status**: Open
+ **Discovered**: 2025-12-01
+ **Reporter**: Production user via HuggingFace Spaces
+
+ ## Symptom
+
+ ```
+ 401 Client Error: Unauthorized for url:
+ https://router.huggingface.co/hyperbolic/v1/chat/completions
+ Invalid username or password.
+ ```
+
+ ## Root Cause Analysis
+
+ ### What Changed (NOT our code)
+
+ HuggingFace has migrated their Inference API infrastructure:
+
+ 1. **Old endpoint** (deprecated): `https://api-inference.huggingface.co`
+ 2. **New endpoint**: `https://router.huggingface.co/{provider}/v1/chat/completions`
+
+ The new "router" system routes requests to **partner providers** based on the model:
+ - `meta-llama/Llama-3.1-70B-Instruct` → **Hyperbolic** (partner)
+ - Other models → various providers
+
+ **Critical Issue**: Hyperbolic requires authentication even for models that were previously "free tier" on HuggingFace's native infrastructure.
+
+ ### Call Stack Trace
+
+ ```
+ User Query (HuggingFace Spaces)
+ ↓
+ src/app.py:research_agent()
+ ↓
+ src/orchestrators/advanced.py:AdvancedOrchestrator.run()
+ ↓
+ src/clients/factory.py:get_chat_client() [line 69-76]
+ → No OpenAI key → Falls back to HuggingFace
+ ↓
+ src/clients/huggingface.py:HuggingFaceChatClient.__init__() [line 52-56]
+ → InferenceClient(model="meta-llama/Llama-3.1-70B-Instruct", token=None)
+ ↓
+ huggingface_hub.InferenceClient.chat_completion()
+ → Routes to: https://router.huggingface.co/hyperbolic/v1/chat/completions
+ → 401 Unauthorized (Hyperbolic rejects unauthenticated requests)
+ ```
+
+ ### Evidence
+
+ - **huggingface_hub version**: 0.36.0 (latest)
+ - **pyproject.toml constraint**: `>=0.24.0`
+ - **HuggingFace Forum Reference**: [API endpoint migration thread](https://discuss.huggingface.co/t/error-https-api-inference-huggingface-co-is-no-longer-supported-please-use-https-router-huggingface-co-hf-inference-instead/169870)
+
+ ## Impact
+
+ | Component | Impact |
+ |-----------|--------|
+ | Free Tier (no API key) | **COMPLETELY BROKEN** |
+ | HuggingFace Spaces demo | **BROKEN** |
+ | Users without OpenAI key | **Cannot use app** |
+ | Paid tier (OpenAI key) | Unaffected |
+
+ ## Proposed Solutions
+
+ ### Option 1: Switch to Smaller Free Model (Quick Fix)
+
+ Change default model from `meta-llama/Llama-3.1-70B-Instruct` to a model that's still hosted on HuggingFace's native infrastructure:
+
+ ```python
+ # src/utils/config.py
+ huggingface_model: str | None = Field(
+     default="mistralai/Mistral-7B-Instruct-v0.3",  # Still on HF native
+     description="HuggingFace model name"
+ )
+ ```
+
+ **Candidates** (need testing):
+ - `mistralai/Mistral-7B-Instruct-v0.3`
+ - `HuggingFaceH4/zephyr-7b-beta`
+ - `microsoft/Phi-3-mini-4k-instruct`
+ - `google/gemma-2-9b-it`
+
+ **Pros**: Quick fix, no auth required
+ **Cons**: Lower quality output than Llama 3.1 70B
+
+ ### Option 2: Require HF_TOKEN for Free Tier
+
+ Document that `HF_TOKEN` is now **required** (not optional) for Free Tier:
+
+ ```python
+ # src/clients/factory.py
+ if not settings.hf_token:
+     raise ConfigurationError(
+         "HF_TOKEN is now required for HuggingFace free tier. "
+         "Get yours at https://huggingface.co/settings/tokens"
+     )
+ ```
+
+ **Pros**: Keeps Llama 3.1 70B quality
+ **Cons**: Friction for users, not truly "free" anymore
+
+ ### Option 3: Server-Side HF_TOKEN on Spaces
+
+ Set `HF_TOKEN` as a secret in HuggingFace Spaces settings:
+ 1. Go to Space Settings → Repository Secrets
+ 2. Add `HF_TOKEN` with a valid token
+ 3. Users get free tier without needing their own token
+
+ **Pros**: Best UX, transparent to users
+ **Cons**: Token usage counted against our account
+
+ ### Option 4: Hybrid Fallback Chain
+
+ Try multiple models in order until one works:
+
+ ```python
+ FALLBACK_MODELS = [
+     "meta-llama/Llama-3.1-70B-Instruct",   # Best quality (needs token)
+     "mistralai/Mistral-7B-Instruct-v0.3",  # Good quality (free)
+     "microsoft/Phi-3-mini-4k-instruct",    # Lightweight (free)
+ ]
+ ```
+
+ **Pros**: Graceful degradation
+ **Cons**: Complexity, inconsistent output quality
+
+ ## Recommended Fix
+
+ **Short-term (P1)**: Option 3 - Add `HF_TOKEN` to HuggingFace Spaces secrets
+
+ **Long-term**: Option 4 - Implement fallback chain with clear user feedback about which model is active
+
+ ## Testing
+
+ ```bash
+ # Test without token (should fail currently)
+ unset HF_TOKEN
+ uv run python -c "
+ from huggingface_hub import InferenceClient
+ client = InferenceClient(model='meta-llama/Llama-3.1-70B-Instruct')
+ response = client.chat_completion(messages=[{'role': 'user', 'content': 'Hi'}])
+ print(response)
+ "
+
+ # Test with token (should work)
+ export HF_TOKEN=hf_xxxxx
+ uv run python -c "
+ from huggingface_hub import InferenceClient
+ client = InferenceClient(model='meta-llama/Llama-3.1-70B-Instruct', token='$HF_TOKEN')
+ response = client.chat_completion(messages=[{'role': 'user', 'content': 'Hi'}])
+ print(response)
+ "
+ ```
+
+ ## References
+
+ - [HuggingFace API Migration Thread](https://discuss.huggingface.co/t/error-https-api-inference-huggingface-co-is-no-longer-supported-please-use-https-router-huggingface-co-hf-inference-instead/169870)
+ - [GitHub Issue: 401 Unauthorized](https://github.com/huggingface/transformers/issues/38289)
+ - [HuggingFace Inference Endpoints Docs](https://huggingface.co/docs/huggingface_hub/guides/inference)
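
The "Option 4" fallback chain from the bug doc could be implemented roughly as follows. This is a sketch, not code from the repo: `probe` is a hypothetical injected availability check (in practice a minimal `chat_completion` call), kept as a parameter so the selection logic can be tested without network access:

```python
# Hypothetical sketch of the bug doc's "Option 4: Hybrid Fallback Chain".
# `first_working_model` and `probe` are illustrative names, not repo code.

FALLBACK_MODELS = [
    "meta-llama/Llama-3.1-70B-Instruct",   # best quality (needs token)
    "mistralai/Mistral-7B-Instruct-v0.3",  # good quality (free)
    "microsoft/Phi-3-mini-4k-instruct",    # lightweight (free)
]


def first_working_model(models: list[str], probe) -> str:
    """Return the first model that `probe` accepts without raising."""
    for model in models:
        try:
            probe(model)  # e.g. a one-token chat_completion smoke test
            return model
        except Exception:
            continue  # e.g. 401 from a partner provider; try the next model
    raise RuntimeError("No HuggingFace model available")
```

Clear user feedback about which model ended up active (the long-term recommendation above) would then just mean logging the returned model name.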
src/agent_factory/judges.py CHANGED
@@ -82,7 +82,7 @@ def get_model() -> Any:

  # Priority 3: HuggingFace (requires HF_TOKEN)
  if settings.has_huggingface_key:
-     model_name = settings.huggingface_model or "meta-llama/Llama-3.1-70B-Instruct"
+     model_name = settings.huggingface_model or "Qwen/Qwen2.5-72B-Instruct"
      hf_provider = HuggingFaceProvider(api_key=settings.hf_token)
      return HuggingFaceModel(model_name, provider=hf_provider)

src/clients/huggingface.py CHANGED
@@ -37,14 +37,12 @@ class HuggingFaceChatClient(BaseChatClient):  # type: ignore[misc]
  """Initialize the HuggingFace chat client.

  Args:
-     model_id: The HuggingFace model ID (default: configured value or Llama-3.1-70B).
+     model_id: The HuggingFace model ID (default: configured value or Qwen2.5-72B).
      api_key: HF_TOKEN (optional, defaults to env var).
      **kwargs: Additional arguments passed to BaseChatClient.
  """
  super().__init__(**kwargs)
- self.model_id = (
-     model_id or settings.huggingface_model or "meta-llama/Llama-3.1-70B-Instruct"
- )
+ self.model_id = model_id or settings.huggingface_model or "Qwen/Qwen2.5-72B-Instruct"
  self.api_key = api_key or settings.hf_token

  # Initialize the HF Inference Client
src/orchestrators/langgraph_orchestrator.py CHANGED
@@ -36,9 +36,10 @@ class LangGraphOrchestrator(OrchestratorProtocol):
  self._max_iterations = max_iterations
  self._checkpoint_path = checkpoint_path

- # Initialize the LLM (Llama 3.1 via HF Inference)
+ # Initialize the LLM (Qwen 2.5 via HF Inference)
  # We use the serverless API by default
- repo_id = "meta-llama/Llama-3.1-70B-Instruct"
+ # NOTE: Llama-3.1-70B routes to Hyperbolic (unreliable staging mode)
+ repo_id = "Qwen/Qwen2.5-72B-Instruct"

  # Ensure we have an API key
  api_key = settings.hf_token
src/utils/config.py CHANGED
@@ -36,8 +36,10 @@ class Settings(BaseSettings):
      default="claude-sonnet-4-5-20250929", description="Anthropic model"
  )
  # HuggingFace (free tier)
+ # NOTE: Llama-3.1-70B is routed to Hyperbolic (partner) which has unreliable "staging mode"
+ # Qwen2.5-72B works reliably via HuggingFace's native infrastructure
  huggingface_model: str | None = Field(
-     default="meta-llama/Llama-3.1-70B-Instruct", description="HuggingFace model name"
+     default="Qwen/Qwen2.5-72B-Instruct", description="HuggingFace model name"
  )
  hf_token: str | None = Field(
      default=None, alias="HF_TOKEN", description="HuggingFace API token"
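
Because `huggingface_model` is a pydantic-settings field, deployments should be able to override the new default without a code change. A minimal sketch of that resolution order, assuming the standard pydantic-settings mapping of the field to a `HUGGINGFACE_MODEL` environment variable (the helper below is illustrative, not the project's actual `Settings` class):

```python
import os

# Illustrative stand-in for how Settings.huggingface_model would resolve:
# an environment override wins, otherwise the new free-tier default applies.
def effective_model(default: str = "Qwen/Qwen2.5-72B-Instruct") -> str:
    return os.environ.get("HUGGINGFACE_MODEL") or default
```

For example, a Space that wants a smaller model could set `HUGGINGFACE_MODEL=mistralai/Mistral-7B-Instruct-v0.3` in its settings rather than patching `config.py`.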