VibecoderMcSwaggins committed
Commit bf5812d · 1 Parent(s): 69291fe

fix(P1): Switch from Llama-3.1-70B to Qwen2.5-72B for HuggingFace free tier


Root cause: HuggingFace now routes Llama-3.1-70B to Hyperbolic partner provider
which has unreliable "staging mode" authentication (401 errors even with valid tokens).

Fix: Switch to Qwen/Qwen2.5-72B-Instruct which:
- Works reliably via HuggingFace's native infrastructure
- Offers comparable 72B parameter quality
- Tested locally: SUCCESS

Files changed:
- src/utils/config.py: Default model changed
- src/clients/huggingface.py: Fallback default changed
- src/agent_factory/judges.py: Fallback default changed
- src/orchestrators/langgraph_orchestrator.py: Hardcoded model changed
- CLAUDE.md, AGENTS.md, GEMINI.md: Documentation updated
- docs/bugs/: Bug doc and ACTIVE_BUGS updated

AGENTS.md CHANGED
@@ -106,8 +106,9 @@ Given the rapid advancements, as of November 29, 2025, the DeepBoner project use
  - **Anthropic:** `claude-sonnet-4-5-20250929`
  - This is the mid-range Claude 4.5 model, released on September 29, 2025.
  - The flagship `Claude Opus 4.5` (released November 24, 2025) is also available and can be configured by advanced users for enhanced capabilities.
- - **HuggingFace (Free Tier):** `meta-llama/Llama-3.1-70B-Instruct`
- - This remains the default for the free tier, subject to quota limits.
+ - **HuggingFace (Free Tier):** `Qwen/Qwen2.5-72B-Instruct`
+ - Changed from Llama-3.1-70B (Dec 2025) due to HuggingFace routing Llama to Hyperbolic provider which has unreliable "staging mode" auth.
+ - Qwen 2.5 72B offers comparable quality and works reliably via HuggingFace's native infrastructure.

  It is crucial to keep these defaults updated as the LLM landscape evolves.

CLAUDE.md CHANGED
@@ -113,8 +113,9 @@ Given the rapid advancements, as of November 29, 2025, the DeepBoner project use
  - **Anthropic:** `claude-sonnet-4-5-20250929`
  - This is the mid-range Claude 4.5 model, released on September 29, 2025.
  - The flagship `Claude Opus 4.5` (released November 24, 2025) is also available and can be configured by advanced users for enhanced capabilities.
- - **HuggingFace (Free Tier):** `meta-llama/Llama-3.1-70B-Instruct`
- - This remains the default for the free tier, subject to quota limits.
+ - **HuggingFace (Free Tier):** `Qwen/Qwen2.5-72B-Instruct`
+ - Changed from Llama-3.1-70B (Dec 2025) due to HuggingFace routing Llama to Hyperbolic provider which has unreliable "staging mode" auth.
+ - Qwen 2.5 72B offers comparable quality and works reliably via HuggingFace's native infrastructure.

  It is crucial to keep these defaults updated as the LLM landscape evolves.

GEMINI.md CHANGED
@@ -88,8 +88,9 @@ Given the rapid advancements, as of November 29, 2025, the DeepBoner project use
  - **Anthropic:** `claude-sonnet-4-5-20250929`
  - This is the mid-range Claude 4.5 model, released on September 29, 2025.
  - The flagship `Claude Opus 4.5` (released November 24, 2025) is also available and can be configured by advanced users for enhanced capabilities.
- - **HuggingFace (Free Tier):** `meta-llama/Llama-3.1-70B-Instruct`
- - This remains the default for the free tier, subject to quota limits.
+ - **HuggingFace (Free Tier):** `Qwen/Qwen2.5-72B-Instruct`
+ - Changed from Llama-3.1-70B (Dec 2025) due to HuggingFace routing Llama to Hyperbolic provider which has unreliable "staging mode" auth.
+ - Qwen 2.5 72B offers comparable quality and works reliably via HuggingFace's native infrastructure.

  It is crucial to keep these defaults updated as the LLM landscape evolves.

docs/bugs/ACTIVE_BUGS.md CHANGED
@@ -1,32 +1,31 @@
  # Active Bugs

- > Last updated: 2025-12-01 (07:30 PST)
+ > Last updated: 2025-12-01 (14:30 PST)
  >
  > **Note:** Completed bug docs archived to `docs/bugs/archive/`
  > **See also:** [Code Quality Audit Findings (2025-11-30)](AUDIT_FINDINGS_2025_11_30.md)

- ## P0 - Blocker
+ ## P1 - Important

- ### P0 - Simple Mode Ignores Forced Synthesis (Issue #113)
- **File:** `docs/bugs/P0_SIMPLE_MODE_FORCED_SYNTHESIS_BYPASS.md`
- **Issue:** [#113](https://github.com/The-Obstacle-Is-The-Way/DeepBoner/issues/113)
- **Found:** 2025-12-01 (Free Tier Testing)
-
- **Problem:** When HuggingFace Inference fails 3 times, the Judge returns `recommendation="synthesize"` but Simple Mode's `_should_synthesize()` ignores it due to strict score thresholds (requires `combined_score >= 10` but forced synthesis has score 0).
-
- **Impact:** Free tier users see 10 iterations of "Gathering more evidence" despite Judge saying "synthesize".
-
- **Root Cause:** Coordination bug between two fixes:
- - **PR #71 (SPEC_06):** Added `_should_synthesize()` with strict thresholds
- - **Commit 5e761eb:** Added `_create_forced_synthesis_assessment()` with `score=0, confidence=0.1`
- - These don't work together - forced synthesis bypasses nothing.
-
- **Strategic Fix:** [SPEC_16: Unified Chat Client Architecture](../specs/SPEC_16_UNIFIED_CHAT_CLIENT_ARCHITECTURE.md) - **INTEGRATION, NOT DELETION**
- - Create `HuggingFaceChatClient` adapter for Microsoft Agent Framework
- - **INTEGRATE** Simple Mode's free-tier capability into Advanced Mode
- - Users without API keys → Advanced Mode with HuggingFace backend (capability PRESERVED)
- - Retire Simple Mode's redundant orchestration CODE (not the capability!)
- - Bug disappears because Advanced Mode handles termination correctly (Manager agent signals)
+ ### P1 - HuggingFace Router 401 Unauthorized (Hyperbolic)
+ **File:** `docs/bugs/P1_HUGGINGFACE_ROUTER_401_HYPERBOLIC.md`
+ **Found:** 2025-12-01 (HuggingFace Spaces)
+
+ **Problem:** HuggingFace changed their Inference API infrastructure. Large models like `meta-llama/Llama-3.1-70B-Instruct` are now routed to partner provider "Hyperbolic" which requires authentication even for previously "free" models.
+
+ **Error:**
+ ```
+ 401 Client Error: Unauthorized for url:
+ https://router.huggingface.co/hyperbolic/v1/chat/completions
+ ```
+
+ **Impact:** Free Tier (no API key) is **COMPLETELY BROKEN**.
+
+ **Root Cause:** NOT our code. HuggingFace infrastructure change:
+ - Old: `api-inference.huggingface.co` (deprecated)
+ - New: `router.huggingface.co/{provider}/...` (routes to partners)
+
+ **Proposed Fix:** Add `HF_TOKEN` as a secret in HuggingFace Spaces settings, OR switch to a smaller model that's still on HF native infrastructure.

  ---

@@ -68,6 +67,17 @@

  ## Resolved Bugs

+ ### ~~P0 - Simple Mode Ignores Forced Synthesis~~ FIXED
+ **File:** `docs/bugs/P0_SIMPLE_MODE_FORCED_SYNTHESIS_BYPASS.md`
+ **Issue:** [#113](https://github.com/The-Obstacle-Is-The-Way/DeepBoner/issues/113)
+ **PR:** [#115](https://github.com/The-Obstacle-Is-The-Way/DeepBoner/pull/115) (SPEC-16)
+ **Found:** 2025-12-01
+ **Resolved:** 2025-12-01
+
+ - Problem: Simple Mode ignored forced synthesis signals from Judge.
+ - Fix: SPEC-16 unified architecture - removed Simple Mode entirely, integrated HuggingFace into Advanced Mode.
+ - Simple Mode code deleted, capability preserved via `HuggingFaceChatClient` adapter.
+
  ### ~~P1 - Advanced Mode Exposes Uninterpretable Chain-of-Thought~~ FIXED
  **File:** `docs/bugs/P1_ADVANCED_MODE_UNINTERPRETABLE_CHAIN_OF_THOUGHT.md`
  **PR:** [#107](https://github.com/The-Obstacle-Is-The-Way/DeepBoner/pull/107)

docs/bugs/P1_HUGGINGFACE_ROUTER_401_HYPERBOLIC.md ADDED
@@ -0,0 +1,162 @@
+ # P1 Bug: HuggingFace Router 401 Unauthorized (Hyperbolic Provider)
+
+ **Severity**: P1 (High) - Free Tier completely broken
+ **Status**: Open
+ **Discovered**: 2025-12-01
+ **Reporter**: Production user via HuggingFace Spaces
+
+ ## Symptom
+
+ ```
+ 401 Client Error: Unauthorized for url:
+ https://router.huggingface.co/hyperbolic/v1/chat/completions
+ Invalid username or password.
+ ```
+
+ ## Root Cause Analysis
+
+ ### What Changed (NOT our code)
+
+ HuggingFace has migrated their Inference API infrastructure:
+
+ 1. **Old endpoint** (deprecated): `https://api-inference.huggingface.co`
+ 2. **New endpoint**: `https://router.huggingface.co/{provider}/v1/chat/completions`
+
+ The new "router" system routes requests to **partner providers** based on the model:
+ - `meta-llama/Llama-3.1-70B-Instruct` → **Hyperbolic** (partner)
+ - Other models → various providers
+
+ **Critical Issue**: Hyperbolic requires authentication even for models that were previously "free tier" on HuggingFace's native infrastructure.
+
+ ### Call Stack Trace
+
+ ```
+ User Query (HuggingFace Spaces)
+ ↓
+ src/app.py:research_agent()
+ ↓
+ src/orchestrators/advanced.py:AdvancedOrchestrator.run()
+ ↓
+ src/clients/factory.py:get_chat_client() [line 69-76]
+ → No OpenAI key → Falls back to HuggingFace
+ ↓
+ src/clients/huggingface.py:HuggingFaceChatClient.__init__() [line 52-56]
+ → InferenceClient(model="meta-llama/Llama-3.1-70B-Instruct", token=None)
+ ↓
+ huggingface_hub.InferenceClient.chat_completion()
+ → Routes to: https://router.huggingface.co/hyperbolic/v1/chat/completions
+ → 401 Unauthorized (Hyperbolic rejects unauthenticated requests)
+ ```
+
+ ### Evidence
+
+ - **huggingface_hub version**: 0.36.0 (latest)
+ - **pyproject.toml constraint**: `>=0.24.0`
+ - **HuggingFace Forum Reference**: [API endpoint migration thread](https://discuss.huggingface.co/t/error-https-api-inference-huggingface-co-is-no-longer-supported-please-use-https-router-huggingface-co-hf-inference-instead/169870)
+
+ ## Impact
+
+ | Component | Impact |
+ |-----------|--------|
+ | Free Tier (no API key) | **COMPLETELY BROKEN** |
+ | HuggingFace Spaces demo | **BROKEN** |
+ | Users without OpenAI key | **Cannot use app** |
+ | Paid tier (OpenAI key) | Unaffected |
+
+ ## Proposed Solutions
+
+ ### Option 1: Switch to Smaller Free Model (Quick Fix)
+
+ Change default model from `meta-llama/Llama-3.1-70B-Instruct` to a model that's still hosted on HuggingFace's native infrastructure:
+
+ ```python
+ # src/utils/config.py
+ huggingface_model: str | None = Field(
+     default="mistralai/Mistral-7B-Instruct-v0.3",  # Still on HF native
+     description="HuggingFace model name"
+ )
+ ```
+
+ **Candidates** (need testing):
+ - `mistralai/Mistral-7B-Instruct-v0.3`
+ - `HuggingFaceH4/zephyr-7b-beta`
+ - `microsoft/Phi-3-mini-4k-instruct`
+ - `google/gemma-2-9b-it`
+
+ **Pros**: Quick fix, no auth required
+ **Cons**: Lower quality output than Llama 3.1 70B
+
+ ### Option 2: Require HF_TOKEN for Free Tier
+
+ Document that `HF_TOKEN` is now **required** (not optional) for Free Tier:
+
+ ```python
+ # src/clients/factory.py
+ if not settings.hf_token:
+     raise ConfigurationError(
+         "HF_TOKEN is now required for HuggingFace free tier. "
+         "Get yours at https://huggingface.co/settings/tokens"
+     )
+ ```
+
+ **Pros**: Keeps Llama 3.1 70B quality
+ **Cons**: Friction for users, not truly "free" anymore
+
+ ### Option 3: Server-Side HF_TOKEN on Spaces
+
+ Set `HF_TOKEN` as a secret in HuggingFace Spaces settings:
+ 1. Go to Space Settings → Repository Secrets
+ 2. Add `HF_TOKEN` with a valid token
+ 3. Users get free tier without needing their own token
+
+ **Pros**: Best UX, transparent to users
+ **Cons**: Token usage counted against our account
+
+ ### Option 4: Hybrid Fallback Chain
+
+ Try multiple models in order until one works:
+
+ ```python
+ FALLBACK_MODELS = [
+     "meta-llama/Llama-3.1-70B-Instruct",   # Best quality (needs token)
+     "mistralai/Mistral-7B-Instruct-v0.3",  # Good quality (free)
+     "microsoft/Phi-3-mini-4k-instruct",    # Lightweight (free)
+ ]
+ ```
+
+ **Pros**: Graceful degradation
+ **Cons**: Complexity, inconsistent output quality
+
+ ## Recommended Fix
+
+ **Short-term (P1)**: Option 3 - Add `HF_TOKEN` to HuggingFace Spaces secrets
+
+ **Long-term**: Option 4 - Implement fallback chain with clear user feedback about which model is active
+
+ ## Testing
+
+ ```bash
+ # Test without token (should fail currently)
+ unset HF_TOKEN
+ uv run python -c "
+ from huggingface_hub import InferenceClient
+ client = InferenceClient(model='meta-llama/Llama-3.1-70B-Instruct')
+ response = client.chat_completion(messages=[{'role': 'user', 'content': 'Hi'}])
+ print(response)
+ "
+
+ # Test with token (should work)
+ export HF_TOKEN=hf_xxxxx
+ uv run python -c "
+ from huggingface_hub import InferenceClient
+ client = InferenceClient(model='meta-llama/Llama-3.1-70B-Instruct', token='$HF_TOKEN')
+ response = client.chat_completion(messages=[{'role': 'user', 'content': 'Hi'}])
+ print(response)
+ "
+ ```
+
+ ## References
+
+ - [HuggingFace API Migration Thread](https://discuss.huggingface.co/t/error-https-api-inference-huggingface-co-is-no-longer-supported-please-use-https-router-huggingface-co-hf-inference-instead/169870)
+ - [GitHub Issue: 401 Unauthorized](https://github.com/huggingface/transformers/issues/38289)
+ - [HuggingFace Inference Endpoints Docs](https://huggingface.co/docs/huggingface_hub/guides/inference)
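
The "Option 4" fallback chain from the bug doc could be implemented roughly as follows. This is a sketch, not code from the repo: `probe` is a hypothetical injected availability check (in practice a minimal `chat_completion` call), kept as a parameter so the selection logic can be tested without network access:

```python
# Hypothetical sketch of the bug doc's "Option 4: Hybrid Fallback Chain".
# `first_working_model` and `probe` are illustrative names, not repo code.

FALLBACK_MODELS = [
    "meta-llama/Llama-3.1-70B-Instruct",   # best quality (needs token)
    "mistralai/Mistral-7B-Instruct-v0.3",  # good quality (free)
    "microsoft/Phi-3-mini-4k-instruct",    # lightweight (free)
]


def first_working_model(models: list[str], probe) -> str:
    """Return the first model that `probe` accepts without raising."""
    for model in models:
        try:
            probe(model)  # e.g. a one-token chat_completion smoke test
            return model
        except Exception:
            continue  # e.g. 401 from a partner provider; try the next model
    raise RuntimeError("No HuggingFace model available")
```

Clear user feedback about which model ended up active (the long-term recommendation above) would then just mean logging the returned model name.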
src/agent_factory/judges.py CHANGED
@@ -82,7 +82,7 @@ def get_model() -> Any:

  # Priority 3: HuggingFace (requires HF_TOKEN)
  if settings.has_huggingface_key:
-     model_name = settings.huggingface_model or "meta-llama/Llama-3.1-70B-Instruct"
+     model_name = settings.huggingface_model or "Qwen/Qwen2.5-72B-Instruct"
      hf_provider = HuggingFaceProvider(api_key=settings.hf_token)
      return HuggingFaceModel(model_name, provider=hf_provider)

src/clients/huggingface.py CHANGED
@@ -37,14 +37,12 @@ class HuggingFaceChatClient(BaseChatClient):  # type: ignore[misc]
  """Initialize the HuggingFace chat client.

  Args:
-     model_id: The HuggingFace model ID (default: configured value or Llama-3.1-70B).
+     model_id: The HuggingFace model ID (default: configured value or Qwen2.5-72B).
      api_key: HF_TOKEN (optional, defaults to env var).
      **kwargs: Additional arguments passed to BaseChatClient.
  """
  super().__init__(**kwargs)
- self.model_id = (
-     model_id or settings.huggingface_model or "meta-llama/Llama-3.1-70B-Instruct"
- )
+ self.model_id = model_id or settings.huggingface_model or "Qwen/Qwen2.5-72B-Instruct"
  self.api_key = api_key or settings.hf_token

  # Initialize the HF Inference Client
src/orchestrators/langgraph_orchestrator.py CHANGED
@@ -36,9 +36,10 @@ class LangGraphOrchestrator(OrchestratorProtocol):
  self._max_iterations = max_iterations
  self._checkpoint_path = checkpoint_path

- # Initialize the LLM (Llama 3.1 via HF Inference)
+ # Initialize the LLM (Qwen 2.5 via HF Inference)
  # We use the serverless API by default
- repo_id = "meta-llama/Llama-3.1-70B-Instruct"
+ # NOTE: Llama-3.1-70B routes to Hyperbolic (unreliable staging mode)
+ repo_id = "Qwen/Qwen2.5-72B-Instruct"

  # Ensure we have an API key
  api_key = settings.hf_token
src/utils/config.py CHANGED
@@ -36,8 +36,10 @@ class Settings(BaseSettings):
      default="claude-sonnet-4-5-20250929", description="Anthropic model"
  )
  # HuggingFace (free tier)
+ # NOTE: Llama-3.1-70B is routed to Hyperbolic (partner) which has unreliable "staging mode"
+ # Qwen2.5-72B works reliably via HuggingFace's native infrastructure
  huggingface_model: str | None = Field(
-     default="meta-llama/Llama-3.1-70B-Instruct", description="HuggingFace model name"
+     default="Qwen/Qwen2.5-72B-Instruct", description="HuggingFace model name"
  )
  hf_token: str | None = Field(
      default=None, alias="HF_TOKEN", description="HuggingFace API token"
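
Because `huggingface_model` is a pydantic-settings field, deployments should be able to override the new default without a code change. A minimal sketch of that resolution order, assuming the standard pydantic-settings mapping of the field to a `HUGGINGFACE_MODEL` environment variable (the helper below is illustrative, not the project's actual `Settings` class):

```python
import os

# Illustrative stand-in for how Settings.huggingface_model would resolve:
# an environment override wins, otherwise the new free-tier default applies.
def effective_model(default: str = "Qwen/Qwen2.5-72B-Instruct") -> str:
    return os.environ.get("HUGGINGFACE_MODEL") or default
```

For example, a Space that wants a smaller model could set `HUGGINGFACE_MODEL=mistralai/Mistral-7B-Instruct-v0.3` in its settings rather than patching `config.py`.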