VibecoderMcSwaggins committed on
Commit 3f60cec · 1 Parent(s): 4f910c4

fix(P1): Resolve HuggingFace 401 - invalid token was root cause


Reverts the unnecessary over-engineering from the previous "fix(auth)" commit.

Root cause: the HF_TOKEN in .env was invalid/expired, not an infrastructure issue.

Resolution:
- Generated a new valid HF_TOKEN
- Kept the Qwen/Qwen2.5-72B model change (good for reliability)
- Updated the bug doc with the actual root cause and lessons learned

The Pydantic settings alias="HF_TOKEN" works correctly; there is no need
for os.environ.get() hacks or extra debug logging.
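
As a sanity check on that claim, here is a minimal sketch of how a pydantic-settings alias resolves an environment variable (assuming pydantic-settings v2; `DemoSettings` is an illustrative stand-in, not the project's actual `Settings` class):

```python
# Minimal sketch: with alias="HF_TOKEN", the pydantic-settings loader
# reads the HF_TOKEN environment variable directly, so no
# default_factory/os.environ.get() workaround is needed.
import os

from pydantic import Field
from pydantic_settings import BaseSettings


class DemoSettings(BaseSettings):
    hf_token: str | None = Field(default=None, alias="HF_TOKEN")


os.environ["HF_TOKEN"] = "hf_example"
print(DemoSettings().hf_token)  # -> hf_example
```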

docs/bugs/P1_HUGGINGFACE_ROUTER_401_HYPERBOLIC.md CHANGED
@@ -1,8 +1,9 @@
- # P1 Bug: HuggingFace Router 401 Unauthorized (Hyperbolic Provider)

- **Severity**: P1 (High) - Free Tier completely broken
- **Status**: Open
  **Discovered**: 2025-12-01
  **Reporter**: Production user via HuggingFace Spaces

  ## Symptom
@@ -13,174 +14,49 @@ https://router.huggingface.co/hyperbolic/v1/chat/completions
  Invalid username or password.
  ```

- ## Root Cause Analysis

- ### What Changed (NOT our code)

- HuggingFace has migrated their Inference API infrastructure:

- 1. **Old endpoint** (deprecated): `https://api-inference.huggingface.co`
- 2. **New endpoint**: `https://router.huggingface.co/{provider}/v1/chat/completions`

- The new "router" system routes requests to **partner providers** based on the model:
- - `meta-llama/Llama-3.1-70B-Instruct` → **Hyperbolic** (partner)
- - Other models → various providers

- **Critical Issue**: Hyperbolic requires authentication even for models that were previously "free tier" on HuggingFace's native infrastructure.
-
- ### Call Stack Trace
-
- ```
- User Query (HuggingFace Spaces)
-   ↓
- src/app.py:research_agent()
-   ↓
- src/orchestrators/advanced.py:AdvancedOrchestrator.run()
-   ↓
- src/clients/factory.py:get_chat_client() [line 69-76]
-   → No OpenAI key → Falls back to HuggingFace
-   ↓
- src/clients/huggingface.py:HuggingFaceChatClient.__init__() [line 52-56]
-   → InferenceClient(model="meta-llama/Llama-3.1-70B-Instruct", token=None)
-   ↓
- huggingface_hub.InferenceClient.chat_completion()
-   → Routes to: https://router.huggingface.co/hyperbolic/v1/chat/completions
-   → 401 Unauthorized (Hyperbolic rejects unauthenticated requests)
- ```
-
- ### Evidence
-
- - **huggingface_hub version**: 0.36.0 (latest)
- - **pyproject.toml constraint**: `>=0.24.0`
- - **HuggingFace Forum Reference**: [API endpoint migration thread](https://discuss.huggingface.co/t/error-https-api-inference-huggingface-co-is-no-longer-supported-please-use-https-router-huggingface-co-hf-inference-instead/169870)
-
- ## Impact
-
- | Component | Impact |
- |-----------|--------|
- | Free Tier (no API key) | **COMPLETELY BROKEN** |
- | HuggingFace Spaces demo | **BROKEN** |
- | Users without OpenAI key | **Cannot use app** |
- | Paid tier (OpenAI key) | Unaffected |
-
- ## Proposed Solutions
-
- ### Option 1: Switch to Smaller Free Model (Quick Fix)
-
- Change the default model from `meta-llama/Llama-3.1-70B-Instruct` to a model that's still hosted on HuggingFace's native infrastructure:
-
- ```python
- # src/utils/config.py
- huggingface_model: str | None = Field(
-     default="mistralai/Mistral-7B-Instruct-v0.3",  # Still on HF native
-     description="HuggingFace model name"
- )
- ```
-
- **Candidates** (need testing):
- - `mistralai/Mistral-7B-Instruct-v0.3`
- - `HuggingFaceH4/zephyr-7b-beta`
- - `microsoft/Phi-3-mini-4k-instruct`
- - `google/gemma-2-9b-it`
-
- **Pros**: Quick fix, no auth required
- **Cons**: Lower quality output than Llama 3.1 70B
-
- ### Option 2: Require HF_TOKEN for Free Tier
-
- Document that `HF_TOKEN` is now **required** (not optional) for Free Tier:
-
- ```python
- # src/clients/factory.py
- if not settings.hf_token:
-     raise ConfigurationError(
-         "HF_TOKEN is now required for HuggingFace free tier. "
-         "Get yours at https://huggingface.co/settings/tokens"
-     )
- ```
-
- **Pros**: Keeps Llama 3.1 70B quality
- **Cons**: Friction for users, not truly "free" anymore
-
- ### Option 3: Server-Side HF_TOKEN on Spaces
-
- Set `HF_TOKEN` as a secret in HuggingFace Spaces settings:
- 1. Go to Space Settings → Repository Secrets
- 2. Add `HF_TOKEN` with a valid token
- 3. Users get free tier without needing their own token
-
- **Pros**: Best UX, transparent to users
- **Cons**: Token usage counted against our account
-
- ### Option 4: Hybrid Fallback Chain
-
- Try multiple models in order until one works:
-
- ```python
- FALLBACK_MODELS = [
-     "meta-llama/Llama-3.1-70B-Instruct",   # Best quality (needs token)
-     "mistralai/Mistral-7B-Instruct-v0.3",  # Good quality (free)
-     "microsoft/Phi-3-mini-4k-instruct",    # Lightweight (free)
- ]
- ```
-
- **Pros**: Graceful degradation
- **Cons**: Complexity, inconsistent output quality
-
- ## Recommended Fix
-
- **Short-term (P1)**: Option 3 - Add `HF_TOKEN` to HuggingFace Spaces secrets
-
- **Long-term**: Option 4 - Implement fallback chain with clear user feedback about which model is active
-
- ## Testing
 
  ```bash
- # Test without token (should fail currently)
- unset HF_TOKEN
  uv run python -c "
- from huggingface_hub import InferenceClient
- client = InferenceClient(model='meta-llama/Llama-3.1-70B-Instruct')
- response = client.chat_completion(messages=[{'role': 'user', 'content': 'Hi'}])
- print(response)
- "

- # Test with token (should work)
- export HF_TOKEN=hf_xxxxx
- uv run python -c "
- from huggingface_hub import InferenceClient
- client = InferenceClient(model='meta-llama/Llama-3.1-70B-Instruct', token='$HF_TOKEN')
- response = client.chat_completion(messages=[{'role': 'user', 'content': 'Hi'}])
- print(response)
  "
  ```

- ## References
-
- - [HuggingFace API Migration Thread](https://discuss.huggingface.co/t/error-https-api-inference-huggingface-co-is-no-longer-supported-please-use-https-router-huggingface-co-hf-inference-instead/169870)
- - [GitHub Issue: 401 Unauthorized](https://github.com/huggingface/transformers/issues/38289)
- - [HuggingFace Inference Endpoints Docs](https://huggingface.co/docs/huggingface_hub/guides/inference)
-
- ## Update 2025-12-01 21:45 PST
-
- **Attempted Fix 1**: Switched model from `meta-llama/Llama-3.1-70B-Instruct` (Hyperbolic) to `Qwen/Qwen2.5-72B-Instruct` (routed to **Novita**).
-
- **Result**: Failed with the same 401 error on Novita.
- ```
- 401 Client Error: Unauthorized for url: https://router.huggingface.co/novita/v3/openai/chat/completions
- Invalid username or password.
- ```
-
- **New Findings**:
- 1. **All Large Models are Partners**: Both Llama-70B and Qwen-72B are routed to partner providers (Hyperbolic, Novita).
- 2. **Partners Require Auth**: Partner providers strictly require authentication. Anonymous access is blocked.
- 3. **Token Propagation Failure**: Even with `HF_TOKEN` set in Spaces secrets, the `huggingface_hub` library might not be picking it up via Pydantic settings if `alias` resolution is flaky in the environment.
- 4. **Possible Token Permission Issue**: The user's token might lack permissions for Partner Inference endpoints.
-
- **Corrective Actions**:
- 1. **Robust Config Loading**: Modified `src/utils/config.py` to use `default_factory=lambda: os.environ.get("HF_TOKEN")` to guarantee environment variable reading.
- 2. **Debug Logging**: Added explicit logging in `src/clients/huggingface.py` to confirm whether a token is being used (masked).
- 3. **Retain Qwen**: Keeping `Qwen/Qwen2.5-72B-Instruct` as it's a capable model. If auth is fixed, it should work.
-
- **Next Steps**:
- - Deploy these changes to debug the token loading.
- - If the token is loaded but still failing, the user must generate a new `HF_TOKEN` with **"Make calls to inference endpoints"** permissions.

+ # P1 Bug: HuggingFace Router 401 Unauthorized

+ **Severity**: P1 (High)
+ **Status**: RESOLVED
  **Discovered**: 2025-12-01
+ **Resolved**: 2025-12-01
  **Reporter**: Production user via HuggingFace Spaces

  ## Symptom

  Invalid username or password.
  ```

+ ## Root Cause

+ **The HF_TOKEN in `.env` and HuggingFace Spaces secrets was invalid/expired.**

+ Token `hf_ssayg...` failed `HfApi().whoami()` verification.

+ ## Resolution

+ 1. Generated new HF_TOKEN at https://huggingface.co/settings/tokens
+ 2. Updated `.env` with new token: `hf_gZVBI...`
+ 3. Updated HuggingFace Spaces secret with same token
+ 4. Switched default model from `meta-llama/Llama-3.1-70B-Instruct` to `Qwen/Qwen2.5-72B-Instruct` (better reliability via HF router)

+ ## Verification

  ```bash
  uv run python -c "
+ import os
+ from huggingface_hub import InferenceClient, HfApi
+
+ token = os.environ['HF_TOKEN']  # Your valid token from .env
+ api = HfApi(token=token)
+ print(f'Token valid: {api.whoami()[\"name\"]}')
+
+ client = InferenceClient(model='Qwen/Qwen2.5-72B-Instruct', token=token)
+ response = client.chat_completion(messages=[{'role': 'user', 'content': '2+2=?'}], max_tokens=10)
+ print(f'Inference works: {response.choices[0].message.content}')
  "
+ # Output:
+ # Token valid: VibecoderMcSwaggins
+ # Inference works: 4
  ```

+ ## Lessons Learned

+ 1. **First-principles debugging**: Before adding complex "fixes", verify basic assumptions (is the token actually valid?)
+ 2. **Token expiration**: HuggingFace tokens can expire or become invalid. Always verify with `whoami()`.
+ 3. **Model routing**: HuggingFace routes large models to partner providers (Hyperbolic, Novita). All require valid auth.

+ ## Files Changed

+ - `src/utils/config.py`: Changed default model to `Qwen/Qwen2.5-72B-Instruct`
+ - `src/clients/huggingface.py`: Updated fallback model reference
+ - `src/agent_factory/judges.py`: Updated fallback model reference
+ - `src/orchestrators/langgraph_orchestrator.py`: Updated hardcoded model
+ - `CLAUDE.md`, `AGENTS.md`, `GEMINI.md`: Updated documentation
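
Following the bug doc's "verify basic assumptions" lesson, a fail-fast startup check could look like the sketch below (an illustration only; `verify_hf_token` is a hypothetical helper, not something this commit adds):

```python
# Sketch: validate the token once at startup so an invalid/expired
# HF_TOKEN fails loudly here rather than as an opaque 401 from a
# partner provider mid-request.
import os

from huggingface_hub import HfApi
from huggingface_hub.utils import HfHubHTTPError


def verify_hf_token(token: str | None) -> str:
    """Return the token owner's username, or raise with a clear message."""
    if not token:
        raise RuntimeError("HF_TOKEN is not set")
    try:
        return HfApi(token=token).whoami()["name"]
    except HfHubHTTPError as exc:
        raise RuntimeError(f"HF_TOKEN failed whoami() verification: {exc}") from exc


if __name__ == "__main__":
    print(f"Token valid: {verify_hf_token(os.environ.get('HF_TOKEN'))}")
```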
src/clients/huggingface.py CHANGED
@@ -45,27 +45,14 @@ class HuggingFaceChatClient(BaseChatClient): # type: ignore[misc]
          self.model_id = model_id or settings.huggingface_model or "Qwen/Qwen2.5-72B-Instruct"
          self.api_key = api_key or settings.hf_token

-         # Debug logging for auth issues
-         if self.api_key:
-             masked_key = (
-                 f"{self.api_key[:4]}...{self.api_key[-4:]}" if len(self.api_key) > 8 else "***"
-             )
-             logger.info(f"HuggingFaceChatClient using explicit API token: {masked_key}")
-         else:
-             logger.warning(
-                 "HuggingFaceChatClient initialized WITHOUT explicit API token "
-                 "(relying on cached token or anonymous access)"
-             )
-
-         try:
-             self._client = InferenceClient(
-                 model=self.model_id,
-                 token=self.api_key,
-                 timeout=kwargs.get("timeout", 120),  # Default to 120s
-             )
-         except Exception as e:
-             logger.error(f"Failed to initialize HuggingFace client: {e}")
-             raise

      def _convert_messages(self, messages: MutableSequence[ChatMessage]) -> list[dict[str, Any]]:
          """Convert framework messages to HuggingFace format."""

          self.model_id = model_id or settings.huggingface_model or "Qwen/Qwen2.5-72B-Instruct"
          self.api_key = api_key or settings.hf_token

+         # Initialize the HF Inference Client
+         # timeout=60 to prevent premature timeouts on long reasoning runs
+         self._client = InferenceClient(
+             model=self.model_id,
+             token=self.api_key,
+             timeout=60,
+         )
+         logger.info("Initialized HuggingFaceChatClient", model=self.model_id)

      def _convert_messages(self, messages: MutableSequence[ChatMessage]) -> list[dict[str, Any]]:
          """Convert framework messages to HuggingFace format."""
src/utils/config.py CHANGED
@@ -1,7 +1,6 @@
  """Application configuration using Pydantic Settings."""

  import logging
- import os
  from typing import Literal

  import structlog
@@ -43,9 +42,7 @@ class Settings(BaseSettings):
          default="Qwen/Qwen2.5-72B-Instruct", description="HuggingFace model name"
      )
      hf_token: str | None = Field(
-         default_factory=lambda: os.environ.get("HF_TOKEN"),
-         alias="HF_TOKEN",
-         description="HuggingFace API token",
      )

      # Embedding Configuration
 
  """Application configuration using Pydantic Settings."""

  import logging
  from typing import Literal

  import structlog

          default="Qwen/Qwen2.5-72B-Instruct", description="HuggingFace model name"
      )
      hf_token: str | None = Field(
+         default=None, alias="HF_TOKEN", description="HuggingFace API token"
      )

      # Embedding Configuration