# P1 Bug: HuggingFace Router 401 Unauthorized (Hyperbolic Provider)

**Severity**: P1 (High) - Free Tier completely broken
**Status**: Open
**Discovered**: 2025-12-01
**Reporter**: Production user via HuggingFace Spaces

## Symptom

```
401 Client Error: Unauthorized for url:
https://router.huggingface.co/hyperbolic/v1/chat/completions
Invalid username or password.
```

## Root Cause Analysis

### What Changed (NOT our code)

HuggingFace has migrated its Inference API infrastructure:

1. **Old endpoint** (deprecated): `https://api-inference.huggingface.co`
2. **New endpoint**: `https://router.huggingface.co/{provider}/v1/chat/completions`

The new "router" system routes requests to **partner providers** based on the model:

- `meta-llama/Llama-3.1-70B-Instruct` → **Hyperbolic** (partner)
- Other models → various providers

**Critical Issue**: Hyperbolic requires authentication even for models that were previously "free tier" on HuggingFace's native infrastructure.

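As a sketch of the pattern above (the helper name is illustrative, and the provider segment is chosen server-side by HuggingFace's router, not by the client):

```python
# Sketch only: shows how the new router URL is composed.
# The provider slug ("hyperbolic", "novita", ...) is picked by the
# router based on the model; clients cannot choose it directly.
ROUTER_BASE = "https://router.huggingface.co"

def router_chat_url(provider: str) -> str:
    """Build the chat-completions URL for a given partner provider."""
    return f"{ROUTER_BASE}/{provider}/v1/chat/completions"

print(router_chat_url("hyperbolic"))
# https://router.huggingface.co/hyperbolic/v1/chat/completions
```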
### Call Stack Trace

```
User Query (HuggingFace Spaces)
  ↓
src/app.py:research_agent()
  ↓
src/orchestrators/advanced.py:AdvancedOrchestrator.run()
  ↓
src/clients/factory.py:get_chat_client() [line 69-76]
  → No OpenAI key → Falls back to HuggingFace
  ↓
src/clients/huggingface.py:HuggingFaceChatClient.__init__() [line 52-56]
  → InferenceClient(model="meta-llama/Llama-3.1-70B-Instruct", token=None)
  ↓
huggingface_hub.InferenceClient.chat_completion()
  → Routes to: https://router.huggingface.co/hyperbolic/v1/chat/completions
  → 401 Unauthorized (Hyperbolic rejects unauthenticated requests)
```

### Evidence

- **huggingface_hub version**: 0.36.0 (latest)
- **pyproject.toml constraint**: `>=0.24.0`
- **HuggingFace Forum Reference**: [API endpoint migration thread](https://discuss.huggingface.co/t/error-https-api-inference-huggingface-co-is-no-longer-supported-please-use-https-router-huggingface-co-hf-inference-instead/169870)

## Impact

| Component | Impact |
|-----------|--------|
| Free Tier (no API key) | **COMPLETELY BROKEN** |
| HuggingFace Spaces demo | **BROKEN** |
| Users without OpenAI key | **Cannot use app** |
| Paid tier (OpenAI key) | Unaffected |

## Proposed Solutions

### Option 1: Switch to Smaller Free Model (Quick Fix)

Change the default model from `meta-llama/Llama-3.1-70B-Instruct` to a model that is still hosted on HuggingFace's native infrastructure:

```python
# src/utils/config.py
huggingface_model: str | None = Field(
    default="mistralai/Mistral-7B-Instruct-v0.3",  # Still on HF native
    description="HuggingFace model name",
)
```

**Candidates** (need testing):

- `mistralai/Mistral-7B-Instruct-v0.3`
- `HuggingFaceH4/zephyr-7b-beta`
- `microsoft/Phi-3-mini-4k-instruct`
- `google/gemma-2-9b-it`

**Pros**: Quick fix, no auth required
**Cons**: Lower quality output than Llama 3.1 70B

### Option 2: Require HF_TOKEN for Free Tier

Document that `HF_TOKEN` is now **required** (not optional) for the Free Tier:

```python
# src/clients/factory.py
if not settings.hf_token:
    raise ConfigurationError(
        "HF_TOKEN is now required for HuggingFace free tier. "
        "Get yours at https://huggingface.co/settings/tokens"
    )
```

**Pros**: Keeps Llama 3.1 70B quality
**Cons**: Friction for users; not truly "free" anymore

### Option 3: Server-Side HF_TOKEN on Spaces

Set `HF_TOKEN` as a secret in the HuggingFace Spaces settings:

1. Go to Space Settings → Repository Secrets
2. Add `HF_TOKEN` with a valid token
3. Users get the free tier without needing their own token

**Pros**: Best UX, transparent to users
**Cons**: Token usage is counted against our account

### Option 4: Hybrid Fallback Chain

Try multiple models in order until one works:

```python
FALLBACK_MODELS = [
    "meta-llama/Llama-3.1-70B-Instruct",   # Best quality (needs token)
    "mistralai/Mistral-7B-Instruct-v0.3",  # Good quality (free)
    "microsoft/Phi-3-mini-4k-instruct",    # Lightweight (free)
]
```

**Pros**: Graceful degradation
**Cons**: Complexity, inconsistent output quality

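The fallback loop itself could look like the sketch below. The `make_client` factory is a hypothetical injection point (in the app it would wrap `huggingface_hub.InferenceClient`); it keeps the loop testable without network access:

```python
# Hypothetical fallback driver: try each model until one answers.
# make_client is injected so the loop stays independent of
# huggingface_hub; in practice it would construct an InferenceClient.
FALLBACK_MODELS = [
    "meta-llama/Llama-3.1-70B-Instruct",   # Best quality (needs token)
    "mistralai/Mistral-7B-Instruct-v0.3",  # Good quality (free)
    "microsoft/Phi-3-mini-4k-instruct",    # Lightweight (free)
]

def chat_with_fallback(messages, make_client, models=FALLBACK_MODELS):
    """Return (model_name, response) from the first model that works."""
    last_error = None
    for model in models:
        try:
            client = make_client(model)
            # Return which model actually answered, for user feedback
            return model, client.chat_completion(messages=messages)
        except Exception as exc:  # e.g. 401/404 from a partner provider
            last_error = exc
    raise RuntimeError("All fallback models failed") from last_error
```

Returning the model name alongside the response is what enables the "clear user feedback about which model is active" called for below in the long-term fix.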
## Recommended Fix

**Short-term (P1)**: Option 3 - add `HF_TOKEN` to the HuggingFace Spaces secrets.

**Long-term**: Option 4 - implement a fallback chain with clear user feedback about which model is active.

## Testing

```bash
# Test without a token (currently fails with 401)
unset HF_TOKEN
uv run python -c "
from huggingface_hub import InferenceClient
client = InferenceClient(model='meta-llama/Llama-3.1-70B-Instruct')
response = client.chat_completion(messages=[{'role': 'user', 'content': 'Hi'}])
print(response)
"

# Test with a token (should work)
export HF_TOKEN=hf_xxxxx
uv run python -c "
from huggingface_hub import InferenceClient
client = InferenceClient(model='meta-llama/Llama-3.1-70B-Instruct', token='$HF_TOKEN')
response = client.chat_completion(messages=[{'role': 'user', 'content': 'Hi'}])
print(response)
"
```

## References

- [HuggingFace API Migration Thread](https://discuss.huggingface.co/t/error-https-api-inference-huggingface-co-is-no-longer-supported-please-use-https-router-huggingface-co-hf-inference-instead/169870)
- [GitHub Issue: 401 Unauthorized](https://github.com/huggingface/transformers/issues/38289)
- [HuggingFace Inference Endpoints Docs](https://huggingface.co/docs/huggingface_hub/guides/inference)

## Update 2025-12-01 21:45 PST

**Attempted Fix 1**: Switched the model from `meta-llama/Llama-3.1-70B-Instruct` (Hyperbolic) to `Qwen/Qwen2.5-72B-Instruct` (routed to **Novita**).