VibecoderMcSwaggins committed on
Commit 3f60cec · 1 Parent(s): 4f910c4

fix(P1): Resolve HuggingFace 401 - invalid token was root cause


Reverts the unnecessary over-engineering from the previous "fix(auth)" commit.

Root cause: the HF_TOKEN in .env was invalid/expired, not an infrastructure issue.

Resolution:
- Generated a new valid HF_TOKEN
- Kept the Qwen/Qwen2.5-72B model change (good for reliability)
- Updated the bug doc with the actual root cause and lessons learned

The Pydantic settings alias="HF_TOKEN" works correctly; there is no need
for os.environ.get() hacks or extra debug logging.
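
As a sanity check on that claim, here is a minimal sketch of how a pydantic-settings alias resolves an environment variable (assuming pydantic-settings v2; `DemoSettings` is an illustrative stand-in, not the project's actual `Settings` class):

```python
# Minimal sketch: with alias="HF_TOKEN", the pydantic-settings loader
# reads the HF_TOKEN environment variable directly, so no
# default_factory/os.environ.get() workaround is needed.
import os

from pydantic import Field
from pydantic_settings import BaseSettings


class DemoSettings(BaseSettings):
    hf_token: str | None = Field(default=None, alias="HF_TOKEN")


os.environ["HF_TOKEN"] = "hf_example"
print(DemoSettings().hf_token)  # -> hf_example
```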

docs/bugs/P1_HUGGINGFACE_ROUTER_401_HYPERBOLIC.md CHANGED
@@ -1,8 +1,9 @@
- # P1 Bug: HuggingFace Router 401 Unauthorized (Hyperbolic Provider)

- **Severity**: P1 (High) - Free Tier completely broken
- **Status**: Open
  **Discovered**: 2025-12-01
  **Reporter**: Production user via HuggingFace Spaces

  ## Symptom
@@ -13,174 +14,49 @@ https://router.huggingface.co/hyperbolic/v1/chat/completions
  Invalid username or password.
  ```

- ## Root Cause Analysis

- ### What Changed (NOT our code)

- HuggingFace has migrated their Inference API infrastructure:

- 1. **Old endpoint** (deprecated): `https://api-inference.huggingface.co`
- 2. **New endpoint**: `https://router.huggingface.co/{provider}/v1/chat/completions`

- The new "router" system routes requests to **partner providers** based on the model:
- - `meta-llama/Llama-3.1-70B-Instruct` → **Hyperbolic** (partner)
- - Other models → various providers

- **Critical Issue**: Hyperbolic requires authentication even for models that were previously "free tier" on HuggingFace's native infrastructure.
-
- ### Call Stack Trace
-
- ```
- User Query (HuggingFace Spaces)
-   ↓
- src/app.py:research_agent()
-   ↓
- src/orchestrators/advanced.py:AdvancedOrchestrator.run()
-   ↓
- src/clients/factory.py:get_chat_client() [line 69-76]
-   → No OpenAI key → Falls back to HuggingFace
-   ↓
- src/clients/huggingface.py:HuggingFaceChatClient.__init__() [line 52-56]
-   → InferenceClient(model="meta-llama/Llama-3.1-70B-Instruct", token=None)
-   ↓
- huggingface_hub.InferenceClient.chat_completion()
-   → Routes to: https://router.huggingface.co/hyperbolic/v1/chat/completions
-   → 401 Unauthorized (Hyperbolic rejects unauthenticated requests)
- ```
-
- ### Evidence
-
- - **huggingface_hub version**: 0.36.0 (latest)
- - **pyproject.toml constraint**: `>=0.24.0`
- - **HuggingFace Forum Reference**: [API endpoint migration thread](https://discuss.huggingface.co/t/error-https-api-inference-huggingface-co-is-no-longer-supported-please-use-https-router-huggingface-co-hf-inference-instead/169870)
-
- ## Impact
-
- | Component | Impact |
- |-----------|--------|
- | Free Tier (no API key) | **COMPLETELY BROKEN** |
- | HuggingFace Spaces demo | **BROKEN** |
- | Users without OpenAI key | **Cannot use app** |
- | Paid tier (OpenAI key) | Unaffected |
-
- ## Proposed Solutions
-
- ### Option 1: Switch to Smaller Free Model (Quick Fix)
-
- Change the default model from `meta-llama/Llama-3.1-70B-Instruct` to a model that's still hosted on HuggingFace's native infrastructure:
-
- ```python
- # src/utils/config.py
- huggingface_model: str | None = Field(
-     default="mistralai/Mistral-7B-Instruct-v0.3",  # Still on HF native
-     description="HuggingFace model name"
- )
- ```
-
- **Candidates** (need testing):
- - `mistralai/Mistral-7B-Instruct-v0.3`
- - `HuggingFaceH4/zephyr-7b-beta`
- - `microsoft/Phi-3-mini-4k-instruct`
- - `google/gemma-2-9b-it`
-
- **Pros**: Quick fix, no auth required
- **Cons**: Lower quality output than Llama 3.1 70B
-
- ### Option 2: Require HF_TOKEN for Free Tier
-
- Document that `HF_TOKEN` is now **required** (not optional) for Free Tier:
-
- ```python
- # src/clients/factory.py
- if not settings.hf_token:
-     raise ConfigurationError(
-         "HF_TOKEN is now required for HuggingFace free tier. "
-         "Get yours at https://huggingface.co/settings/tokens"
-     )
- ```
-
- **Pros**: Keeps Llama 3.1 70B quality
- **Cons**: Friction for users, not truly "free" anymore
-
- ### Option 3: Server-Side HF_TOKEN on Spaces
-
- Set `HF_TOKEN` as a secret in HuggingFace Spaces settings:
- 1. Go to Space Settings → Repository Secrets
- 2. Add `HF_TOKEN` with a valid token
- 3. Users get free tier without needing their own token
-
- **Pros**: Best UX, transparent to users
- **Cons**: Token usage counted against our account
-
- ### Option 4: Hybrid Fallback Chain
-
- Try multiple models in order until one works:
-
- ```python
- FALLBACK_MODELS = [
-     "meta-llama/Llama-3.1-70B-Instruct",   # Best quality (needs token)
-     "mistralai/Mistral-7B-Instruct-v0.3",  # Good quality (free)
-     "microsoft/Phi-3-mini-4k-instruct",    # Lightweight (free)
- ]
- ```
-
- **Pros**: Graceful degradation
- **Cons**: Complexity, inconsistent output quality
-
- ## Recommended Fix
-
- **Short-term (P1)**: Option 3 - Add `HF_TOKEN` to HuggingFace Spaces secrets
-
- **Long-term**: Option 4 - Implement fallback chain with clear user feedback about which model is active
-
- ## Testing
 
  ```bash
- # Test without token (should fail currently)
- unset HF_TOKEN
  uv run python -c "
- from huggingface_hub import InferenceClient
- client = InferenceClient(model='meta-llama/Llama-3.1-70B-Instruct')
- response = client.chat_completion(messages=[{'role': 'user', 'content': 'Hi'}])
- print(response)
- "

- # Test with token (should work)
- export HF_TOKEN=hf_xxxxx
- uv run python -c "
- from huggingface_hub import InferenceClient
- client = InferenceClient(model='meta-llama/Llama-3.1-70B-Instruct', token='$HF_TOKEN')
- response = client.chat_completion(messages=[{'role': 'user', 'content': 'Hi'}])
- print(response)
  "
  ```

- ## References
-
- - [HuggingFace API Migration Thread](https://discuss.huggingface.co/t/error-https-api-inference-huggingface-co-is-no-longer-supported-please-use-https-router-huggingface-co-hf-inference-instead/169870)
- - [GitHub Issue: 401 Unauthorized](https://github.com/huggingface/transformers/issues/38289)
- - [HuggingFace Inference Endpoints Docs](https://huggingface.co/docs/huggingface_hub/guides/inference)
-
- ## Update 2025-12-01 21:45 PST
-
- **Attempted Fix 1**: Switched model from `meta-llama/Llama-3.1-70B-Instruct` (Hyperbolic) to `Qwen/Qwen2.5-72B-Instruct` (routed to **Novita**).
-
- **Result**: Failed with the same 401 error on Novita.
- ```
- 401 Client Error: Unauthorized for url: https://router.huggingface.co/novita/v3/openai/chat/completions
- Invalid username or password.
- ```
-
- **New Findings**:
- 1. **All Large Models are Partners**: Both Llama-70B and Qwen-72B are routed to partner providers (Hyperbolic, Novita).
- 2. **Partners Require Auth**: Partner providers strictly require authentication. Anonymous access is blocked.
- 3. **Token Propagation Failure**: Even with `HF_TOKEN` set in Spaces secrets, the `huggingface_hub` library might not be picking it up via Pydantic settings if `alias` resolution is flaky in the environment.
- 4. **Possible Token Permission Issue**: The user's token might lack permissions for Partner Inference endpoints.
-
- **Corrective Actions**:
- 1. **Robust Config Loading**: Modified `src/utils/config.py` to use `default_factory=lambda: os.environ.get("HF_TOKEN")` to guarantee environment variable reading.
- 2. **Debug Logging**: Added explicit logging in `src/clients/huggingface.py` to confirm whether a token is being used (masked).
- 3. **Retain Qwen**: Keeping `Qwen/Qwen2.5-72B-Instruct` as it's a capable model. If auth is fixed, it should work.
-
- **Next Steps**:
- - Deploy these changes to debug the token loading.
- - If the token is loaded but still failing, the user must generate a new `HF_TOKEN` with **"Make calls to inference endpoints"** permissions.

+ # P1 Bug: HuggingFace Router 401 Unauthorized

+ **Severity**: P1 (High)
+ **Status**: RESOLVED
  **Discovered**: 2025-12-01
+ **Resolved**: 2025-12-01
  **Reporter**: Production user via HuggingFace Spaces

  ## Symptom

  Invalid username or password.
  ```

+ ## Root Cause

+ **The HF_TOKEN in `.env` and HuggingFace Spaces secrets was invalid/expired.**

+ Token `hf_ssayg...` failed `HfApi().whoami()` verification.

+ ## Resolution

+ 1. Generated new HF_TOKEN at https://huggingface.co/settings/tokens
+ 2. Updated `.env` with new token: `hf_gZVBI...`
+ 3. Updated HuggingFace Spaces secret with same token
+ 4. Switched default model from `meta-llama/Llama-3.1-70B-Instruct` to `Qwen/Qwen2.5-72B-Instruct` (better reliability via HF router)

+ ## Verification

  ```bash
  uv run python -c "
+ import os
+ from huggingface_hub import InferenceClient, HfApi
+
+ token = os.environ['HF_TOKEN']  # Your valid token from .env
+ api = HfApi(token=token)
+ print(f'Token valid: {api.whoami()[\"name\"]}')
+
+ client = InferenceClient(model='Qwen/Qwen2.5-72B-Instruct', token=token)
+ response = client.chat_completion(messages=[{'role': 'user', 'content': '2+2=?'}], max_tokens=10)
+ print(f'Inference works: {response.choices[0].message.content}')
  "
+ # Output:
+ # Token valid: VibecoderMcSwaggins
+ # Inference works: 4
  ```

+ ## Lessons Learned

+ 1. **First-principles debugging**: Before adding complex "fixes", verify basic assumptions (is the token actually valid?)
+ 2. **Token expiration**: HuggingFace tokens can expire or become invalid. Always verify with `whoami()`.
+ 3. **Model routing**: HuggingFace routes large models to partner providers (Hyperbolic, Novita). All require valid auth.

+ ## Files Changed

+ - `src/utils/config.py`: Changed default model to `Qwen/Qwen2.5-72B-Instruct`
+ - `src/clients/huggingface.py`: Updated fallback model reference
+ - `src/agent_factory/judges.py`: Updated fallback model reference
+ - `src/orchestrators/langgraph_orchestrator.py`: Updated hardcoded model
+ - `CLAUDE.md`, `AGENTS.md`, `GEMINI.md`: Updated documentation
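
Following the bug doc's "verify basic assumptions" lesson, a fail-fast startup check could look like the sketch below (an illustration only; `verify_hf_token` is a hypothetical helper, not something this commit adds):

```python
# Sketch: validate the token once at startup so an invalid/expired
# HF_TOKEN fails loudly here rather than as an opaque 401 from a
# partner provider mid-request.
import os

from huggingface_hub import HfApi
from huggingface_hub.utils import HfHubHTTPError


def verify_hf_token(token: str | None) -> str:
    """Return the token owner's username, or raise with a clear message."""
    if not token:
        raise RuntimeError("HF_TOKEN is not set")
    try:
        return HfApi(token=token).whoami()["name"]
    except HfHubHTTPError as exc:
        raise RuntimeError(f"HF_TOKEN failed whoami() verification: {exc}") from exc


if __name__ == "__main__":
    print(f"Token valid: {verify_hf_token(os.environ.get('HF_TOKEN'))}")
```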
src/clients/huggingface.py CHANGED
@@ -45,27 +45,14 @@ class HuggingFaceChatClient(BaseChatClient): # type: ignore[misc]
          self.model_id = model_id or settings.huggingface_model or "Qwen/Qwen2.5-72B-Instruct"
          self.api_key = api_key or settings.hf_token

-         # Debug logging for auth issues
-         if self.api_key:
-             masked_key = (
-                 f"{self.api_key[:4]}...{self.api_key[-4:]}" if len(self.api_key) > 8 else "***"
-             )
-             logger.info(f"HuggingFaceChatClient using explicit API token: {masked_key}")
-         else:
-             logger.warning(
-                 "HuggingFaceChatClient initialized WITHOUT explicit API token "
-                 "(relying on cached token or anonymous access)"
-             )
-
-         try:
-             self._client = InferenceClient(
-                 model=self.model_id,
-                 token=self.api_key,
-                 timeout=kwargs.get("timeout", 120),  # Default to 120s
-             )
-         except Exception as e:
-             logger.error(f"Failed to initialize HuggingFace client: {e}")
-             raise

      def _convert_messages(self, messages: MutableSequence[ChatMessage]) -> list[dict[str, Any]]:
          """Convert framework messages to HuggingFace format."""

          self.model_id = model_id or settings.huggingface_model or "Qwen/Qwen2.5-72B-Instruct"
          self.api_key = api_key or settings.hf_token

+         # Initialize the HF Inference Client
+         # timeout=60 to prevent premature timeouts on long reasoning runs
+         self._client = InferenceClient(
+             model=self.model_id,
+             token=self.api_key,
+             timeout=60,
+         )
+         logger.info("Initialized HuggingFaceChatClient", model=self.model_id)

      def _convert_messages(self, messages: MutableSequence[ChatMessage]) -> list[dict[str, Any]]:
          """Convert framework messages to HuggingFace format."""
src/utils/config.py CHANGED
@@ -1,7 +1,6 @@
  """Application configuration using Pydantic Settings."""

  import logging
- import os
  from typing import Literal

  import structlog
@@ -43,9 +42,7 @@ class Settings(BaseSettings):
          default="Qwen/Qwen2.5-72B-Instruct", description="HuggingFace model name"
      )
      hf_token: str | None = Field(
-         default_factory=lambda: os.environ.get("HF_TOKEN"),
-         alias="HF_TOKEN",
-         description="HuggingFace API token",
      )

      # Embedding Configuration
 
  """Application configuration using Pydantic Settings."""

  import logging
  from typing import Literal

  import structlog

          default="Qwen/Qwen2.5-72B-Instruct", description="HuggingFace model name"
      )
      hf_token: str | None = Field(
+         default=None, alias="HF_TOKEN", description="HuggingFace API token"
      )

      # Embedding Configuration