Spaces:
Running
Running
Joseph Pollack
adds interface fixes, sidebar settings , oauth fixes , more graphs , the determinator , and more
12b7aab
unverified
| # Contributing to The DETERMINATOR | |
| Thank you for your interest in contributing to The DETERMINATOR! This guide will help you get started. | |
| ## Table of Contents | |
| - [Git Workflow](#git-workflow) | |
| - [Getting Started](#getting-started) | |
| - [Development Commands](#development-commands) | |
| - [Code Style & Conventions](#code-style--conventions) | |
| - [Type Safety](#type-safety) | |
| - [Error Handling & Logging](#error-handling--logging) | |
| - [Testing Requirements](#testing-requirements) | |
| - [Implementation Patterns](#implementation-patterns) | |
| - [Code Quality & Documentation](#code-quality--documentation) | |
| - [Prompt Engineering & Citation Validation](#prompt-engineering--citation-validation) | |
| - [MCP Integration](#mcp-integration) | |
| - [Common Pitfalls](#common-pitfalls) | |
| - [Key Principles](#key-principles) | |
| - [Pull Request Process](#pull-request-process) | |
| ## Git Workflow | |
| - `main`: Production-ready (GitHub) | |
| - `dev`: Development integration (GitHub) | |
| - Use feature branches: `yourname-dev` | |
| - **NEVER** push directly to `main` or `dev` on HuggingFace | |
| - GitHub is source of truth; HuggingFace is for deployment | |
| ## Getting Started | |
| 1. **Fork the repository** on GitHub | |
| 2. **Clone your fork**: | |
| ```bash | |
| git clone https://github.com/yourusername/GradioDemo.git | |
| cd GradioDemo | |
| ``` | |
| 3. **Install dependencies**: | |
| ```bash | |
| make install | |
| ``` | |
| 4. **Create a feature branch**: | |
| ```bash | |
| git checkout -b yourname-feature-name | |
| ``` | |
| 5. **Make your changes** following the guidelines below | |
| 6. **Run checks**: | |
| ```bash | |
| make check | |
| ``` | |
| 7. **Commit and push**: | |
| ```bash | |
| git commit -m "Description of changes" | |
| git push origin yourname-feature-name | |
| ``` | |
| 8. **Create a pull request** on GitHub | |
| ## Development Commands | |
| ```bash | |
| make install # Install dependencies + pre-commit | |
| make check # Lint + typecheck + test (MUST PASS) | |
| make test # Run unit tests | |
| make lint # Run ruff | |
| make format # Format with ruff | |
| make typecheck # Run mypy | |
| make test-cov # Test with coverage | |
| make docs-build # Build documentation | |
| make docs-serve # Serve documentation locally | |
| ``` | |
| ## Code Style & Conventions | |
| ### Type Safety | |
| - **ALWAYS** use type hints for all function parameters and return types | |
| - Use `mypy --strict` compliance (no `Any` unless absolutely necessary) | |
| - Use `TYPE_CHECKING` imports for circular dependencies: | |
| ```python | |
| from typing import TYPE_CHECKING | |
| if TYPE_CHECKING: | |
| from src.services.embeddings import EmbeddingService | |
| ``` | |
| ### Pydantic Models | |
| - All data exchange uses Pydantic models (`src/utils/models.py`) | |
| - Models are frozen (`model_config = {"frozen": True}`) for immutability | |
| - Use `Field()` with descriptions for all model fields | |
| - Validate with `ge=`, `le=`, `min_length=`, `max_length=` constraints | |
| ### Async Patterns | |
| - **ALL** I/O operations must be async (`async def`, `await`) | |
| - Use `asyncio.gather()` for parallel operations | |
| - CPU-bound work (embeddings, parsing) must use `run_in_executor()`: | |
| ```python | |
| loop = asyncio.get_running_loop() | |
| result = await loop.run_in_executor(None, cpu_bound_function, args) | |
| ``` | |
| - Never block the event loop with synchronous I/O | |
| ### Linting | |
| - Ruff with 100-char line length | |
| - Ignore rules documented in `pyproject.toml`: | |
| - `PLR0913`: Too many arguments (agents need many params) | |
| - `PLR0912`: Too many branches (complex orchestrator logic) | |
| - `PLR0911`: Too many return statements (complex agent logic) | |
| - `PLR2004`: Magic values (statistical constants) | |
| - `PLW0603`: Global statement (singleton pattern) | |
| - `PLC0415`: Lazy imports for optional dependencies | |
| ### Pre-commit | |
| - Run `make check` before committing | |
| - Must pass: lint + typecheck + test-cov | |
| - Pre-commit hooks installed via `make install` | |
| - **CRITICAL**: Make sure you run the full pre-commit checks before opening a PR (not draft), otherwise Obstacle is the Way will lose his mind | |
| ## Error Handling & Logging | |
| ### Exception Hierarchy | |
| Use custom exception hierarchy (`src/utils/exceptions.py`): | |
| - `DeepCriticalError` (base) | |
| - `SearchError` β `RateLimitError` | |
| - `JudgeError` | |
| - `ConfigurationError` | |
| ### Error Handling Rules | |
| - Always chain exceptions: `raise SearchError(...) from e` | |
| - Log errors with context using `structlog`: | |
| ```python | |
| logger.error("Operation failed", error=str(e), context=value) | |
| ``` | |
| - Never silently swallow exceptions | |
| - Provide actionable error messages | |
| ### Logging | |
| - Use `structlog` for all logging (NOT `print` or `logging`) | |
| - Import: `import structlog; logger = structlog.get_logger()` | |
| - Log with structured data: `logger.info("event", key=value)` | |
| - Use appropriate levels: DEBUG, INFO, WARNING, ERROR | |
| ### Logging Examples | |
| ```python | |
| logger.info("Starting search", query=query, tools=[t.name for t in tools]) | |
| logger.warning("Search tool failed", tool=tool.name, error=str(result)) | |
| logger.error("Assessment failed", error=str(e)) | |
| ``` | |
| ### Error Chaining | |
| Always preserve exception context: | |
| ```python | |
| try: | |
| result = await api_call() | |
| except httpx.HTTPError as e: | |
| raise SearchError(f"API call failed: {e}") from e | |
| ``` | |
| ## Testing Requirements | |
| ### Test Structure | |
| - Unit tests in `tests/unit/` (mocked, fast) | |
| - Integration tests in `tests/integration/` (real APIs, marked `@pytest.mark.integration`) | |
| - Use markers: `unit`, `integration`, `slow` | |
| ### Mocking | |
| - Use `respx` for httpx mocking | |
| - Use `pytest-mock` for general mocking | |
| - Mock LLM calls in unit tests (use `MockJudgeHandler`) | |
| - Fixtures in `tests/conftest.py`: `mock_httpx_client`, `mock_llm_response` | |
| ### TDD Workflow | |
| 1. Write failing test in `tests/unit/` | |
| 2. Implement in `src/` | |
| 3. Ensure test passes | |
| 4. Run `make check` (lint + typecheck + test) | |
| ### Test Examples | |
| ```python | |
| @pytest.mark.unit | |
| async def test_pubmed_search(mock_httpx_client): | |
| tool = PubMedTool() | |
| results = await tool.search("metformin", max_results=5) | |
| assert len(results) > 0 | |
| assert all(isinstance(r, Evidence) for r in results) | |
| @pytest.mark.integration | |
| async def test_real_pubmed_search(): | |
| tool = PubMedTool() | |
| results = await tool.search("metformin", max_results=3) | |
| assert len(results) <= 3 | |
| ``` | |
| ### Test Coverage | |
| - Run `make test-cov` for coverage report | |
| - Aim for >80% coverage on critical paths | |
| - Exclude: `__init__.py`, `TYPE_CHECKING` blocks | |
| ## Implementation Patterns | |
| ### Search Tools | |
| All tools implement `SearchTool` protocol (`src/tools/base.py`): | |
| - Must have `name` property | |
| - Must implement `async def search(query, max_results) -> list[Evidence]` | |
| - Use `@retry` decorator from tenacity for resilience | |
| - Rate limiting: Implement `_rate_limit()` for APIs with limits (e.g., PubMed) | |
| - Error handling: Raise `SearchError` or `RateLimitError` on failures | |
| Example pattern: | |
| ```python | |
| class MySearchTool: | |
| @property | |
| def name(self) -> str: | |
| return "mytool" | |
| @retry(stop=stop_after_attempt(3), wait=wait_exponential(...)) | |
| async def search(self, query: str, max_results: int = 10) -> list[Evidence]: | |
| # Implementation | |
| return evidence_list | |
| ``` | |
| ### Judge Handlers | |
| - Implement `JudgeHandlerProtocol` (`async def assess(question, evidence) -> JudgeAssessment`) | |
| - Use pydantic-ai `Agent` with `output_type=JudgeAssessment` | |
| - System prompts in `src/prompts/judge.py` | |
| - Support fallback handlers: `MockJudgeHandler`, `HFInferenceJudgeHandler` | |
| - Always return valid `JudgeAssessment` (never raise exceptions) | |
| ### Agent Factory Pattern | |
| - Use factory functions for creating agents (`src/agent_factory/`) | |
| - Lazy initialization for optional dependencies (e.g., embeddings, Modal) | |
| - Check requirements before initialization: | |
| ```python | |
| def check_magentic_requirements() -> None: | |
| if not settings.has_openai_key: | |
| raise ConfigurationError("Magentic requires OpenAI") | |
| ``` | |
| ### State Management | |
| - **Magentic Mode**: Use `ContextVar` for thread-safe state (`src/agents/state.py`) | |
| - **Simple Mode**: Pass state via function parameters | |
| - Never use global mutable state (except singletons via `@lru_cache`) | |
| ### Singleton Pattern | |
| Use `@lru_cache(maxsize=1)` for singletons: | |
| ```python | |
| @lru_cache(maxsize=1) | |
| def get_embedding_service() -> EmbeddingService: | |
| return EmbeddingService() | |
| ``` | |
| - Lazy initialization to avoid requiring dependencies at import time | |
| ## Code Quality & Documentation | |
| ### Docstrings | |
| - Google-style docstrings for all public functions | |
| - Include Args, Returns, Raises sections | |
| - Use type hints in docstrings only if needed for clarity | |
| Example: | |
| ```python | |
| async def search(self, query: str, max_results: int = 10) -> list[Evidence]: | |
| """Search PubMed and return evidence. | |
| Args: | |
| query: The search query string | |
| max_results: Maximum number of results to return | |
| Returns: | |
| List of Evidence objects | |
| Raises: | |
| SearchError: If the search fails | |
| RateLimitError: If we hit rate limits | |
| """ | |
| ``` | |
| ### Code Comments | |
| - Explain WHY, not WHAT | |
| - Document non-obvious patterns (e.g., why `requests` not `httpx` for ClinicalTrials) | |
| - Mark critical sections: `# CRITICAL: ...` | |
| - Document rate limiting rationale | |
| - Explain async patterns when non-obvious | |
| ## Prompt Engineering & Citation Validation | |
| ### Judge Prompts | |
| - System prompt in `src/prompts/judge.py` | |
| - Format evidence with truncation (1500 chars per item) | |
| - Handle empty evidence case separately | |
| - Always request structured JSON output | |
| - Use `format_user_prompt()` and `format_empty_evidence_prompt()` helpers | |
| ### Hypothesis Prompts | |
| - Use diverse evidence selection (MMR algorithm) | |
| - Sentence-aware truncation (`truncate_at_sentence()`) | |
| - Format: Drug β Target β Pathway β Effect | |
| - System prompt emphasizes mechanistic reasoning | |
| - Use `format_hypothesis_prompt()` with embeddings for diversity | |
| ### Report Prompts | |
| - Include full citation details for validation | |
| - Use diverse evidence selection (n=20) | |
| - **CRITICAL**: Emphasize citation validation rules | |
| - Format hypotheses with support/contradiction counts | |
| - System prompt includes explicit JSON structure requirements | |
| ### Citation Validation | |
| - **ALWAYS** validate references before returning reports | |
| - Use `validate_references()` from `src/utils/citation_validator.py` | |
| - Remove hallucinated citations (URLs not in evidence) | |
| - Log warnings for removed citations | |
| - Never trust LLM-generated citations without validation | |
| ### Citation Validation Rules | |
| 1. Every reference URL must EXACTLY match a provided evidence URL | |
| 2. Do NOT invent, fabricate, or hallucinate any references | |
| 3. Do NOT modify paper titles, authors, dates, or URLs | |
| 4. If unsure about a citation, OMIT it rather than guess | |
| 5. Copy URLs exactly as provided - do not create similar-looking URLs | |
| ### Evidence Selection | |
| - Use `select_diverse_evidence()` for MMR-based selection | |
| - Balance relevance vs diversity (lambda=0.7 default) | |
| - Sentence-aware truncation preserves meaning | |
| - Limit evidence per prompt to avoid context overflow | |
| ## MCP Integration | |
| ### MCP Tools | |
| - Functions in `src/mcp_tools.py` for Claude Desktop | |
| - Full type hints required | |
| - Google-style docstrings with Args/Returns sections | |
| - Formatted string returns (markdown) | |
| ### Gradio MCP Server | |
| - Enable with `mcp_server=True` in `demo.launch()` | |
| - Endpoint: `/gradio_api/mcp/` | |
| - Use `ssr_mode=False` to fix hydration issues in HF Spaces | |
| ## Common Pitfalls | |
| 1. **Blocking the event loop**: Never use sync I/O in async functions | |
| 2. **Missing type hints**: All functions must have complete type annotations | |
| 3. **Hallucinated citations**: Always validate references | |
| 4. **Global mutable state**: Use ContextVar or pass via parameters | |
| 5. **Import errors**: Lazy-load optional dependencies (magentic, modal, embeddings) | |
| 6. **Rate limiting**: Always implement for external APIs | |
| 7. **Error chaining**: Always use `from e` when raising exceptions | |
| ## Key Principles | |
| 1. **Type Safety First**: All code must pass `mypy --strict` | |
| 2. **Async Everything**: All I/O must be async | |
| 3. **Test-Driven**: Write tests before implementation | |
| 4. **No Hallucinations**: Validate all citations | |
| 5. **Graceful Degradation**: Support free tier (HF Inference) when no API keys | |
| 6. **Lazy Loading**: Don't require optional dependencies at import time | |
| 7. **Structured Logging**: Use structlog, never print() | |
| 8. **Error Chaining**: Always preserve exception context | |
| ## Pull Request Process | |
| 1. Ensure all checks pass: `make check` | |
| 2. Update documentation if needed | |
| 3. Add tests for new features | |
| 4. Update CHANGELOG if applicable | |
| 5. Request review from maintainers | |
| 6. Address review feedback | |
| 7. Wait for approval before merging | |
| ## Questions? | |
| - Open an issue on GitHub | |
| - Check existing documentation | |
| - Review code examples in the codebase | |
| Thank you for contributing to DeepCritical! | |