nothingworry/IntegraChat

nothingworry committed 28 days ago

Commit

807e3cf

1 Parent(s): 7501e7b

update the readme file

Files changed (3)

README.md +60 -2
backend/README.md +57 -0
frontend/README.md +9 -1

README.md CHANGED

@@ -85,7 +85,11 @@ Then access:
 - 🤖 **Autonomous Multi-Step MCP Agents** – Intelligent tool-aware agent that plans and executes multi-step workflows across RAG, Web, Admin, and LLM tools with short-term conversation memory
 - 💭 **Short-Term Conversation Memory** – Automatic memory system that stores the last N tool outputs per session with configurable expiration (default: 10 outputs, 15 minutes TTL). Memory is keyed by session_id (not tenant_id) for safety, enabling better context awareness in multi-step workflows. Memory is automatically injected into tool payloads and cleared on session end.
 - 📚 **Enhanced Knowledge Base Management** – Upload raw text, URLs, or documents (PDF/DOCX/TXT/MD) with rich metadata (source URL, timestamp, document type) and optimized chunking (400-600 tokens)
-- 🔍 **Optimized RAG Search** – Semantic search with configurable similarity threshold (default 0.3) for better recall, with fallback to return top results even if below threshold
 - 🗑️ **Document Management** – Delete individual documents or bulk delete all documents for a tenant with confirmation dialogs
 - 🛡️ **Enterprise Admin Governance** – Advanced rule management system with:
   - Regex-based red-flag pattern matching with severity levels (low/medium/high/critical)
@@ -107,7 +111,7 @@ Then access:
 - 🌐 **Live Web Search** – Google Programmable Search (Custom Search API) with tenant-aware MCP tooling
 - 🏢 **Multi-Tenant Isolation** – Complete tenant isolation with centralized tenant ID management; backend enforces strict isolation for chat, ingestion, and admin ops
 - 🔐 **Fine-Grained Role-Based Access Control (RBAC)** – Four-tier role system (viewer, editor, admin, owner) with dynamic UI visibility and backend permission enforcement; frontend automatically shows/hides features based on role
-- 🔄 **Intelligent Multi-Tool Orchestration** – MCP agent orchestrator autonomously selects optimal tool chains (RAG + Web + LLM, etc.) based on query intent and context
 - ⚡ **Robust Error Handling** – Structured error responses, retry mechanisms, and graceful fallbacks (e.g., if RAG fails → fallback to LLM-only)
 - 📡 **Streaming Responses** – Chat responses stream word-by-word using Server-Sent Events (SSE) for real-time user experience
 - 🎯 **Rule-First Processing** – Admin rules checked before intent classification - rules can trigger brief responses or block requests entirely
@@ -885,11 +889,61 @@ This project is licensed under the MIT License - see the [LICENSE](LICENSE) file
 - **Session Management**: Memory can be explicitly cleared via `end_session` flag
 - **Comprehensive Testing**: Full test suite covering memory storage, retrieval, expiration, and multi-step workflows
 ### UI Improvements
 - **Modern Drag-and-Drop**: Intuitive file upload with visual feedback
 - **Enhanced Status Messages**: Clear success/error messages with icons
 - **Refresh Button in Table**: Quick refresh directly from the Rule Set section
 - **Better Visual Hierarchy**: Improved spacing, colors, and layout
 ## Key Technical Features
@@ -900,6 +954,10 @@ This project is licensed under the MIT License - see the [LICENSE](LICENSE) file
 - All operations validate tenant ownership before execution
 ### RAG Search & Retrieval
 - **Optimized similarity threshold** (default 0.3) for better recall of relevant documents
 - **Intelligent fallback** returns top result even if below threshold to ensure knowledge base content is accessible
 - **Pattern-based tool selection** automatically triggers RAG for admin questions, fact lookups, and internal knowledge queries

 - 🤖 **Autonomous Multi-Step MCP Agents** – Intelligent tool-aware agent that plans and executes multi-step workflows across RAG, Web, Admin, and LLM tools with short-term conversation memory
 - 💭 **Short-Term Conversation Memory** – Automatic memory system that stores the last N tool outputs per session with configurable expiration (default: 10 outputs, 15 minutes TTL). Memory is keyed by session_id (not tenant_id) for safety, enabling better context awareness in multi-step workflows. Memory is automatically injected into tool payloads and cleared on session end.
 - 📚 **Enhanced Knowledge Base Management** – Upload raw text, URLs, or documents (PDF/DOCX/TXT/MD) with rich metadata (source URL, timestamp, document type) and optimized chunking (400-600 tokens)
+- 🤖 **AI-Generated KB Metadata** – Automatic extraction of title, summary, tags, topics, date, and quality score during document ingestion. LLM-powered with intelligent fallback when unavailable - uses keyword extraction and pattern matching to provide useful metadata even during timeouts
+- 🔍 **Optimized RAG Search with Cross-Encoder Re-ranking** – Two-stage retrieval: initial vector search followed by cross-encoder re-ranking of top candidates using `cross-encoder/ms-marco-MiniLM-L-6-v2` for massive accuracy improvement. Semantic search with configurable similarity threshold (default 0.3) for better recall
+- ⚡ **Per-Tool Latency Prediction** – Agent estimates expected latency before choosing tools (RAG: 60-120ms, Web: 400-1800ms, Admin: <20ms) to optimize tool selection and choose the fastest path
+- 🧠 **Context-Aware MCP Routing** – Intelligent tool selection based on previous outputs: skip web search if RAG returns high score (≥0.8), skip agent reasoning for critical admin violations, skip RAG if relevant memory already available. Leads to more sophisticated behavior and higher scores
+- 📋 **Tool Output Schemas** – Every tool returns strict JSON type schemas for easier debugging, cleaner reasoning, and more polished responses. Automatic schema validation and formatting
 - 🗑️ **Document Management** – Delete individual documents or bulk delete all documents for a tenant with confirmation dialogs
 - 🛡️ **Enterprise Admin Governance** – Advanced rule management system with:
   - Regex-based red-flag pattern matching with severity levels (low/medium/high/critical)
 - 🌐 **Live Web Search** – Google Programmable Search (Custom Search API) with tenant-aware MCP tooling
 - 🏢 **Multi-Tenant Isolation** – Complete tenant isolation with centralized tenant ID management; backend enforces strict isolation for chat, ingestion, and admin ops
 - 🔐 **Fine-Grained Role-Based Access Control (RBAC)** – Four-tier role system (viewer, editor, admin, owner) with dynamic UI visibility and backend permission enforcement; frontend automatically shows/hides features based on role
+- 🔄 **Intelligent Multi-Tool Orchestration** – MCP agent orchestrator autonomously selects optimal tool chains (RAG + Web + LLM, etc.) based on query intent, context, latency predictions, and previous tool outputs. Context-aware routing enables sophisticated tool skipping for efficiency
 - ⚡ **Robust Error Handling** – Structured error responses, retry mechanisms, and graceful fallbacks (e.g., if RAG fails → fallback to LLM-only)
 - 📡 **Streaming Responses** – Chat responses stream word-by-word using Server-Sent Events (SSE) for real-time user experience
 - 🎯 **Rule-First Processing** – Admin rules checked before intent classification - rules can trigger brief responses or block requests entirely
 - **Session Management**: Memory can be explicitly cleared via `end_session` flag
 - **Comprehensive Testing**: Full test suite covering memory storage, retrieval, expiration, and multi-step workflows
+### AI-Generated KB Metadata & Advanced RAG (Latest)
+- **Automatic Metadata Extraction**: When ingesting documents, system auto-extracts:
+  - **Title**: From filename, URL, or content structure (with intelligent fallback)
+  - **Summary**: 2-3 sentence summary via LLM (with keyword-based fallback)
+  - **Tags**: 5-8 relevant tags extracted from content
+  - **Topics**: 3-5 main themes identified via LLM
+  - **Date Detection**: Multiple date formats automatically detected
+  - **Quality Score**: 0.0-1.0 score based on structure and completeness
+- **Intelligent Fallback**: When LLM is unavailable or times out, uses keyword extraction and pattern matching to provide useful metadata
+- **Database Integration**: Metadata stored in JSONB column for flexible querying and enhanced RAG search
+- **Migration Script**: Safe, idempotent database migration script included
+### Per-Tool Latency Prediction & Context-Aware Routing (Latest)
+- **Latency Prediction**: Agent estimates expected latency before tool selection:
+  - RAG: 60-120ms (depends on result count)
+  - Web: 400-1800ms (network-dependent)
+  - Admin: <20ms (local regex matching)
+  - LLM: Variable based on model and token count
+- **Path Optimization**: Agent chooses fastest tool sequence based on latency estimates
+- **Context-Aware Routing**: Intelligent tool skipping based on previous outputs:
+  - High RAG score (≥0.8) → Skip web search
+  - Critical admin violation → Skip agent reasoning, immediate block
+  - Relevant memory available → Skip RAG, use memory instead
+- **Routing Hints**: Context hints included in reasoning trace for transparency
+- **Performance Impact**: Skipping redundant tool calls avoids unnecessary latency in multi-step workflows
+### Tool Output Schemas (Latest)
+- **Strict JSON Schemas**: Every tool returns validated JSON with consistent structure:
+  - **RAG**: `{results: [...], top_score: float, latency_ms: int}`
+  - **Web**: `{results: [...], latency_ms: int}`
+  - **Admin**: `{violations: [...], severity: str, latency_ms: int}`
+  - **LLM**: `{text: str, tokens_used: int, latency_ms: int}`
+- **Automatic Validation**: All tool outputs validated and formatted before use
+- **Easier Debugging**: Consistent structure makes debugging and monitoring simpler
+- **Polished Responses**: Schema-validated outputs ensure professional appearance
+### Cross-Encoder Re-ranking (Latest)
+- **Two-Stage RAG Process**:
+  - Initial vector search retrieves candidates
+  - Cross-encoder re-ranks top 10 results for accuracy
+  - Final filtering by threshold and limit
+- **Model**: Uses `cross-encoder/ms-marco-MiniLM-L-6-v2` (very fast, production-ready)
+- **Accuracy Improvement**: Re-ranking improves the relevance ordering of search results
+- **Seamless Integration**: Works transparently with existing RAG search API
 ### UI Improvements
 - **Modern Drag-and-Drop**: Intuitive file upload with visual feedback
 - **Enhanced Status Messages**: Clear success/error messages with icons
 - **Refresh Button in Table**: Quick refresh directly from the Rule Set section
 - **Better Visual Hierarchy**: Improved spacing, colors, and layout
+- **Gradio UI Enhancements**:
+  - AI metadata displayed after document ingestion
+  - Latency predictions shown in reasoning trace
+  - Context-aware routing hints visualized
+  - Tool output schemas displayed in debug view
 ## Key Technical Features
 - All operations validate tenant ownership before execution
 ### RAG Search & Retrieval
+- **Cross-Encoder Re-ranking**: Two-stage retrieval process for massive accuracy improvement:
+  - First: Vector search retrieves top candidates using embeddings
+  - Then: Cross-encoder model (`cross-encoder/ms-marco-MiniLM-L-6-v2`) re-ranks top 10 results
+  - Final: Results filtered by threshold and limit applied
 - **Optimized similarity threshold** (default 0.3) for better recall of relevant documents
 - **Intelligent fallback** returns top result even if below threshold to ensure knowledge base content is accessible
 - **Pattern-based tool selection** automatically triggers RAG for admin questions, fact lookups, and internal knowledge queries

backend/README.md CHANGED

@@ -116,6 +116,11 @@ Use the helper scripts in the repo root when validating backend changes:
 - Error responses include detailed messages for better debugging
 ### RAG Search Enhancements
 - **Lowered default threshold** from 0.5 to 0.3 for improved recall of relevant documents
 - **Intelligent fallback mechanism** returns the top result even if similarity score is below threshold, ensuring knowledge base content is always accessible
 - **Configurable threshold** via `threshold` parameter in search requests (default: 0.3)
@@ -132,6 +137,54 @@ Use the helper scripts in the repo root when validating backend changes:
   - `MCP_MEMORY_TTL_SECONDS`: Time-to-live for memory entries in seconds (default: 900)
 - **Comprehensive Testing**: Full test suite in `backend/tests/test_conversation_memory.py` covering storage, retrieval, expiration, and multi-step workflows
 ### UI Enhancements (app.py)
 - **Knowledge Base Library Tab**:
   - Statistics cards showing document counts by type
@@ -152,6 +205,10 @@ Use the helper scripts in the repo root when validating backend changes:
 - **Debug & Reasoning Tab**:
   - Reasoning trace analyzer showing step-by-step agent decision-making
   - Tool invocation timeline with latency visualization
   - Formatted markdown output with detailed metrics
   - Uses `/agent/debug` endpoint for comprehensive insights

 - Error responses include detailed messages for better debugging
 ### RAG Search Enhancements
+- **Cross-Encoder Re-ranking**: Two-stage retrieval process for massive accuracy improvement:
+  - Initial vector search retrieves top candidates using embeddings
+  - Cross-encoder model (`cross-encoder/ms-marco-MiniLM-L-6-v2`) re-ranks top 10 results
+  - Final filtering by threshold and limit applied
+  - Seamlessly integrated with existing search API
 - **Lowered default threshold** from 0.5 to 0.3 for improved recall of relevant documents
 - **Intelligent fallback mechanism** returns the top result even if similarity score is below threshold, ensuring knowledge base content is always accessible
 - **Configurable threshold** via `threshold` parameter in search requests (default: 0.3)
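
The two-stage retrieval described in this hunk can be sketched roughly as follows. This is an illustration under stated assumptions, not the repository's implementation: `rerank_search`, `overlap_scorer`, the `limit` default, and the choice to apply the 0.3 threshold to the vector similarity (rather than to the re-ranker score) are all hypothetical. The scorer is passed in as a callable so the sketch runs without downloading a model.

```python
from typing import Callable, List, Sequence, Tuple

Candidate = Tuple[str, float]  # (chunk text, vector similarity)
PairScorer = Callable[[Sequence[Tuple[str, str]]], Sequence[float]]


def rerank_search(
    query: str,
    candidates: List[Candidate],
    score_pairs: PairScorer,
    threshold: float = 0.3,   # default quoted in the README
    limit: int = 5,           # assumed default, not from the README
    rerank_top_k: int = 10,   # "re-ranks top 10 results"
) -> List[Candidate]:
    """Re-rank vector-search candidates, then filter by threshold and limit."""
    if not candidates:
        return []
    # Stage 1 is assumed to have happened upstream: candidates come from vector search.
    by_similarity = sorted(candidates, key=lambda c: c[1], reverse=True)
    head, tail = by_similarity[:rerank_top_k], by_similarity[rerank_top_k:]
    # Stage 2: score (query, text) pairs and reorder the head by that score.
    scores = score_pairs([(query, text) for text, _ in head])
    reranked = [c for _, c in sorted(zip(scores, head), key=lambda p: p[0], reverse=True)]
    ordered = reranked + tail
    # Final filter: similarity threshold first, then the result limit.
    kept = [c for c in ordered if c[1] >= threshold][:limit]
    # Fallback: return the top result even when nothing clears the threshold.
    return kept or ordered[:1]


def overlap_scorer(pairs: Sequence[Tuple[str, str]]) -> List[float]:
    """Toy stand-in for a cross-encoder: counts shared lowercase words."""
    return [float(len(set(q.lower().split()) & set(d.lower().split()))) for q, d in pairs]
```

With `sentence-transformers` installed, `CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2").predict` could be passed as `score_pairs` in place of the toy scorer, since it accepts a list of (query, passage) pairs and returns one score per pair.
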
   - `MCP_MEMORY_TTL_SECONDS`: Time-to-live for memory entries in seconds (default: 900)
 - **Comprehensive Testing**: Full test suite in `backend/tests/test_conversation_memory.py` covering storage, retrieval, expiration, and multi-step workflows
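
For orientation, the short-term memory behaviour referenced by these context lines (last N tool outputs per session, a TTL of 900 seconds by default, keyed by `session_id`, cleared on session end) might look like the sketch below. The class and method names are hypothetical and the real service may be structured differently; the injectable `clock` exists only to make expiry testable.

```python
import time
from collections import deque
from typing import Any, Callable, Deque, Dict, List, Tuple


class SessionMemory:
    """Keeps the most recent tool outputs per session, with time-based expiry."""

    def __init__(
        self,
        max_items: int = 10,        # "last N tool outputs" (default 10)
        ttl_seconds: float = 900,   # matches the MCP_MEMORY_TTL_SECONDS default
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.max_items = max_items
        self.ttl_seconds = ttl_seconds
        self.clock = clock
        self._store: Dict[str, Deque[Tuple[float, str, Any]]] = {}

    def add(self, session_id: str, tool: str, output: Any) -> None:
        # deque(maxlen=...) silently drops the oldest entry once full.
        entries = self._store.setdefault(session_id, deque(maxlen=self.max_items))
        entries.append((self.clock(), tool, output))

    def recent(self, session_id: str) -> List[Tuple[str, Any]]:
        # Only entries younger than the TTL are returned.
        cutoff = self.clock() - self.ttl_seconds
        return [(tool, out) for ts, tool, out in self._store.get(session_id, ()) if ts >= cutoff]

    def end_session(self, session_id: str) -> None:
        # Mirrors the `end_session` flag: drop everything for this session.
        self._store.pop(session_id, None)
```
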
+### AI-Generated KB Metadata
+When ingesting documents, the system automatically extracts rich metadata:
+- **Title Extraction**: From filename, URL, or content structure (with intelligent fallback)
+- **Summary Generation**: 2-3 sentence summary via LLM (with keyword-based fallback)
+- **Tag Extraction**: 5-8 relevant tags extracted from content
+- **Topic Identification**: 3-5 main themes identified via LLM
+- **Date Detection**: Multiple date formats automatically detected
+- **Quality Score**: 0.0-1.0 score based on structure and completeness
+**Intelligent Fallback**: When LLM is unavailable or times out, uses keyword extraction and pattern matching to provide useful metadata.
+**Database Integration**: Metadata stored in JSONB column (`metadata`) for flexible querying and enhanced RAG search. Migration script: `backend/scripts/migrate_add_metadata.py`.
+**API Response**: Ingestion endpoints (`/rag/ingest-document`, `/rag/ingest-file`) now return `extracted_metadata` in the response.
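
As a rough illustration of the non-LLM fallback path described above (keyword extraction plus pattern matching), the sketch below derives a title, summary, tags, and date from raw text. Everything here is an assumption for illustration: the function name, the stopword list, the ISO-only date pattern, and the returned keys other than those named in the README. The quality score is omitted because the README does not say how it is computed.

```python
import re
from collections import Counter
from typing import Any, Dict, Optional

# Small illustrative stopword list; a real one would be much longer.
STOPWORDS = {"this", "that", "with", "from", "have", "will", "your", "need", "were", "been"}


def fallback_metadata(text: str, filename: Optional[str] = None) -> Dict[str, Any]:
    """Build metadata without an LLM, using word counts and regex patterns."""
    stripped = text.strip()
    # Tags: the most frequent words of four or more letters, minus stopwords.
    words = re.findall(r"[a-z]{4,}", stripped.lower())
    counts = Counter(w for w in words if w not in STOPWORDS)
    tags = [word for word, _ in counts.most_common(8)]
    # Summary: the first two sentences, split on terminal punctuation.
    sentences = re.split(r"(?<=[.!?])\s+", stripped)
    summary = " ".join(sentences[:2])
    # Date: only ISO dates (YYYY-MM-DD) are matched in this sketch.
    date_match = re.search(r"\b\d{4}-\d{2}-\d{2}\b", stripped)
    first_line = stripped.splitlines()[0] if stripped else ""
    return {
        "title": filename or first_line[:80],
        "summary": summary,
        "tags": tags,
        "date": date_match.group(0) if date_match else None,
        "extraction_method": "fallback",
    }
```
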
+### Per-Tool Latency Prediction & Context-Aware Routing
+The agent now uses sophisticated routing logic to optimize tool selection:
+- **Latency Prediction**: Agent estimates expected latency before tool selection:
+  - RAG: 60-120ms (depends on result count)
+  - Web: 400-1800ms (network-dependent)
+  - Admin: <20ms (local regex matching)
+  - LLM: Variable based on model and token count
+- **Path Optimization**: Agent chooses fastest tool sequence based on latency estimates
+- **Context-Aware Routing**: Intelligent tool skipping based on previous outputs:
+  - High RAG score (≥0.8) → Skip web search
+  - Critical admin violation → Skip agent reasoning, immediate block
+  - Relevant memory available → Skip RAG, use memory instead
+- **Routing Hints**: Context hints included in reasoning trace for transparency
+**Implementation**: `backend/api/services/tool_metadata.py` defines latency estimates and routing logic. `backend/api/services/tool_selector.py` implements context-aware decisions.
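
The skip rules and latency estimates listed above can be expressed as a small planning function. The sketch below is not taken from `tool_metadata.py` or `tool_selector.py`; `RoutingContext`, `plan_tools`, and the policy of ordering remaining tools by their upper-bound latency are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Upper-bound latency estimates in ms, taken from the ranges quoted above.
# The LLM has no fixed estimate, so it is treated as the slowest step.
LATENCY_UPPER_MS = {"admin": 20, "rag": 120, "web": 1800}


@dataclass
class RoutingContext:
    """Signals gathered from earlier tool outputs in the same request."""
    admin_severity: Optional[str] = None
    rag_top_score: Optional[float] = None
    has_relevant_memory: bool = False


def plan_tools(requested: List[str], ctx: RoutingContext) -> Tuple[List[str], List[str]]:
    """Return (tools to run, routing hints for the reasoning trace)."""
    # A critical admin violation short-circuits everything else.
    if ctx.admin_severity == "critical":
        return [], ["critical admin violation: block immediately"]
    plan: List[str] = []
    hints: List[str] = []
    for tool in requested:
        if tool == "rag" and ctx.has_relevant_memory:
            hints.append("skip rag: relevant memory available")
        elif tool == "web" and ctx.rag_top_score is not None and ctx.rag_top_score >= 0.8:
            hints.append("skip web: RAG top score >= 0.8")
        else:
            plan.append(tool)
    # Assumed policy: run the cheapest remaining tools first.
    plan.sort(key=lambda t: LATENCY_UPPER_MS.get(t, float("inf")))
    return plan, hints
```
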
+### Tool Output Schemas
+Every tool now returns strict JSON schemas for consistency:
+- **RAG**: `{results: [...], top_score: float, latency_ms: int}`
+- **Web**: `{results: [...], latency_ms: int}`
+- **Admin**: `{violations: [...], severity: str, latency_ms: int}`
+- **LLM**: `{text: str, tokens_used: int, latency_ms: int}`
+**Automatic Validation**: All tool outputs validated and formatted in `AgentOrchestrator` before use. Makes debugging and monitoring simpler.
+**Schema Definitions**: `backend/api/services/tool_metadata.py` contains `TOOL_OUTPUT_SCHEMAS` with validation functions.
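
A minimal sketch of what validating against these shapes could involve is shown below. `EXAMPLE_SCHEMAS` and `validate_tool_output` are hypothetical stand-ins; the actual structure of `TOOL_OUTPUT_SCHEMAS` and its validation functions is not shown in this diff.

```python
from typing import Any, Dict, List

# Field -> expected Python type, mirroring the four shapes listed above.
EXAMPLE_SCHEMAS: Dict[str, Dict[str, type]] = {
    "rag": {"results": list, "top_score": float, "latency_ms": int},
    "web": {"results": list, "latency_ms": int},
    "admin": {"violations": list, "severity": str, "latency_ms": int},
    "llm": {"text": str, "tokens_used": int, "latency_ms": int},
}


def _matches(value: Any, expected: type) -> bool:
    # bool is a subclass of int in Python, so reject it for numeric fields.
    if isinstance(value, bool):
        return expected is bool
    # Accept whole numbers where a float is expected (e.g. top_score == 1).
    if expected is float:
        return isinstance(value, (int, float))
    return isinstance(value, expected)


def validate_tool_output(tool: str, output: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the output is valid."""
    schema = EXAMPLE_SCHEMAS[tool]
    errors: List[str] = []
    for field, expected in schema.items():
        if field not in output:
            errors.append(f"missing field: {field}")
        elif not _matches(output[field], expected):
            errors.append(f"wrong type for {field}: expected {expected.__name__}")
    # "Strict" is read here as: no fields beyond those in the schema.
    for field in output:
        if field not in schema:
            errors.append(f"unexpected field: {field}")
    return errors
```

Returning a list of error strings rather than raising on the first problem keeps all issues visible in one pass, which suits the debugging use case the section describes.
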
 ### UI Enhancements (app.py)
 - **Knowledge Base Library Tab**:
   - Statistics cards showing document counts by type
 - **Debug & Reasoning Tab**:
   - Reasoning trace analyzer showing step-by-step agent decision-making
   - Tool invocation timeline with latency visualization
+  - **AI metadata display** after document ingestion (title, summary, tags, topics, quality score)
+  - **Latency predictions** shown in reasoning trace (estimated vs actual)
+  - **Context-aware routing hints** visualized (skip web/RAG/reasoning decisions)
+  - **Tool output schemas** displayed in debug view
   - Formatted markdown output with detailed metrics
   - Uses `/agent/debug` endpoint for comprehensive insights

frontend/README.md CHANGED

@@ -55,11 +55,14 @@ The frontend includes three powerful visualization components:
 - Step-by-step visualization of agent decision-making
 - Animated progression through reasoning steps
 - Status indicators and detailed metrics
 - Integrated into chat panel with collapsible section
 #### 2. Tool Invocation Timeline (`tool-timeline.tsx`)
 - Visual timeline of tool executions
 - Latency and result count visualization
 - Summary statistics
 - Integrated into chat panel
@@ -70,7 +73,12 @@ The frontend includes three powerful visualization components:
 ### Knowledge Base Page (`/knowledge-base`)
 - **Document listing** with pagination and filtering by type (text, PDF, FAQ, link)
-- **Search interface** for semantic search across documents
 - **Document ingestion** with support for:
   - Raw text input
   - URL ingestion (automatic content fetching)

 - Step-by-step visualization of agent decision-making
 - Animated progression through reasoning steps
 - Status indicators and detailed metrics
+- **Latency predictions** shown for each step (estimated vs actual)
+- **Context-aware routing hints** displayed (skip web/RAG/reasoning decisions)
 - Integrated into chat panel with collapsible section
 #### 2. Tool Invocation Timeline (`tool-timeline.tsx`)
 - Visual timeline of tool executions
 - Latency and result count visualization
+- **Schema-validated outputs** displayed (RAG results, Web results, Admin violations, LLM tokens)
 - Summary statistics
 - Integrated into chat panel
 ### Knowledge Base Page (`/knowledge-base`)
 - **Document listing** with pagination and filtering by type (text, PDF, FAQ, link)
+- **Search interface** for semantic search with cross-encoder re-ranking across documents
+- **AI-Generated Metadata Display**: After ingestion, shows extracted:
+  - Title, Summary, Tags, Topics
+  - Quality Score (0.0-1.0)
+  - Detected Date
+  - Extraction Method (LLM vs fallback)
 - **Document ingestion** with support for:
   - Raw text input
   - URL ingestion (automatic content fetching)