nothingworry committed on
Commit
807e3cf
·
1 Parent(s): 7501e7b

update the readme file

Files changed (3)
  1. README.md +60 -2
  2. backend/README.md +57 -0
  3. frontend/README.md +9 -1
README.md CHANGED
@@ -85,7 +85,11 @@ Then access:
85
  - 🤖 **Autonomous Multi-Step MCP Agents** – Intelligent tool-aware agent that plans and executes multi-step workflows across RAG, Web, Admin, and LLM tools with short-term conversation memory
86
  - 💭 **Short-Term Conversation Memory** – Automatic memory system that stores the last N tool outputs per session with configurable expiration (default: 10 outputs, 15 minutes TTL). Memory is keyed by session_id (not tenant_id) for safety, enabling better context awareness in multi-step workflows. Memory is automatically injected into tool payloads and cleared on session end.
87
  - 📚 **Enhanced Knowledge Base Management** – Upload raw text, URLs, or documents (PDF/DOCX/TXT/MD) with rich metadata (source URL, timestamp, document type) and optimized chunking (400-600 tokens)
88
- - 🔍 **Optimized RAG Search** – Semantic search with configurable similarity threshold (default 0.3) for better recall, with fallback to return top results even if below threshold
89
  - 🗑️ **Document Management** – Delete individual documents or bulk delete all documents for a tenant with confirmation dialogs
90
  - 🛡️ **Enterprise Admin Governance** – Advanced rule management system with:
91
  - Regex-based red-flag pattern matching with severity levels (low/medium/high/critical)
@@ -107,7 +111,7 @@ Then access:
107
  - 🌐 **Live Web Search** – Google Programmable Search (Custom Search API) with tenant-aware MCP tooling
108
  - 🏒 **Multi-Tenant Isolation** – Complete tenant isolation with centralized tenant ID management; backend enforces strict isolation for chat, ingestion, and admin ops
109
  - 🔐 **Fine-Grained Role-Based Access Control (RBAC)** – Four-tier role system (viewer, editor, admin, owner) with dynamic UI visibility and backend permission enforcement; frontend automatically shows/hides features based on role
110
- - 🔄 **Intelligent Multi-Tool Orchestration** – MCP agent orchestrator autonomously selects optimal tool chains (RAG + Web + LLM, etc.) based on query intent and context
111
  - ⚡ **Robust Error Handling** – Structured error responses, retry mechanisms, and graceful fallbacks (e.g., if RAG fails → fallback to LLM-only)
112
  - 📡 **Streaming Responses** – Chat responses stream word-by-word using Server-Sent Events (SSE) for real-time user experience
113
  - 🎯 **Rule-First Processing** – Admin rules checked before intent classification - rules can trigger brief responses or block requests entirely
@@ -885,11 +889,61 @@ This project is licensed under the MIT License - see the [LICENSE](LICENSE) file
885
  - **Session Management**: Memory can be explicitly cleared via `end_session` flag
886
  - **Comprehensive Testing**: Full test suite covering memory storage, retrieval, expiration, and multi-step workflows
887
 
888
  ### UI Improvements
889
  - **Modern Drag-and-Drop**: Intuitive file upload with visual feedback
890
  - **Enhanced Status Messages**: Clear success/error messages with icons
891
  - **Refresh Button in Table**: Quick refresh directly from the Rule Set section
892
  - **Better Visual Hierarchy**: Improved spacing, colors, and layout
893
 
894
  ## Key Technical Features
895
 
@@ -900,6 +954,10 @@ This project is licensed under the MIT License - see the [LICENSE](LICENSE) file
900
  - All operations validate tenant ownership before execution
901
 
902
  ### RAG Search & Retrieval
903
  - **Optimized similarity threshold** (default 0.3) for better recall of relevant documents
904
  - **Intelligent fallback** returns top result even if below threshold to ensure knowledge base content is accessible
905
  - **Pattern-based tool selection** automatically triggers RAG for admin questions, fact lookups, and internal knowledge queries
 
85
  - 🤖 **Autonomous Multi-Step MCP Agents** – Intelligent tool-aware agent that plans and executes multi-step workflows across RAG, Web, Admin, and LLM tools with short-term conversation memory
86
  - 💭 **Short-Term Conversation Memory** – Automatic memory system that stores the last N tool outputs per session with configurable expiration (default: 10 outputs, 15 minutes TTL). Memory is keyed by session_id (not tenant_id) for safety, enabling better context awareness in multi-step workflows. Memory is automatically injected into tool payloads and cleared on session end.
87
  - 📚 **Enhanced Knowledge Base Management** – Upload raw text, URLs, or documents (PDF/DOCX/TXT/MD) with rich metadata (source URL, timestamp, document type) and optimized chunking (400-600 tokens)
88
+ - 🤖 **AI-Generated KB Metadata** – Automatic extraction of title, summary, tags, topics, date, and quality score during document ingestion. LLM-powered with intelligent fallback when unavailable - uses keyword extraction and pattern matching to provide useful metadata even during timeouts
89
+ - 🔍 **Optimized RAG Search with Cross-Encoder Re-ranking** – Two-stage retrieval: initial vector search followed by cross-encoder re-ranking of top candidates using `cross-encoder/ms-marco-MiniLM-L-6-v2` for massive accuracy improvement. Semantic search with configurable similarity threshold (default 0.3) for better recall
90
+ - ⚡ **Per-Tool Latency Prediction** – Agent estimates expected latency before choosing tools (RAG: 60-120ms, Web: 400-1800ms, Admin: <20ms) to optimize tool selection and choose the fastest path
91
+ - 🧠 **Context-Aware MCP Routing** – Intelligent tool selection based on previous outputs: skip web search if RAG returns high score (≥0.8), skip agent reasoning for critical admin violations, skip RAG if relevant memory already available. Leads to more sophisticated behavior and higher scores
92
+ - 📋 **Tool Output Schemas** – Every tool returns strict JSON type schemas for easier debugging, cleaner reasoning, and more polished responses. Automatic schema validation and formatting
93
  - 🗑️ **Document Management** – Delete individual documents or bulk delete all documents for a tenant with confirmation dialogs
94
  - 🛡️ **Enterprise Admin Governance** – Advanced rule management system with:
95
  - Regex-based red-flag pattern matching with severity levels (low/medium/high/critical)
 
111
  - 🌐 **Live Web Search** – Google Programmable Search (Custom Search API) with tenant-aware MCP tooling
112
  - 🏒 **Multi-Tenant Isolation** – Complete tenant isolation with centralized tenant ID management; backend enforces strict isolation for chat, ingestion, and admin ops
113
  - 🔐 **Fine-Grained Role-Based Access Control (RBAC)** – Four-tier role system (viewer, editor, admin, owner) with dynamic UI visibility and backend permission enforcement; frontend automatically shows/hides features based on role
114
+ - 🔄 **Intelligent Multi-Tool Orchestration** – MCP agent orchestrator autonomously selects optimal tool chains (RAG + Web + LLM, etc.) based on query intent, context, latency predictions, and previous tool outputs. Context-aware routing enables sophisticated tool skipping for efficiency
115
  - ⚡ **Robust Error Handling** – Structured error responses, retry mechanisms, and graceful fallbacks (e.g., if RAG fails → fallback to LLM-only)
116
  - 📡 **Streaming Responses** – Chat responses stream word-by-word using Server-Sent Events (SSE) for real-time user experience
117
  - 🎯 **Rule-First Processing** – Admin rules checked before intent classification - rules can trigger brief responses or block requests entirely
 
889
  - **Session Management**: Memory can be explicitly cleared via `end_session` flag
890
  - **Comprehensive Testing**: Full test suite covering memory storage, retrieval, expiration, and multi-step workflows
891
 
892
+ ### AI-Generated KB Metadata & Advanced RAG (Latest)
893
+ - **Automatic Metadata Extraction**: When ingesting documents, system auto-extracts:
894
+ - **Title**: From filename, URL, or content structure (with intelligent fallback)
895
+ - **Summary**: 2-3 sentence summary via LLM (with keyword-based fallback)
896
+ - **Tags**: 5-8 relevant tags extracted from content
897
+ - **Topics**: 3-5 main themes identified via LLM
898
+ - **Date Detection**: Multiple date formats automatically detected
899
+ - **Quality Score**: 0.0-1.0 score based on structure and completeness
900
+ - **Intelligent Fallback**: When LLM is unavailable or times out, uses keyword extraction and pattern matching to provide useful metadata
901
+ - **Database Integration**: Metadata stored in JSONB column for flexible querying and enhanced RAG search
902
+ - **Migration Script**: Safe, idempotent database migration script included
903
+
904
+ ### Per-Tool Latency Prediction & Context-Aware Routing (Latest)
905
+ - **Latency Prediction**: Agent estimates expected latency before tool selection:
906
+ - RAG: 60-120ms (depends on result count)
907
+ - Web: 400-1800ms (network-dependent)
908
+ - Admin: <20ms (local regex matching)
909
+ - LLM: Variable based on model and token count
910
+ - **Path Optimization**: Agent chooses fastest tool sequence based on latency estimates
911
+ - **Context-Aware Routing**: Intelligent tool skipping based on previous outputs:
912
+ - High RAG score (≥0.8) → Skip web search
913
+ - Critical admin violation → Skip agent reasoning, immediate block
914
+ - Relevant memory available → Skip RAG, use memory instead
915
+ - **Routing Hints**: Context hints included in reasoning trace for transparency
916
+ - **Performance Impact**: Leads to more sophisticated behavior and higher scores
917
+
918
+ ### Tool Output Schemas (Latest)
919
+ - **Strict JSON Schemas**: Every tool returns validated JSON with consistent structure:
920
+ - **RAG**: `{results: [...], top_score: float, latency_ms: int}`
921
+ - **Web**: `{results: [...], latency_ms: int}`
922
+ - **Admin**: `{violations: [...], severity: str, latency_ms: int}`
923
+ - **LLM**: `{text: str, tokens_used: int, latency_ms: int}`
924
+ - **Automatic Validation**: All tool outputs validated and formatted before use
925
+ - **Easier Debugging**: Consistent structure makes debugging and monitoring simpler
926
+ - **Polished Responses**: Schema-validated outputs ensure professional appearance
927
+
928
+ ### Cross-Encoder Re-ranking (Latest)
929
+ - **Two-Stage RAG Process**:
930
+ - Initial vector search retrieves candidates
931
+ - Cross-encoder re-ranks top 10 results for accuracy
932
+ - Final filtering by threshold and limit
933
+ - **Model**: Uses `cross-encoder/ms-marco-MiniLM-L-6-v2` (very fast, production-ready)
934
+ - **Massive Accuracy Improvement**: Re-ranking significantly improves relevance of search results
935
+ - **Seamless Integration**: Works transparently with existing RAG search API
936
+
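The two-stage process described in this hunk can be sketched roughly as follows. This is a minimal illustration, not the project's implementation: `rerank`, `overlap_score`, and all parameter names are hypothetical, and a toy word-overlap scorer stands in for the cross-encoder so the example runs without downloading a model. It also assumes the re-ranking score is on a scale comparable to the 0.3 threshold, which the diff does not state.

```python
def rerank(query, candidates, score_fn, top_k=10, threshold=0.3, limit=5):
    """Re-rank vector-search hits with a second scorer (hypothetical helper).

    candidates: list of (text, vector_score), already sorted by vector_score.
    score_fn:   callable (query, text) -> float; a cross-encoder in practice.
    """
    head = candidates[:top_k]                        # stage 1 output: top candidates
    scored = [(text, score_fn(query, text)) for text, _ in head]
    scored.sort(key=lambda pair: pair[1], reverse=True)   # stage 2: re-rank
    kept = [pair for pair in scored if pair[1] >= threshold]  # final threshold filter
    return kept[:limit]                              # final limit


def overlap_score(query, text):
    """Toy stand-in for a cross-encoder: fraction of query words found in text."""
    q = set(query.lower().split())
    return len(q & set(text.lower().split())) / len(q) if q else 0.0


hits = [
    ("billing overview", 0.9),
    ("how to reset your password", 0.7),
    ("password policy", 0.6),
]
reranked = rerank("reset password", hits, overlap_score)
```

With the `sentence-transformers` package installed, `score_fn` could instead wrap a `CrossEncoder` loaded with the model name from the diff and call its `predict` method on `(query, text)` pairs; the raw scores of that model may not lie in the 0 to 1 range, so the threshold would likely need adjusting.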
937
  ### UI Improvements
938
  - **Modern Drag-and-Drop**: Intuitive file upload with visual feedback
939
  - **Enhanced Status Messages**: Clear success/error messages with icons
940
  - **Refresh Button in Table**: Quick refresh directly from the Rule Set section
941
  - **Better Visual Hierarchy**: Improved spacing, colors, and layout
942
+ - **Gradio UI Enhancements**:
943
+ - AI metadata displayed after document ingestion
944
+ - Latency predictions shown in reasoning trace
945
+ - Context-aware routing hints visualized
946
+ - Tool output schemas displayed in debug view
947
 
948
  ## Key Technical Features
949
 
 
954
  - All operations validate tenant ownership before execution
955
 
956
  ### RAG Search & Retrieval
957
+ - **Cross-Encoder Re-ranking**: Two-stage retrieval process for massive accuracy improvement:
958
+ - First: Vector search retrieves top candidates using embeddings
959
+ - Then: Cross-encoder model (`cross-encoder/ms-marco-MiniLM-L-6-v2`) re-ranks top 10 results
960
+ - Final: Results filtered by threshold and limit applied
961
  - **Optimized similarity threshold** (default 0.3) for better recall of relevant documents
962
  - **Intelligent fallback** returns top result even if below threshold to ensure knowledge base content is accessible
963
  - **Pattern-based tool selection** automatically triggers RAG for admin questions, fact lookups, and internal knowledge queries
backend/README.md CHANGED
@@ -116,6 +116,11 @@ Use the helper scripts in the repo root when validating backend changes:
116
  - Error responses include detailed messages for better debugging
117
 
118
  ### RAG Search Enhancements
119
  - **Lowered default threshold** from 0.5 to 0.3 for improved recall of relevant documents
120
  - **Intelligent fallback mechanism** returns the top result even if similarity score is below threshold, ensuring knowledge base content is always accessible
121
  - **Configurable threshold** via `threshold` parameter in search requests (default: 0.3)
@@ -132,6 +137,54 @@ Use the helper scripts in the repo root when validating backend changes:
132
  - `MCP_MEMORY_TTL_SECONDS`: Time-to-live for memory entries in seconds (default: 900)
133
  - **Comprehensive Testing**: Full test suite in `backend/tests/test_conversation_memory.py` covering storage, retrieval, expiration, and multi-step workflows
134
 
135
  ### UI Enhancements (app.py)
136
  - **Knowledge Base Library Tab**:
137
  - Statistics cards showing document counts by type
@@ -152,6 +205,10 @@ Use the helper scripts in the repo root when validating backend changes:
152
  - **Debug & Reasoning Tab**:
153
  - Reasoning trace analyzer showing step-by-step agent decision-making
154
  - Tool invocation timeline with latency visualization
155
  - Formatted markdown output with detailed metrics
156
  - Uses `/agent/debug` endpoint for comprehensive insights
157
 
 
116
  - Error responses include detailed messages for better debugging
117
 
118
  ### RAG Search Enhancements
119
+ - **Cross-Encoder Re-ranking**: Two-stage retrieval process for massive accuracy improvement:
120
+ - Initial vector search retrieves top candidates using embeddings
121
+ - Cross-encoder model (`cross-encoder/ms-marco-MiniLM-L-6-v2`) re-ranks top 10 results
122
+ - Final filtering by threshold and limit applied
123
+ - Seamlessly integrated with existing search API
124
  - **Lowered default threshold** from 0.5 to 0.3 for improved recall of relevant documents
125
  - **Intelligent fallback mechanism** returns the top result even if similarity score is below threshold, ensuring knowledge base content is always accessible
126
  - **Configurable threshold** via `threshold` parameter in search requests (default: 0.3)
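As a rough illustration of the threshold-plus-fallback behaviour listed in this hunk, the sketch below keeps every hit at or above the threshold and otherwise returns only the single best hit. The function name, the `(doc_id, score)` tuple shape, and the `limit` parameter are assumptions made for the example, not names from the backend.

```python
def filter_with_fallback(results, threshold=0.3, limit=5):
    """Keep hits scoring >= threshold; if none qualify, return the top hit alone.

    results: list of (doc_id, score) pairs in any order (hypothetical shape).
    """
    ranked = sorted(results, key=lambda r: r[1], reverse=True)
    kept = [r for r in ranked if r[1] >= threshold]
    if not kept and ranked:
        kept = ranked[:1]   # fallback: best available result, even below threshold
    return kept[:limit]
```

For example, `filter_with_fallback([("a", 0.2), ("b", 0.1)])` falls back to the best hit `("a", 0.2)`, while an empty result list stays empty.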
 
137
  - `MCP_MEMORY_TTL_SECONDS`: Time-to-live for memory entries in seconds (default: 900)
138
  - **Comprehensive Testing**: Full test suite in `backend/tests/test_conversation_memory.py` covering storage, retrieval, expiration, and multi-step workflows
139
 
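The session memory described above (last N tool outputs per session, expiring after a TTL, cleared on session end) could be sketched as below. The class and method names are hypothetical; only the defaults of 10 outputs and 900 seconds come from the READMEs. The clock is injectable so that expiry can be exercised without waiting.

```python
import time
from collections import defaultdict, deque


class SessionMemory:
    """Hypothetical in-process store keyed by session_id (not tenant_id)."""

    def __init__(self, max_items=10, ttl_seconds=900, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        # A bounded deque drops the oldest entry once max_items is exceeded.
        self._store = defaultdict(lambda: deque(maxlen=max_items))

    def add(self, session_id, tool, output):
        self._store[session_id].append((self.clock(), tool, output))

    def recent(self, session_id):
        """Return (tool, output) pairs that have not yet expired."""
        now = self.clock()
        entries = self._store.get(session_id, ())
        return [(tool, out) for ts, tool, out in entries if now - ts <= self.ttl]

    def end_session(self, session_id):
        self._store.pop(session_id, None)


# Usage with a fake clock so the TTL can be tested deterministically.
now = [0.0]
mem = SessionMemory(max_items=2, ttl_seconds=900, clock=lambda: now[0])
mem.add("s1", "rag", {"top_score": 0.9})
mem.add("s1", "web", {"results": []})
mem.add("s1", "llm", {"text": "ok"})   # evicts the oldest ("rag") entry
```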
140
+ ### AI-Generated KB Metadata
141
+
142
+ When ingesting documents, the system automatically extracts rich metadata:
143
+
144
+ - **Title Extraction**: From filename, URL, or content structure (with intelligent fallback)
145
+ - **Summary Generation**: 2-3 sentence summary via LLM (with keyword-based fallback)
146
+ - **Tag Extraction**: 5-8 relevant tags extracted from content
147
+ - **Topic Identification**: 3-5 main themes identified via LLM
148
+ - **Date Detection**: Multiple date formats automatically detected
149
+ - **Quality Score**: 0.0-1.0 score based on structure and completeness
150
+
151
+ **Intelligent Fallback**: When LLM is unavailable or times out, uses keyword extraction and pattern matching to provide useful metadata.
152
+
153
+ **Database Integration**: Metadata stored in JSONB column (`metadata`) for flexible querying and enhanced RAG search. Migration script: `backend/scripts/migrate_add_metadata.py`.
154
+
155
+ **API Response**: Ingestion endpoints (`/rag/ingest-document`, `/rag/ingest-file`) now return `extracted_metadata` in the response.
156
+
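The keyword-and-pattern fallback mentioned above might look something like the following sketch. Everything here is an assumption for illustration: the function name, the stopword list, the ISO-only date pattern, and the returned field names (apart from the metadata kinds the README lists) are not taken from the backend code.

```python
import re
from collections import Counter

# Tiny illustrative stopword list; a real one would be much longer.
STOPWORDS = {"the", "and", "for", "with", "that", "this", "from", "are", "was"}


def fallback_metadata(text, filename=None, max_tags=8):
    """Derive basic metadata without an LLM (hypothetical helper)."""
    # Tags: most frequent words of three or more letters, minus stopwords.
    words = re.findall(r"[a-z]{3,}", text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS)
    tags = [w for w, _ in counts.most_common(max_tags)]
    # Title: filename if given, else the first non-empty line without heading marks.
    first_line = next(
        (ln.strip("# ").strip() for ln in text.splitlines() if ln.strip()), ""
    )
    title = filename or first_line or "Untitled"
    # Date: only ISO dates (YYYY-MM-DD) are matched in this sketch.
    date = re.search(r"\b\d{4}-\d{2}-\d{2}\b", text)
    return {
        "title": title,
        "tags": tags,
        "date": date.group(0) if date else None,
        "extraction_method": "fallback",
    }


doc = "# Refund Policy\nRefund requests are processed within 14 days. Updated 2024-03-01. Refund limits apply."
meta = fallback_metadata(doc)
```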
157
+ ### Per-Tool Latency Prediction & Context-Aware Routing
158
+
159
+ The agent now uses sophisticated routing logic to optimize tool selection:
160
+
161
+ - **Latency Prediction**: Agent estimates expected latency before tool selection:
162
+ - RAG: 60-120ms (depends on result count)
163
+ - Web: 400-1800ms (network-dependent)
164
+ - Admin: <20ms (local regex matching)
165
+ - LLM: Variable based on model and token count
166
+ - **Path Optimization**: Agent chooses fastest tool sequence based on latency estimates
167
+ - **Context-Aware Routing**: Intelligent tool skipping based on previous outputs:
168
+ - High RAG score (≥0.8) → Skip web search
169
+ - Critical admin violation → Skip agent reasoning, immediate block
170
+ - Relevant memory available → Skip RAG, use memory instead
171
+ - **Routing Hints**: Context hints included in reasoning trace for transparency
172
+
173
+ **Implementation**: `backend/api/services/tool_metadata.py` defines latency estimates and routing logic. `backend/api/services/tool_selector.py` implements context-aware decisions.
174
+
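The routing rules and latency estimates listed above can be combined in a small planner, sketched here under stated assumptions. `RoutingContext`, `plan_tools`, and `LATENCY_ESTIMATE_MS` are hypothetical names, not the contents of `tool_metadata.py` or `tool_selector.py`; the estimates use the upper bound of each range from the README, and tools without an estimate (such as the LLM) are ordered last.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed upper-bound estimates in milliseconds, from the ranges listed above.
LATENCY_ESTIMATE_MS = {"admin": 20, "rag": 120, "web": 1800}


@dataclass
class RoutingContext:
    rag_top_score: Optional[float] = None   # top score of an earlier RAG call
    admin_severity: Optional[str] = None    # severity of an earlier admin check
    has_relevant_memory: bool = False


def plan_tools(candidates, ctx):
    """Return (tools to run, fastest first; hints explaining each skip)."""
    if ctx.admin_severity == "critical":
        return [], ["critical admin violation: block immediately"]
    tools, hints = list(candidates), []
    if ctx.has_relevant_memory and "rag" in tools:
        tools.remove("rag")
        hints.append("relevant memory available: skip RAG")
    if ctx.rag_top_score is not None and ctx.rag_top_score >= 0.8 and "web" in tools:
        tools.remove("web")
        hints.append("high RAG score: skip web search")
    # Unknown tools get infinite estimated latency and therefore sort last.
    tools.sort(key=lambda t: LATENCY_ESTIMATE_MS.get(t, float("inf")))
    return tools, hints
```

The returned hints correspond to the routing hints the README says are included in the reasoning trace; how the real orchestrator formats them is not shown in this diff.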
175
+ ### Tool Output Schemas
176
+
177
+ Every tool now returns strict JSON schemas for consistency:
178
+
179
+ - **RAG**: `{results: [...], top_score: float, latency_ms: int}`
180
+ - **Web**: `{results: [...], latency_ms: int}`
181
+ - **Admin**: `{violations: [...], severity: str, latency_ms: int}`
182
+ - **LLM**: `{text: str, tokens_used: int, latency_ms: int}`
183
+
184
+ **Automatic Validation**: All tool outputs validated and formatted in `AgentOrchestrator` before use. Makes debugging and monitoring simpler.
185
+
186
+ **Schema Definitions**: `backend/api/services/tool_metadata.py` contains `TOOL_OUTPUT_SCHEMAS` with validation functions.
187
+
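A validation step of the kind described above could be sketched as follows. `EXAMPLE_SCHEMAS` and `validate_tool_output` are hypothetical stand-ins that mirror the four shapes listed in this section; they are not the actual `TOOL_OUTPUT_SCHEMAS` or validation functions from `tool_metadata.py`, whose details this diff does not show.

```python
# Field name -> expected Python type, mirroring the shapes listed above.
EXAMPLE_SCHEMAS = {
    "rag": {"results": list, "top_score": float, "latency_ms": int},
    "web": {"results": list, "latency_ms": int},
    "admin": {"violations": list, "severity": str, "latency_ms": int},
    "llm": {"text": str, "tokens_used": int, "latency_ms": int},
}


def validate_tool_output(tool, output):
    """Return a list of problems; an empty list means the output conforms."""
    schema = EXAMPLE_SCHEMAS.get(tool)
    if schema is None:
        return [f"unknown tool: {tool}"]
    errors = []
    for field, expected in schema.items():
        if field not in output:
            errors.append(f"missing field: {field}")
        # bool is a subclass of int in Python, so reject it explicitly.
        elif isinstance(output[field], bool) or not isinstance(output[field], expected):
            errors.append(f"{field}: expected {expected.__name__}")
    return errors
```

Returning a list of problems rather than raising makes it easy to surface every mismatch at once in a debug view; a stricter variant could raise on the first error instead.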
188
  ### UI Enhancements (app.py)
189
  - **Knowledge Base Library Tab**:
190
  - Statistics cards showing document counts by type
 
205
  - **Debug & Reasoning Tab**:
206
  - Reasoning trace analyzer showing step-by-step agent decision-making
207
  - Tool invocation timeline with latency visualization
208
+ - **AI metadata display** after document ingestion (title, summary, tags, topics, quality score)
209
+ - **Latency predictions** shown in reasoning trace (estimated vs actual)
210
+ - **Context-aware routing hints** visualized (skip web/RAG/reasoning decisions)
211
+ - **Tool output schemas** displayed in debug view
212
  - Formatted markdown output with detailed metrics
213
  - Uses `/agent/debug` endpoint for comprehensive insights
214
 
frontend/README.md CHANGED
@@ -55,11 +55,14 @@ The frontend includes three powerful visualization components:
55
  - Step-by-step visualization of agent decision-making
56
  - Animated progression through reasoning steps
57
  - Status indicators and detailed metrics
58
  - Integrated into chat panel with collapsible section
59
 
60
  #### 2. Tool Invocation Timeline (`tool-timeline.tsx`)
61
  - Visual timeline of tool executions
62
  - Latency and result count visualization
63
  - Summary statistics
64
  - Integrated into chat panel
65
 
@@ -70,7 +73,12 @@ The frontend includes three powerful visualization components:
70
 
71
  ### Knowledge Base Page (`/knowledge-base`)
72
  - **Document listing** with pagination and filtering by type (text, PDF, FAQ, link)
73
- - **Search interface** for semantic search across documents
74
  - **Document ingestion** with support for:
75
  - Raw text input
76
  - URL ingestion (automatic content fetching)
 
55
  - Step-by-step visualization of agent decision-making
56
  - Animated progression through reasoning steps
57
  - Status indicators and detailed metrics
58
+ - **Latency predictions** shown for each step (estimated vs actual)
59
+ - **Context-aware routing hints** displayed (skip web/RAG/reasoning decisions)
60
  - Integrated into chat panel with collapsible section
61
 
62
  #### 2. Tool Invocation Timeline (`tool-timeline.tsx`)
63
  - Visual timeline of tool executions
64
  - Latency and result count visualization
65
+ - **Schema-validated outputs** displayed (RAG results, Web results, Admin violations, LLM tokens)
66
  - Summary statistics
67
  - Integrated into chat panel
68
 
 
73
 
74
  ### Knowledge Base Page (`/knowledge-base`)
75
  - **Document listing** with pagination and filtering by type (text, PDF, FAQ, link)
76
+ - **Search interface** for semantic search with cross-encoder re-ranking across documents
77
+ - **AI-Generated Metadata Display**: After ingestion, shows extracted:
78
+ - Title, Summary, Tags, Topics
79
+ - Quality Score (0.0-1.0)
80
+ - Detected Date
81
+ - Extraction Method (LLM vs fallback)
82
  - **Document ingestion** with support for:
83
  - Raw text input
84
  - URL ingestion (automatic content fetching)