# Multi-Document RAG System
A production-ready Retrieval-Augmented Generation (RAG) system for intelligent question-answering over multiple PDF documents. Features hybrid retrieval (vector + keyword search), cross-encoder re-ranking, semantic chunking, and a Gradio web interface.
## Model Description
This system implements an advanced RAG pipeline that combines multiple state-of-the-art techniques for optimal document retrieval and question answering:
### Core Models Used
| Component | Model | Purpose |
|---|---|---|
| Embeddings | `BAAI/bge-large-en-v1.5` | 1024-dim normalized embeddings for semantic search |
| Re-ranker | `BAAI/bge-reranker-v2-m3` | Cross-encoder neural re-ranking for precision |
| Chunker | `sentence-transformers/all-MiniLM-L6-v2` | Semantic similarity for intelligent chunking |
| LLM | Llama 3.3 70B (via Groq API) | Generation with inline citations |
## Architecture

```text
User Query
    │
    ├── Query Classification (factoid/summary/comparison/extraction/reasoning)
    ├── Multi-Query Expansion (3 alternative phrasings)
    └── HyDE Generation (hypothetical answer document)
    │
    ▼
┌──────────────────────────────────────┐
│           Hybrid Retrieval           │
│  ┌─────────────┐    ┌─────────────┐  │
│  │  ChromaDB   │    │    BM25     │  │
│  │  (Vector)   │    │  (Keyword)  │  │
│  └─────────────┘    └─────────────┘  │
│         │                 │          │
│         └────────┬────────┘          │
│                  ▼                   │
│      RRF Fusion + Deduplication      │
└──────────────────────────────────────┘
    │
    ▼
Cross-Encoder Re-ranking
(BAAI/bge-reranker-v2-m3)
    │
    ▼
LLM Generation (Llama 3.3 70B)
with inline source citations
    │
    ▼
Answer Verification (for complex queries)
```
## Key Features

### Hybrid Retrieval

- Vector Search (MMR): Semantic similarity with diversity via ChromaDB
- Keyword Search (BM25): Exact term matching for rare words
- Reciprocal Rank Fusion: Combines the vector and keyword rankings into a single ordered list (see the sketch below)
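
The fusion step is compact enough to show in full. Below is a minimal, self-contained RRF sketch; the constant `k = 60` is the value commonly used in the RRF literature rather than a setting confirmed from this notebook, and the chunk IDs are illustrative.

```python
from collections import defaultdict

def rrf_fuse(ranked_lists, k=60):
    """Reciprocal Rank Fusion: score each chunk by summing
    1 / (k + rank) over every ranked list it appears in."""
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Deduplication falls out naturally: each chunk_id is scored once.
    return sorted(scores, key=scores.get, reverse=True)

# Fuse a vector-search ranking with a BM25 ranking (illustrative IDs).
vector_hits = ["c3", "c1", "c7"]
bm25_hits = ["c1", "c9", "c3"]
print(rrf_fuse([vector_hits, bm25_hits]))  # ['c1', 'c3', 'c9', 'c7']
```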
### Semantic Chunking
Documents are split based on sentence embedding similarity rather than fixed character counts, preserving coherent ideas within chunks.
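
A minimal sketch of the idea, using the same `all-MiniLM-L6-v2` model named above. The threshold and size cap mirror the defaults in the configuration table below, but the function itself is illustrative, not the notebook's exact implementation.

```python
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

def semantic_chunks(sentences, similarity_threshold=0.5, max_chunk_size=1000):
    """Group consecutive sentences while neighbours stay similar; start a
    new chunk when similarity drops or the character cap would be exceeded."""
    embs = model.encode(sentences, normalize_embeddings=True)
    chunks, current = [], [sentences[0]]
    for i in range(1, len(sentences)):
        sim = float(np.dot(embs[i - 1], embs[i]))  # cosine (vectors normalized)
        too_big = sum(len(s) for s in current) + len(sentences[i]) > max_chunk_size
        if sim < similarity_threshold or too_big:
            chunks.append(" ".join(current))
            current = []
        current.append(sentences[i])
    chunks.append(" ".join(current))
    return chunks
```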
### Intelligent Query Classification

Queries are automatically classified into five types, and retrieval adapts to each (a sketch follows the table):
| Query Type | Retrieval Depth (k) | Answer Style |
|---|---|---|
| Factoid | 6 | Direct |
| Summary | 10 | Bullets |
| Comparison | 12 | Bullets |
| Extraction | 8 | Direct |
| Reasoning | 10 | Steps |
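
The classification itself is delegated to the LLM in the notebook; what matters downstream is the mapping above, which reduces to something like the following (all names are hypothetical):

```python
# Hypothetical mapping of query type -> retrieval depth and answer style,
# mirroring the table above.
QUERY_PROFILES = {
    "factoid":    {"k": 6,  "style": "direct"},
    "summary":    {"k": 10, "style": "bullets"},
    "comparison": {"k": 12, "style": "bullets"},
    "extraction": {"k": 8,  "style": "direct"},
    "reasoning":  {"k": 10, "style": "steps"},
}

def retrieval_params(query_type: str) -> dict:
    """Return the retrieval profile for a query type, with a safe default."""
    return QUERY_PROFILES.get(query_type, {"k": 8, "style": "direct"})
```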
### Multi-Document Support

- Upload multiple PDFs to build a combined knowledge base
- Automatic PDF diversity enforcement for cross-document queries (see the sketch below)
- Clear source attribution with document name and page number
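
The notebook's exact enforcement logic is not reproduced here; a plausible sketch is to keep the re-ranked order while capping how many chunks any single PDF may contribute. The metadata key and the cap of 3 are assumptions:

```python
def enforce_pdf_diversity(chunks: list[dict], max_per_doc: int = 3) -> list[dict]:
    """Preserve re-ranked order but cap contributions per source PDF,
    so cross-document queries see evidence from every uploaded file."""
    kept: list[dict] = []
    counts: dict[str, int] = {}
    for chunk in chunks:  # assumed sorted by re-ranker score, best first
        source = chunk["metadata"]["source"]  # hypothetical metadata key
        if counts.get(source, 0) < max_per_doc:
            kept.append(chunk)
            counts[source] = counts.get(source, 0) + 1
    return kept
```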
### Query Enhancement

- HyDE: Generates a hypothetical answer document so retrieval can match answer-shaped text rather than the raw question
- Multi-Query Expansion: Creates 3 alternative phrasings for broader coverage (both techniques are sketched below)
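
Both enhancements reduce to prompt templates around the LLM. A sketch, assuming a `generate(prompt) -> str` helper that wraps the Groq call (the helper and the prompt wording are hypothetical):

```python
def hyde_document(query: str, generate) -> str:
    """HyDE: retrieve with a hypothetical answer rather than the raw query."""
    prompt = (
        "Write a short passage that would plausibly answer this question.\n"
        f"Question: {query}\nPassage:"
    )
    return generate(prompt)

def expand_queries(query: str, generate, n: int = 3) -> list[str]:
    """Multi-query expansion: n alternative phrasings for broader recall."""
    prompt = f"Rewrite this question in {n} different ways, one per line.\n{query}"
    return [line.strip() for line in generate(prompt).splitlines() if line.strip()]
```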
### Answer Verification
Self-verification step for complex queries ensures answers are direct, structured, and grounded in sources.
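
What such a pass can look like, reusing the hypothetical `generate` helper from the sketch above (the notebook's actual verification prompt is not reproduced here):

```python
def verify_answer(question: str, draft: str, sources: str, generate) -> str:
    """Self-verification: ask the LLM to strip unsupported claims and
    keep the answer direct, structured, and cited."""
    prompt = (
        "Review the draft answer against the sources. Remove any claim the "
        "sources do not support, keep inline citations, and answer directly.\n"
        f"Question: {question}\nDraft: {draft}\nSources:\n{sources}"
    )
    return generate(prompt)
```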
## Intended Uses

### Primary Use Cases
- Academic Research: Analyze and compare research papers
- Document Q&A: Answer questions over technical documentation
- Literature Review: Synthesize information across multiple sources
- Knowledge Extraction: Extract specific facts, methodologies, or findings
### Out-of-Scope Uses
- Real-time streaming applications (latency-sensitive)
- Non-English documents (optimized for English)
- Image/table-heavy PDFs (text extraction only)
## How to Use

### Requirements
- Python 3.10+
- Groq API key (free at console.groq.com)
- GPU recommended but not required
### Installation

```bash
pip install numpy==1.26.4 pandas==2.2.2 scipy==1.13.1
pip install langchain-core==0.2.40 langchain-community==0.2.16 langchain==0.2.16
pip install langchain-groq==0.1.9 langchain-text-splitters==0.2.4
pip install chromadb==0.5.5 sentence-transformers==3.0.1
pip install pypdf==4.3.1 rank-bm25==0.2.2 gradio torch
```
### Quick Start

1. Open `rag.ipynb` in Jupyter Notebook or Google Colab
2. Run all cells sequentially
3. Enter your Groq API key in the Setup tab
4. Upload PDF documents
5. Ask questions in the Chat tab
### Example Queries

```text
# Single Document Analysis
"What is the main contribution of this paper?"
"Explain the methodology in detail"
"What are the limitations mentioned by the authors?"

# Multi-Document Comparison
"Compare the approaches discussed in these papers"
"What are the key differences between the methodologies?"
```
## Technical Specifications

### Performance Benchmarks
| Operation | Typical Duration |
|---|---|
| Model initialization | 30-60 seconds |
| PDF ingestion (per doc) | 10-30 seconds |
| Simple queries | 5-8 seconds |
| Complex queries | 10-15 seconds |
| Full document summary | 30-90 seconds |
### Configuration Parameters

| Parameter | Default | Description |
|---|---|---|
| `max_chunk_size` | 1000 | Maximum characters per semantic chunk |
| `similarity_threshold` | 0.5 | Cosine similarity for chunk grouping |
| `chunk_size` | 800 | Fallback text splitter chunk size |
| `chunk_overlap` | 150 | Character overlap between chunks |
| `fetch_factor` | 2 | Multiplier for the initial retrieval pool |
| `lambda_mult` | 0.6 | MMR diversity parameter |
| `cache_max_size` | 100 | Maximum cached query responses |
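
As one illustration of how these knobs interact, `fetch_factor` and `lambda_mult` could be wired into a LangChain MMR retriever over ChromaDB like this (a sketch against the pinned `langchain-community` 0.2.x API; the collection name and `k` are arbitrary):

```python
from langchain_community.embeddings import HuggingFaceBgeEmbeddings
from langchain_community.vectorstores import Chroma

embeddings = HuggingFaceBgeEmbeddings(
    model_name="BAAI/bge-large-en-v1.5",
    encode_kwargs={"normalize_embeddings": True},
)
vectorstore = Chroma(collection_name="docs", embedding_function=embeddings)

k = 8             # retrieval depth chosen by the query classifier
fetch_factor = 2  # multiplier for the initial retrieval pool
retriever = vectorstore.as_retriever(
    search_type="mmr",  # maximal marginal relevance: relevance + diversity
    search_kwargs={"k": k, "fetch_k": k * fetch_factor, "lambda_mult": 0.6},
)
```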
## Limitations
- Requires active internet connection for Groq API calls
- PDF quality affects text extraction accuracy
- Large documents may take longer to process
- Query cache does not persist between sessions
- Optimized for English language documents
## Training Details

This is a retrieval system, not a trained model. It orchestrates pre-trained models:

- Embeddings: uses pre-trained `BAAI/bge-large-en-v1.5` without fine-tuning
- Re-ranker: uses pre-trained `BAAI/bge-reranker-v2-m3` without fine-tuning
- LLM: uses Llama 3.3 70B via the Groq API with zero-shot prompting
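
Loading the two retrieval-side models is plain `sentence-transformers` usage; the query and candidate strings below are illustrative:

```python
from sentence_transformers import CrossEncoder, SentenceTransformer

# Both models are used exactly as published; no fine-tuning is involved.
embedder = SentenceTransformer("BAAI/bge-large-en-v1.5")
reranker = CrossEncoder("BAAI/bge-reranker-v2-m3")

query = "What does reciprocal rank fusion do?"
candidates = ["RRF merges several ranked lists.", "BM25 scores keyword overlap."]

# Bi-encoder: one vector per text, compared with cosine similarity.
vectors = embedder.encode([query] + candidates, normalize_embeddings=True)

# Cross-encoder: scores each (query, passage) pair jointly; higher = more relevant.
scores = reranker.predict([(query, c) for c in candidates])
```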
## Evaluation
The system was evaluated qualitatively on academic papers and technical documents for:
- Answer relevance and accuracy
- Source attribution correctness
- Cross-document comparison quality
- Response structure and readability
## Environmental Impact
- Hardware: Developed and tested on Google Colab (NVIDIA T4 GPU)
- Inference: Primary compute via Groq API (cloud-hosted)
- Local model loading: ~2GB VRAM for embeddings + re-ranker
## Citation

```bibtex
@software{multi_doc_rag_system,
  title = {Multi-Document RAG System},
  year  = {2024},
  note  = {Production-ready RAG system with hybrid retrieval and cross-encoder re-ranking},
  url   = {https://huggingface.co/your-username/your-repo}
}
```
## Acknowledgements
This project builds upon:
- LangChain for RAG orchestration
- ChromaDB for vector storage
- Sentence Transformers for embeddings
- BAAI for BGE models
- Groq for fast LLM inference
## Contact
For questions or feedback, please open an issue on the repository.