Multi-Document RAG System

A production-ready Retrieval-Augmented Generation (RAG) system for intelligent question-answering over multiple PDF documents. Features hybrid retrieval (vector + keyword search), cross-encoder re-ranking, semantic chunking, and a Gradio web interface.


Model Description

This system implements an advanced RAG pipeline that combines multiple state-of-the-art techniques for document retrieval and question answering:

Core Models Used

| Component | Model | Purpose |
|-----------|-------|---------|
| Embeddings | BAAI/bge-large-en-v1.5 | 1024-dim normalized embeddings for semantic search |
| Re-ranker | BAAI/bge-reranker-v2-m3 | Cross-encoder neural re-ranking for precision |
| Chunker | sentence-transformers/all-MiniLM-L6-v2 | Semantic similarity for intelligent chunking |
| LLM | Llama 3.3 70B (via Groq API) | Generation with inline citations |
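
To make the table concrete, here is a minimal sketch of loading the embedding model with sentence-transformers; the sample query is illustrative.

```python
from sentence_transformers import SentenceTransformer

# Load the embedding model from the table above.
embedder = SentenceTransformer("BAAI/bge-large-en-v1.5")

# BGE embeddings are L2-normalized, so cosine similarity reduces to a dot product.
vecs = embedder.encode(["What is retrieval-augmented generation?"],
                       normalize_embeddings=True)
print(vecs.shape)  # (1, 1024) -- the 1024-dim embeddings named above
```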

Architecture

User Query
    │
    ├── Query Classification (factoid/summary/comparison/extraction/reasoning)
    ├── Multi-Query Expansion (3 alternative phrasings)
    └── HyDE Generation (hypothetical answer document)
           │
           ▼
    ┌──────────────────────────────────────┐
    │         Hybrid Retrieval             │
    │  ┌─────────────┐  ┌─────────────┐    │
    │  │ ChromaDB    │  │ BM25        │    │
    │  │ (Vector)    │  │ (Keyword)   │    │
    │  └─────────────┘  └─────────────┘    │
    │           │              │           │
    │           └──────┬───────┘           │
    │                  ▼                   │
    │         RRF Fusion + Deduplication   │
    └──────────────────────────────────────┘
                       │
                       ▼
              Cross-Encoder Re-ranking
              (BAAI/bge-reranker-v2-m3)
                       │
                       ▼
              LLM Generation (Llama 3.3 70B)
              with inline source citations
                       │
                       ▼
              Answer Verification (for complex queries)
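
The re-ranking stage in the diagram can be sketched with sentence-transformers' CrossEncoder wrapper; the candidate passages and top_k value here are illustrative, not the notebook's actual settings.

```python
from sentence_transformers import CrossEncoder

reranker = CrossEncoder("BAAI/bge-reranker-v2-m3")

def rerank(query, candidates, top_k=5):
    """Score each (query, passage) pair jointly and keep the best top_k."""
    scores = reranker.predict([(query, passage) for passage in candidates])
    ranked = sorted(zip(candidates, scores), key=lambda pair: pair[1], reverse=True)
    return [passage for passage, _ in ranked[:top_k]]
```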

Key Features

Hybrid Retrieval

  • Vector Search (MMR): Semantic similarity with diversity-aware selection (Maximal Marginal Relevance) via ChromaDB
  • Keyword Search (BM25): Exact term matching that catches rare words and identifiers embeddings can miss
  • Reciprocal Rank Fusion: Merges the two ranked lists by reciprocal-rank scores, so chunks that rank highly in either list surface (see the sketch below)
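
A minimal sketch of the fusion step, assuming both retrievers return chunk ids best-first; k=60 is the smoothing constant from the original RRF paper, not necessarily what this system uses.

```python
from collections import defaultdict

def rrf_fuse(ranked_lists, k=60):
    """Merge ranked lists: score(d) = sum over lists of 1 / (k + rank(d))."""
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Deduplication falls out of the fusion: each id gets exactly one score.
    return sorted(scores, key=scores.get, reverse=True)

# Usage: fused = rrf_fuse([vector_ids, bm25_ids])  # best-first chunk ids
```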

Semantic Chunking

Documents are split based on sentence embedding similarity rather than fixed character counts, preserving coherent ideas within chunks.
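
One common variant of this idea, sketched below: compare each sentence to its predecessor with the MiniLM encoder and start a new chunk when similarity drops. The 0.5 threshold and 1000-character cap mirror the defaults under Configuration Parameters; the rest is illustrative.

```python
import numpy as np
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

def semantic_chunks(sentences, threshold=0.5, max_chunk_size=1000):
    if not sentences:
        return []
    vecs = encoder.encode(sentences, normalize_embeddings=True)
    chunks, current = [], [sentences[0]]
    for prev_vec, vec, sentence in zip(vecs, vecs[1:], sentences[1:]):
        # Normalized vectors: the dot product is the cosine similarity.
        same_topic = float(np.dot(prev_vec, vec)) >= threshold
        if same_topic and len(" ".join(current)) + len(sentence) < max_chunk_size:
            current.append(sentence)
        else:
            chunks.append(" ".join(current))
            current = [sentence]
    chunks.append(" ".join(current))
    return chunks
```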

Intelligent Query Classification

Automatically classifies queries into 5 types with adaptive retrieval:

| Query Type | Retrieval Depth (k) | Answer Style |
|------------|---------------------|--------------|
| Factoid | 6 | Direct |
| Summary | 10 | Bullets |
| Comparison | 12 | Bullets |
| Extraction | 8 | Direct |
| Reasoning | 10 | Steps |
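
These defaults fit in a small lookup table; a minimal sketch, assuming the classification label comes from the LLM step shown in the architecture diagram:

```python
QUERY_PROFILES = {
    "factoid":    {"k": 6,  "style": "direct"},
    "summary":    {"k": 10, "style": "bullets"},
    "comparison": {"k": 12, "style": "bullets"},
    "extraction": {"k": 8,  "style": "direct"},
    "reasoning":  {"k": 10, "style": "steps"},
}

def retrieval_profile(query_type):
    # Fall back to the summary profile for unrecognized labels.
    return QUERY_PROFILES.get(query_type, QUERY_PROFILES["summary"])
```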

Multi-Document Support

  • Upload multiple PDFs to build a combined knowledge base
  • Automatic PDF diversity enforcement for cross-document queries (see the sketch after this list)
  • Clear source attribution with document name and page number
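
One way the diversity enforcement could work, sketched under two assumptions: chunks carry LangChain's metadata["source"] key from the PDF loader, and a simple round-robin over source files is an acceptable policy (the notebook's actual policy may differ).

```python
from collections import defaultdict
from itertools import chain, zip_longest

def enforce_diversity(chunks):
    """Interleave re-ranked chunks by source PDF so no document dominates."""
    by_source = defaultdict(list)
    for chunk in chunks:  # chunks arrive best-first from the re-ranker
        by_source[chunk.metadata["source"]].append(chunk)
    interleaved = chain.from_iterable(zip_longest(*by_source.values()))
    return [c for c in interleaved if c is not None]
```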

Query Enhancement

  • HyDE: Generates hypothetical answer documents for better retrieval
  • Multi-Query Expansion: Creates 3 alternative phrasings for broader coverage
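
Both steps are single LLM calls; a minimal sketch with langchain-groq, where the Groq model id and the prompt wording are assumptions rather than the notebook's actual values:

```python
from langchain_groq import ChatGroq

llm = ChatGroq(model="llama-3.3-70b-versatile", temperature=0.3)  # model id assumed

def hyde_document(query):
    """HyDE: retrieve with a hypothetical answer instead of the raw query."""
    prompt = f"Write a short passage that plausibly answers: {query}"
    return llm.invoke(prompt).content

def expand_queries(query, n=3):
    """Multi-query expansion: n alternative phrasings, one per line."""
    prompt = f"Rewrite this question in {n} different ways, one per line:\n{query}"
    return [q.strip() for q in llm.invoke(prompt).content.splitlines() if q.strip()]
```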

Answer Verification

Self-verification step for complex queries ensures answers are direct, structured, and grounded in sources.
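
A minimal sketch of what that pass might look like; the checklist wording and the Groq model id are assumptions about how "direct, structured, and grounded" is checked.

```python
from langchain_groq import ChatGroq

llm = ChatGroq(model="llama-3.3-70b-versatile", temperature=0)  # model id assumed

def verify_answer(question, draft, sources):
    """Ask the LLM to audit its own draft against the retrieved sources."""
    prompt = (
        "Review the draft answer against the sources below. If every claim "
        "is supported and the answer is direct and well structured, return "
        "it unchanged; otherwise return a corrected version.\n\n"
        f"Question: {question}\n\nSources:\n{sources}\n\nDraft answer:\n{draft}"
    )
    return llm.invoke(prompt).content
```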

Intended Uses

Primary Use Cases

  • Academic Research: Analyze and compare research papers
  • Document Q&A: Answer questions over technical documentation
  • Literature Review: Synthesize information across multiple sources
  • Knowledge Extraction: Extract specific facts, methodologies, or findings

Out-of-Scope Uses

  • Real-time streaming applications (latency-sensitive)
  • Non-English documents (optimized for English)
  • Image/table-heavy PDFs (text extraction only)

How to Use

Requirements

  • Python 3.10+
  • Groq API key (free at console.groq.com)
  • GPU recommended but not required

Installation

```bash
pip install numpy==1.26.4 pandas==2.2.2 scipy==1.13.1
pip install langchain-core==0.2.40 langchain-community==0.2.16 langchain==0.2.16
pip install langchain-groq==0.1.9 langchain-text-splitters==0.2.4
pip install chromadb==0.5.5 sentence-transformers==3.0.1
pip install pypdf==4.3.1 rank-bm25==0.2.2 gradio torch
```

Quick Start

  1. Open rag.ipynb in Jupyter Notebook or Google Colab
  2. Run all cells sequentially
  3. Enter your Groq API key in the Setup tab
  4. Upload PDF documents
  5. Ask questions in the Chat tab

Example Queries

```
# Single Document Analysis
"What is the main contribution of this paper?"
"Explain the methodology in detail"
"What are the limitations mentioned by the authors?"

# Multi-Document Comparison
"Compare the approaches discussed in these papers"
"What are the key differences between the methodologies?"
```

Technical Specifications

Performance Benchmarks

| Operation | Typical Duration |
|-----------|------------------|
| Model initialization | 30-60 seconds |
| PDF ingestion (per document) | 10-30 seconds |
| Simple queries | 5-8 seconds |
| Complex queries | 10-15 seconds |
| Full document summary | 30-90 seconds |

Configuration Parameters

| Parameter | Default | Description |
|-----------|---------|-------------|
| max_chunk_size | 1000 | Maximum characters per semantic chunk |
| similarity_threshold | 0.5 | Cosine similarity for chunk grouping |
| chunk_size | 800 | Fallback text splitter chunk size |
| chunk_overlap | 150 | Character overlap between chunks |
| fetch_factor | 2 | Multiplier for initial retrieval pool |
| lambda_mult | 0.6 | MMR diversity parameter |
| cache_max_size | 100 | Maximum cached query responses |
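
For reference, these defaults gathered into one structure; the dataclass itself is illustrative, not the notebook's actual layout.

```python
from dataclasses import dataclass

@dataclass
class RAGConfig:
    max_chunk_size: int = 1000         # max characters per semantic chunk
    similarity_threshold: float = 0.5  # cosine similarity for chunk grouping
    chunk_size: int = 800              # fallback splitter chunk size
    chunk_overlap: int = 150           # character overlap between chunks
    fetch_factor: int = 2              # initial pool = k * fetch_factor
    lambda_mult: float = 0.6           # MMR relevance/diversity trade-off
    cache_max_size: int = 100          # max cached query responses
```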

Limitations

  • Requires active internet connection for Groq API calls
  • PDF quality affects text extraction accuracy
  • Large documents may take longer to process
  • Query cache does not persist between sessions
  • Optimized for English language documents

Training Details

This is a retrieval system, not a trained model. It orchestrates pre-trained models:

  • Embeddings: Uses pre-trained BAAI/bge-large-en-v1.5 without fine-tuning
  • Re-ranker: Uses pre-trained BAAI/bge-reranker-v2-m3 without fine-tuning
  • LLM: Uses Llama 3.3 70B via Groq API with zero-shot prompting

Evaluation

The system was evaluated qualitatively on academic papers and technical documents for:

  • Answer relevance and accuracy
  • Source attribution correctness
  • Cross-document comparison quality
  • Response structure and readability

Environmental Impact

  • Hardware: Developed and tested on Google Colab (NVIDIA T4 GPU)
  • Inference: Primary compute via Groq API (cloud-hosted)
  • Local model loading: ~2GB VRAM for embeddings + re-ranker

Citation

```bibtex
@software{multi_doc_rag_system,
  title = {Multi-Document RAG System},
  year = {2024},
  note = {Production-ready RAG system with hybrid retrieval and cross-encoder re-ranking},
  url = {https://huggingface.co/goutam-dev/rag-chatbot}
}
```

Acknowledgements

This project builds upon:

  • BAAI BGE models: bge-large-en-v1.5 (embeddings) and bge-reranker-v2-m3 (re-ranking)
  • sentence-transformers and all-MiniLM-L6-v2 (semantic chunking)
  • LangChain, langchain-groq, ChromaDB, and rank-bm25 (orchestration and retrieval)
  • Llama 3.3 70B served via the Groq API (generation)
  • Gradio (web interface) and pypdf (PDF text extraction)

Contact

For questions or feedback, please open an issue on the repository.
