---
base_model: google/gemma-2b-it
language: en
library_name: transformers
license: apache-2.0
pipeline_tag: text-generation
tags:
- precision-grounding
- document-qa
- zero-hallucination
- legal-tech
- technical-analysis
---

# 📂 Solvrays Finetuned PDF - Document AI

## 🌟 Model Overview
This model is a fine-tune of **google/gemma-2b-it**, built for **Zero-Hallucination Technical Retrieval**: answering questions from technical documents while staying grounded in what those documents say. It was trained on a proprietary dataset of technical and architectural documentation.

### 🚀 Key Capabilities
- **Technical Grounding**: Prioritizes facts from the documentation over speculative generation.
- **Chunk-Aware Memory**: Optimized for overlapping document segments (256-token window).
- **Deterministic Precision**: Best used with greedy decoding (`do_sample=False`), so the same prompt always yields the same answer.
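
The overlapping 256-token segments mentioned above can be illustrated with a minimal sketch. The function name `chunk_token_ids` and the 32-token overlap are assumptions made for illustration; the model card states only the 256-token window, not the overlap used in training.

```python
from typing import List


def chunk_token_ids(token_ids: List[int], window: int = 256, overlap: int = 32) -> List[List[int]]:
    """Split token ids into windows of at most `window` tokens.

    Each window repeats the last `overlap` tokens of the previous one.
    The overlap value is an assumption, not taken from the model card.
    """
    if not 0 <= overlap < window:
        raise ValueError("overlap must satisfy 0 <= overlap < window")
    stride = window - overlap
    chunks = []
    for start in range(0, len(token_ids), stride):
        chunks.append(token_ids[start:start + window])
        # Stop once the current window already reaches the end of the input.
        if start + window >= len(token_ids):
            break
    return chunks


# Stand-in for real tokenizer output: 600 consecutive integer ids.
ids = list(range(600))
chunks = chunk_token_ids(ids)
print([len(c) for c in chunks])
```

In practice the ids would come from the model's tokenizer, and each chunk would be decoded back to text before being placed in a prompt.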

## 💻 Professional Implementation
The model requires specific prompt construction to trigger its 'Knowledge Retrieval' mode:

```python
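from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
import torch

model_id = 'solvrays/solvrays-finetuned-pdf'
tokenizer = AutoTokenizer.from_pretrained(model_id)

# 4-bit NF4 quantization (requires the bitsandbytes package), matching the
# setup listed under Technical Specifications; compute runs in bfloat16.
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type='nf4',
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map='auto',
    torch_dtype=torch.bfloat16,
    quantization_config=quant_config,
)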

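def query_model(user_query):
    """Return the model's answer to `user_query` using the retrieval template."""
    # High-Precision Retrieval Template
    prompt = f'### Knowledge Retrieval Content: {user_query}\n### Verified Response: '
    inputs = tokenizer(prompt, return_tensors='pt').to(model.device)
    # Greedy decoding (do_sample=False) keeps the output deterministic.
    outputs = model.generate(**inputs, max_new_tokens=512, do_sample=False)
    # Decode only the newly generated tokens, skipping the echoed prompt.
    new_tokens = outputs[0][inputs['input_ids'].shape[-1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True).strip()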
```

## 📊 Technical Specifications
| Feature | Configuration |
| :--- | :--- |
| **Base Model** | google/gemma-2b-it |
| **Precision** | BrainFloat16 (BF16) |
| **Fine-tuning** | QLoRA (4-bit NormalFloat, NF4) |
| **LoRA Rank (r)** | 16 |
| **LoRA Alpha** | 32 |
| **Target Modules** | q, k, v, o, gate, up, down |
| **Training Epochs** | 25 |
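
To put the LoRA settings in the table into perspective, the sketch below estimates the adapter size and the scaling factor (alpha divided by rank). The layer shapes are assumptions taken from the published gemma-2b configuration, and the `*_proj` names are assumed to be what the table's q, k, v, o, gate, up, and down entries refer to. Neither is stated in this model card.

```python
# Sketch only: layer shapes are assumed from the public gemma-2b config,
# and the *_proj names are the usual Transformers module names (assumptions).
R, ALPHA, NUM_LAYERS = 16, 32, 18
HIDDEN, KV_DIM, MLP_DIM = 2048, 256, 16384

# (input features, output features) of each adapted linear layer
TARGET_SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, MLP_DIM),
    "up_proj": (HIDDEN, MLP_DIM),
    "down_proj": (MLP_DIM, HIDDEN),
}


def lora_param_count(shapes, rank, num_layers):
    """Each adapted layer adds two matrices: (d_in x rank) and (rank x d_out)."""
    per_layer = sum(rank * (d_in + d_out) for d_in, d_out in shapes.values())
    return per_layer * num_layers


scaling = ALPHA / R  # factor applied to the low-rank update
trainable = lora_param_count(TARGET_SHAPES, R, NUM_LAYERS)
print(f"LoRA scaling factor: {scaling}")
print(f"Adapter parameters: {trainable:,}")
```

Under these assumptions the adapter holds roughly 19.6 million trainable parameters, a small fraction of the base model's weights.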

## 🛠 Training Environment
- **Hardware**: 2 × NVIDIA L4 GPUs
- **Optimizer**: Paged AdamW 8-bit
- **Context Length**: 256 tokens per block

## ⚠️ Constraints & Risk Mitigation
- **Out-of-Scope**: This model is not intended for general conversation or creative writing. It is a specialized document analyst.
- **Hallucination Control**: If the requested information was not learned during fine-tuning, the model is trained to answer 'Not Documented' or return an empty response, so the gap can be checked against the source.
- **Numerical Accuracy**: Always cross-verify critical measurements with original PDF source material.
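
Downstream code may want to detect the abstention behaviour described above before showing an answer to a user. The helper below is a minimal sketch: the name `is_undocumented` and the exact matching rules (case-insensitive prefix match, surrounding quotes ignored) are assumptions, since the model card specifies only the 'Not Documented' phrase and the empty response.

```python
def is_undocumented(response: str, marker: str = "Not Documented") -> bool:
    """Return True when a response looks like an abstention.

    Treats an empty response, or one that starts with the marker phrase,
    as "no answer available". The matching rules are an assumption.
    """
    # Drop surrounding whitespace and quote characters before comparing.
    normalized = response.strip().strip("'\"").strip()
    if not normalized:
        return True
    return normalized.lower().startswith(marker.lower())


# Hypothetical responses, used only to exercise the helper.
for text in ["", "  Not Documented. ", "The answer text."]:
    print(repr(text), "->", is_undocumented(text))
```

A caller could route abstentions to a manual lookup in the original PDF instead of displaying them as answers.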

---
**Senior AI Architect & Developer**: Solvrays