---
base_model: google/gemma-2b-it
language: en
library_name: transformers
license: apache-2.0
pipeline_tag: text-generation
tags:
- precision-grounding
- document-qa
- zero-hallucination
- legal-tech
- technical-analysis
---
# Solvrays Finetuned Pdf - Document AI
## Model Overview
This model is a fine-tune of **google/gemma-2b-it** designed for **Zero-Hallucination Technical Retrieval**: answering questions about technical documents without speculating beyond them. It was trained on a proprietary dataset of technical and architectural documentation so that responses stay grounded in that material.
### Key Capabilities
- **Technical Grounding**: Prioritizes factual documentation over generative speculation.
- **Chunk-Aware Memory**: Optimized for overlapping document segments (256-token window).
- **Deterministic Precision**: Best used with `do_sample=False` for architectural accuracy.
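
The chunking behind the 256-token window can be illustrated with a small helper. This is a minimal sketch, not the preprocessing used for training: the function name `overlapping_windows` is hypothetical, and the 32-token overlap is an assumption, since this card gives the window size but not the overlap. In practice `token_ids` would come from the tokenizer (for example `tokenizer(text)["input_ids"]`); a plain list of integers stands in for it here.

```python
from typing import List, Sequence


def overlapping_windows(
    token_ids: Sequence[int],
    window: int = 256,   # window size stated in this card
    overlap: int = 32,   # ASSUMPTION: the card does not state the overlap
) -> List[List[int]]:
    """Split token_ids into windows of at most `window` tokens.

    Consecutive windows share `overlap` tokens, so text near a chunk
    boundary appears in both neighbouring chunks.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    if not 0 <= overlap < window:
        raise ValueError("overlap must satisfy 0 <= overlap < window")

    stride = window - overlap
    chunks: List[List[int]] = []
    start = 0
    while start < len(token_ids):
        chunks.append(list(token_ids[start:start + window]))
        # Stop once the current window already reaches the end of the input.
        if start + window >= len(token_ids):
            break
        start += stride
    return chunks


# Stand-in for real token ids produced by a tokenizer.
token_ids = list(range(600))
chunks = overlapping_windows(token_ids)
print([len(chunk) for chunk in chunks])
```

Each chunk can then be decoded back to text and passed to the model as a separate query.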
## Professional Implementation
The model requires specific prompt construction to trigger its 'Knowledge Retrieval' mode:
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = 'solvrays/solvrays-finetuned-pdf'
tokenizer = AutoTokenizer.from_pretrained(model_id)

# 4-bit loading requires the bitsandbytes package.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map='auto',
    torch_dtype=torch.bfloat16,
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
)


def query_model(user_query):
    # High-Precision Retrieval Template
    prompt = f'### Knowledge Retrieval Content: {user_query}\n### Verified Response: '
    inputs = tokenizer(prompt, return_tensors='pt').to(model.device)
    # Greedy decoding (do_sample=False) keeps the output deterministic.
    outputs = model.generate(**inputs, max_new_tokens=512, do_sample=False)
    decoded = tokenizer.decode(outputs[0], skip_special_tokens=True)
    # Keep only the text generated after the response marker.
    return decoded.split('### Verified Response:')[-1].strip()
```
## Technical Specifications
| Feature | Configuration |
| :--- | :--- |
| **Base Model** | google/gemma-2b-it |
| **Precision** | BrainFloat16 (BF16) |
| **Fine-tuning** | QLoRA (4-bit NormalFloat, NF4) |
| **LoRA Rank (r)** | 16 |
| **LoRA Alpha** | 32 |
| **Target Modules** | q, k, v, o, gate, up, down |
| **Training Epochs** | 25 |
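
To make the LoRA settings in the table concrete, the sketch below works out the adapter scaling factor and a rough count of trainable adapter parameters. The rank and alpha come from the table. The projection shapes and layer count are assumptions about the base model's architecture, and they are not stated in this card, so check them against the base model's `config.json` before relying on the totals. The full module names (`q_proj`, and so on) are likewise assumed expansions of the short names in the table.

```python
# Hyperparameters taken from the specification table.
LORA_RANK = 16
LORA_ALPHA = 32

# ASSUMPTION: (in_features, out_features) of each adapted projection in one
# decoder layer of the base model. Not stated in this card; verify against
# the base model's config.json.
ASSUMED_SHAPES = {
    "q_proj": (2048, 2048),
    "k_proj": (2048, 256),
    "v_proj": (2048, 256),
    "o_proj": (2048, 2048),
    "gate_proj": (2048, 16384),
    "up_proj": (2048, 16384),
    "down_proj": (16384, 2048),
}
ASSUMED_NUM_LAYERS = 18  # ASSUMPTION: decoder layer count of the base model


def lora_scaling(alpha: int, r: int) -> float:
    """Standard LoRA scales the low-rank update B @ A by alpha / r."""
    return alpha / r


def lora_param_count(in_features: int, out_features: int, r: int) -> int:
    """Adapter A has shape (r, in_features); adapter B has shape (out_features, r)."""
    return r * (in_features + out_features)


scaling = lora_scaling(LORA_ALPHA, LORA_RANK)
per_layer = sum(
    lora_param_count(n_in, n_out, LORA_RANK)
    for n_in, n_out in ASSUMED_SHAPES.values()
)
total = per_layer * ASSUMED_NUM_LAYERS

print(f"scaling factor: {scaling}")
print(f"adapter parameters per layer: {per_layer:,}")
print(f"adapter parameters in total: {total:,}")
```

Under these assumed shapes, the adapters add roughly 19.6 million trainable parameters, a small fraction of the base model's weights.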
## Training Environment
- **Hardware**: NVIDIA L4 x 2 (Dual GPU Architecture)
- **Optimizer**: Paged AdamW 8-bit
- **Context Length**: 256 tokens per block
## Constraints & Risk Mitigation
- **Out-of-Scope**: This model is not intended for general conversation or creative writing. It is a specialized document analyst.
- **Hallucination Control**: When the requested information was not part of its training data, the model is trained to respond with 'Not Documented' or to return an empty response, so that unanswered queries can be flagged for verification.
- **Numerical Accuracy**: Always cross-verify critical measurements with original PDF source material.
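
Calling code can treat the two fallback behaviours described above as "no answer" rather than as content. The helper below is a minimal sketch under assumptions: the function name `parse_verified_response` is hypothetical, and it assumes the refusal makes up the whole response (optionally quoted or followed by punctuation), which this card does not guarantee.

```python
from typing import Optional

# Refusal phrase named in this card, lower-cased for comparison.
NOT_DOCUMENTED_MARKER = "not documented"


def parse_verified_response(raw_response: str) -> Optional[str]:
    """Return the answer text, or None when the model declined to answer.

    ASSUMPTION: a refusal is the entire response, possibly wrapped in
    quotes or ending with '.' or '!'. Longer answers that merely mention
    the phrase are returned unchanged.
    """
    cleaned = raw_response.strip().strip("'\"").strip()
    if not cleaned:
        return None  # empty response: nothing to verify against
    if cleaned.lower().rstrip(".!") == NOT_DOCUMENTED_MARKER:
        return None  # explicit refusal
    return cleaned


# Illustrative inputs only; these are not real model outputs.
for sample in ["", "Not Documented", "Section 3 describes the retry policy."]:
    print(repr(sample), "->", repr(parse_verified_response(sample)))
```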
---
**Senior AI Architect & Developer**: Solvrays