Why does GPT-4V score exceedingly high on LLaVA Bench?

by Yhyu13 - opened May 25, 2024

Yhyu13

May 25, 2024

Hi,

https://cdn-uploads.huggingface.co/production/uploads/6566e0c493e30c8a60048eb3/ypXZxb4HE-jDPJU9115bi.png
In this figure from your paper, GPT-4V scores 90+ on LLaVA Bench, which is extraordinarily higher than the other models.

What could be the potential reason for such an anomaly?

Thanks!

Yirany

OpenBMB org May 26, 2024

Hi Yhyu13, thank you for your interest and for such a good question! I think there are four potential reasons:

1. GPT-4V outputs are generally much longer than outputs from other models. Specifically, the average response lengths on LLaVA Bench for GPT-4V, MiniGemini 34B, and RLAIF-V-7B are 181, 124, and 110 words, respectively.
2. GPT-4V, inheriting the strong text generation capability of GPT-4, can generate better-organized text than other models.
3. GPT-4, which acts as the judge on LLaVA Bench, prefers its own text style, which results in a higher evaluation score for GPT-4V.
4. GPT-4 prefers long answers, which may be partially caused by the previous reason.
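
The length comparison in the first point can be checked with a plain word count over each model's responses. The snippet below is only a minimal sketch: the `average_word_count` helper, the `responses_by_model` dictionary, and its sample strings are hypothetical placeholders rather than real LLaVA Bench outputs, and splitting on whitespace is an assumed way of counting words that may differ from how the numbers above were computed.

```python
from statistics import mean


def average_word_count(responses):
    """Return the mean number of whitespace-separated words per response."""
    if not responses:
        raise ValueError("responses must be non-empty")
    return mean(len(text.split()) for text in responses)


# Hypothetical placeholder outputs, NOT real benchmark data.
responses_by_model = {
    "model_a": ["The image shows a dog on a beach.", "A red car."],
    "model_b": ["A dog.", "A car."],
}

# Average response length per model, in words.
average_lengths = {
    name: average_word_count(outputs)
    for name, outputs in responses_by_model.items()
}

for name, avg in average_lengths.items():
    print(f"{name}: {avg:.1f} words on average")
```

Comparing these averages across models before reading the judge scores gives a rough sense of how much of a score gap might be explained by length alone.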
