Instructions to use MetaIX/GPT4-X-Alpaca-30B-4bit with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use MetaIX/GPT4-X-Alpaca-30B-4bit with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="MetaIX/GPT4-X-Alpaca-30B-4bit")
```
```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("MetaIX/GPT4-X-Alpaca-30B-4bit")
model = AutoModelForCausalLM.from_pretrained("MetaIX/GPT4-X-Alpaca-30B-4bit")
```
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use MetaIX/GPT4-X-Alpaca-30B-4bit with vLLM:
Install from pip and serve model
```shell
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "MetaIX/GPT4-X-Alpaca-30B-4bit"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "MetaIX/GPT4-X-Alpaca-30B-4bit",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5
    }'
```
Use Docker
```shell
docker model run hf.co/MetaIX/GPT4-X-Alpaca-30B-4bit
```
- SGLang
How to use MetaIX/GPT4-X-Alpaca-30B-4bit with SGLang:
Install from pip and serve model
```shell
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "MetaIX/GPT4-X-Alpaca-30B-4bit" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "MetaIX/GPT4-X-Alpaca-30B-4bit",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5
    }'
```
Use Docker images
```shell
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "MetaIX/GPT4-X-Alpaca-30B-4bit" \
        --host 0.0.0.0 \
        --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "MetaIX/GPT4-X-Alpaca-30B-4bit",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5
    }'
```
- Docker Model Runner
How to use MetaIX/GPT4-X-Alpaca-30B-4bit with Docker Model Runner:
```shell
docker model run hf.co/MetaIX/GPT4-X-Alpaca-30B-4bit
```
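Both the vLLM and the SGLang servers above expose an OpenAI-compatible `/v1/completions` route, so the curl calls can also be made from Python. The following is a minimal sketch using only the standard library, assuming a server is already running on the ports shown above (8000 for vLLM, 30000 for SGLang). The helper names `build_payload`, `extract_text`, and `complete` are hypothetical and not part of either project; the request fields simply mirror the curl examples.

```python
import json
import urllib.request
from typing import Any, Dict

MODEL_ID = "MetaIX/GPT4-X-Alpaca-30B-4bit"


def build_payload(prompt: str, max_tokens: int = 512, temperature: float = 0.5) -> Dict[str, Any]:
    """Request body with the same fields as the curl examples (hypothetical helper)."""
    return {
        "model": MODEL_ID,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }


def extract_text(response: Dict[str, Any]) -> str:
    """Return the text of the first choice in an OpenAI-style completions response."""
    return response["choices"][0]["text"]


def complete(prompt: str, base_url: str = "http://localhost:8000",
             timeout: float = 300.0, **kwargs: Any) -> str:
    """POST to {base_url}/v1/completions and return the generated text.

    Assumes a vLLM (default port 8000) or SGLang (port 30000) server is already running.
    """
    body = json.dumps(build_payload(prompt, **kwargs)).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url}/v1/completions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return extract_text(json.load(resp))
```

Usage: `complete("Once upon a time,")` targets the vLLM server, and `complete("Once upon a time,", base_url="http://localhost:30000")` targets SGLang. The generous default timeout is a guess; a 30B model producing 512 tokens can take a while on a single GPU.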
Issue of sha256sum of file gpt4-x-alpaca-30b-128g-4bit.safetensors
#16 opened over 2 years ago by mamsds
Size mismatch error no matter what .json or .config files or startup parameters I use
#13 opened about 3 years ago by Anphex · 1 comment
Best model I tested, but seems to have an issue on some tokens
#12 opened about 3 years ago by kbrkbr · 2 comments
llama.cpp breaks quantized ggml file format
#11 opened about 3 years ago by Waldschrat · 4 comments
What model_type to set between "None" and "llama"? And what prompt style to use? There are plenty of the latter in Ooba's TextGenWebUI as of now and I'm pretty much lost (see pics for clarification)
#10 opened about 3 years ago by sneedingface · 4 comments
sha256sum not matching for one file gpt4-x-alpaca-30b-128g-4bit.safetensors
#9 opened about 3 years ago by spaceman7777 · 1 comment
Error: Internal: src/sentencepiece_processor.cc in Ooba and KAI 4bit
#8 opened about 3 years ago by Co0ode · 4 comments
Please, help :<
#7 opened about 3 years ago by ANGIPO · 17 comments
Loaded the model but it won't respond and is stuck saying "typing" while GPU usage is at 100%
#6 opened about 3 years ago by barncroft · 1 comment
Error when launching
#5 opened about 3 years ago by pupdike · 4 comments · 👍 1
filenames of shards in pytorch_model.bin.index.json
#4 opened about 3 years ago by h3ndrik · 2 comments · 🤝 1
Model size for int4 fine tuning on rtx 3090
#2 opened about 3 years ago by KnutJaegersberg · 3 comments
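Two of the threads (#16 and #9) report a sha256sum mismatch for gpt4-x-alpaca-30b-128g-4bit.safetensors. A minimal sketch for checking a local download in Python follows. It reads the file in fixed-size chunks, so a multi-gigabyte file never has to fit in memory. The expected digest is assumed to come from the repository's file listing, and `digest_chunks`, `sha256_of`, and `matches_expected` are hypothetical helper names. The hex string produced is the same one `sha256sum` prints.

```python
import hashlib
from typing import Iterable


def digest_chunks(chunks: Iterable[bytes]) -> str:
    """SHA-256 hex digest of a stream of byte chunks (same result as hashing them joined)."""
    digest = hashlib.sha256()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file 1 MiB at a time (by default) so large .safetensors files stay out of RAM."""
    with open(path, "rb") as f:
        # iter(callable, sentinel) keeps calling f.read until it returns b"" at EOF
        return digest_chunks(iter(lambda: f.read(chunk_size), b""))


def matches_expected(path: str, expected_hex: str) -> bool:
    """Compare against an expected digest, tolerating case and surrounding whitespace."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For example, `matches_expected("gpt4-x-alpaca-30b-128g-4bit.safetensors", "<expected sha256>")` returns `False` when the file on disk differs from the published one. That usually points to an interrupted or stale download rather than a loader problem.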
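Thread #2 asks about model size for int4 fine-tuning on an RTX 3090. A back-of-the-envelope estimate for the quantized weights alone is parameters × bits / 8 bytes. The sketch below makes several assumptions. It uses the nominal 30B parameter count taken from the model name. It reads the `128g` in the safetensors file name as a group size of 128. It also adds an optional per-group metadata cost of 20 bits, an fp16 scale plus a 4-bit zero point, which is a GPTQ-style guess rather than something this page states. Unquantized layers, activations, the KV cache, and any fine-tuning optimizer or adapter state are ignored, so treat the result as a lower bound to compare against the card's 24 GB.

```python
from typing import Optional


def quantized_weight_bytes(n_params: float, bits: int,
                           group_size: Optional[int] = None,
                           metadata_bits_per_group: int = 20) -> float:
    """Lower-bound size in bytes of quantized weights.

    Assumptions: every parameter is stored with `bits` bits; if `group_size` is
    given, each group of weights also stores `metadata_bits_per_group` bits
    (default 20 = fp16 scale + 4-bit zero point, a GPTQ-style guess).
    """
    bits_per_weight = float(bits)
    if group_size:
        bits_per_weight += metadata_bits_per_group / group_size
    return n_params * bits_per_weight / 8


GIB = 2 ** 30

# Nominal parameter count from the model name (assumption, not an exact count)
plain = quantized_weight_bytes(30e9, 4)
grouped = quantized_weight_bytes(30e9, 4, group_size=128)
print(f"4-bit, no group metadata: {plain / GIB:.2f} GiB")
print(f"4-bit, group size 128:    {grouped / GIB:.2f} GiB")
```

Swapping in a different parameter count or bit width only changes the first two arguments. The per-group term matters little at group size 128, adding about 0.16 bits per weight.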