Merge 100% of the models instead of only parts
Is it possible to merge 100% of the models instead of only a percentage of each? I just don't see why you would cut off data from either model when each was trained on different coding datasets. Why not merge the entire models together? Obviously I mean the WizardCoder and Phind adapter models, not the base codellama-34b.
Both models are used. It's a weighted average with weights that change as a function of position.
- gradient_values: [0.75] means that the weights are merged with 0.75 weight for model1 and 0.25 for model2.
- gradient_values: [0.75, 0.25] means that the ratio starts at 0.75 and ends at 0.25 for that layer.
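To make the idea concrete, here is a rough sketch of a position-dependent weighted average. It is not the actual merge script: the function names `blend_ratios` and `merge` are made up for illustration, the "models" are plain lists of floats rather than real tensors, and the linear interpolation between the listed values is an assumption about how the ratio moves from the first value to the last.

```python
def blend_ratios(gradient_values, num_tensors):
    """Spread the listed ratios over num_tensors positions.

    Assumption: values in between are linearly interpolated.
    """
    if len(gradient_values) == 1 or num_tensors == 1:
        # A single value means the same ratio everywhere.
        return [gradient_values[0]] * num_tensors
    segments = len(gradient_values) - 1
    ratios = []
    for i in range(num_tensors):
        # Map tensor index i onto the [0, segments] axis.
        pos = i / (num_tensors - 1) * segments
        lo = min(int(pos), segments - 1)
        frac = pos - lo
        ratios.append(gradient_values[lo] * (1 - frac)
                      + gradient_values[lo + 1] * frac)
    return ratios


def merge(model1, model2, gradient_values):
    """Weighted average: ratio r for model1, (1 - r) for model2."""
    ratios = blend_ratios(gradient_values, len(model1))
    return [
        [r * a + (1 - r) * b for a, b in zip(t1, t2)]
        for r, t1, t2 in zip(ratios, model1, model2)
    ]


# Toy "models" with three tensors of two weights each (hypothetical data).
model1 = [[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]]
model2 = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]

constant = merge(model1, model2, [0.75])        # same ratio at every position
sloped = merge(model1, model2, [0.75, 0.25])    # ratio slides from 0.75 to 0.25
```

In both cases every weight of both models contributes to the result; only the mixing ratio differs from one position to the next.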
Oh, my bad. I thought 0.75 meant that 75% of one model and 25% of the other model were used when merging.
This is still the best coding AI model I have come across. Are there any plans to create a new version? I suppose maybe merging with a Llama 3 coding model or something else... I've run the coding test I give every AI model, and this is the only one, outside of GPT-4 of course, that actually creates a working app to my specifications, doesn't repeat itself often, and explains details well. Solid work, especially considering that other "newer" models don't seem to be able to pass my coding test.
Does anyone else have any other coding AI model suggestions to try out (mostly just for fun at this point)? Obviously I'm going to continue to use this one for my coding assistance for the foreseeable future.