Dataset question regarding eos
Hi,
First, thank you for everything!
I have a question, since I'm experimenting and debugging a lot.
Do you also see double eos tokens being added to the data when training?
You can check using:
python -m axolotl.cli.train your_config.yml --prepare_ds_only --debug --debug_text_only --debug_num_examples 2
For a simple dataset, I get this (an eos, then \n, then another eos):
<|im_start|>user
Bonjour!<|im_end|>
<|im_start|>assistant
Bonjour!<|im_end|>
<|im_end|><|im_start|>user
Salutations!<|im_end|>
<|im_start|>assistant
Salutations!<|im_end|>
<|im_end|>
The dataset is like this:
{"conversations": [{"from": "human", "value": "Bonjour!"}, {"from": "gpt", "value": "Bonjour!"}]}
{"conversations": [{"from": "human", "value": "Salutations!"}, {"from": "gpt", "value": "Salutations!"}]}
I assume since these are 2 eos tokens separated by \n, it doesn't matter for performance in the end, but I just wanted to run this by you.
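One quick way to spot this in the debug output is to search for two eos markers separated only by whitespace. This is a minimal sketch, assuming the debug text has been captured as a plain string; the helper name find_double_eos is hypothetical and not part of axolotl:

```python
import re

EOS = "<|im_end|>"


def find_double_eos(text: str, eos: str = EOS) -> list[int]:
    """Return start offsets where two eos markers are separated only by whitespace."""
    pattern = re.compile(re.escape(eos) + r"\s*" + re.escape(eos))
    return [m.start() for m in pattern.finditer(text)]


# Sample text copied from the first example in the debug output above.
sample = (
    "<|im_start|>user\nBonjour!<|im_end|>\n"
    "<|im_start|>assistant\nBonjour!<|im_end|>\n<|im_end|>"
)

offsets = find_double_eos(sample)
print(offsets)
```

An eos followed by \n and then <|im_start|> is a normal turn boundary and is not reported; only eos, optional whitespace, eos is.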
This is really interesting. I will look into it.
I've spent half a day trying and failing to figure out why, the moment I loaded this model, my oobabooga web UI broke completely and couldn't follow the ChatML format anymore.
The main problem wasn't just this model: every other ChatML model stopped functioning no matter what I tried (short of a complete web UI purge and reinstall), even though it had been working flawlessly with every ChatML model all day until I attempted to load this one.
I still think it's largely due to bugs in the web UI, which is usually riddled with them, but maybe some tokenizer issues played a part too.
I'm re-downloading TheBloke's quants once more to test.
The 2 eos tokens that appear here should not bother Ooba. Test the model in another way if you can.
The 2 eos tokens are added:
- First as a separator: https://github.com/lm-sys/FastChat/blob/main/fastchat/conversation.py#L163
ret += role + "\n" + message + self.sep + "\n"
- Then axolotl adds another eos:
https://github.com/OpenAccess-AI-Collective/axolotl/blob/1a6309c8a633a6fe17b2ffebbbc0353565f376e5/src/axolotl/prompt_tokenizers.py#L392
# this should be the assistant response, should end with an eos token
.....
res = self._tokenize(
    turn,
    add_eos_token=True,
    strip_bos_token=True,
)
And as such we have at the end of each conversation:
<|im_end|>\n<|im_end|>
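To illustrate how the two steps combine, here is a simplified stand-alone imitation in plain Python. It does not call FastChat or axolotl; render_chatml and append_eos are hypothetical stand-ins for the separator line and the add_eos_token=True step quoted above, and the real code path works on token ids rather than strings:

```python
SEP = "<|im_end|>"
EOS = "<|im_end|>"


def render_chatml(turns, sep=SEP):
    """Mimic the quoted join: role + "\\n" + message + sep + "\\n" for each turn."""
    ret = ""
    for role, message in turns:
        ret += role + "\n" + message + sep + "\n"
    return ret


def append_eos(text, eos=EOS):
    """Stand-in for tokenizing the assistant turn with add_eos_token=True."""
    return text + eos


turns = [
    ("<|im_start|>user", "Bonjour!"),
    ("<|im_start|>assistant", "Bonjour!"),
]

rendered = append_eos(render_chatml(turns))
print(repr(rendered))
```

The separator contributes the first <|im_end|> plus a newline, and the appended eos contributes the second, which is the same ending seen in the debug output.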
Is there something I need to change?
When I train Dolphin 3.0, I can modify the axolotl config to train it differently.
Hi Eric,
There is not much we can change at this point. We could submit a PR to axolotl that changes the self._tokenize(...) call above to pass add_eos_token=False. Will this affect other templates that use the ShareGPTPromptTokenizingStrategy class? Maybe, which is why it needs some care. I'll try, but my time is limited, which is why there is no PR from me yet.
We also need to create a new FastChat conversation class to remove the \n at the end, like this:
from fastchat.conversation import Conversation, SeparatorStyle, register_conv_template

register_conv_template(
    Conversation(
        name="chatml2",
        system_template="<|im_start|>system\n{system_message}",
        system_message="You are a helpful assistant.",
        roles=["<|im_start|>user", "<|im_start|>assistant"],
        sep_style=SeparatorStyle.CHATML,
        sep="<|im_end|>",
    )
)
# The original chatml conversation has sep="<|im_end|>\n", and as such FastChat and Axolotl both added a \n at the end.
Once this is done, the template will be correct. All this happens because multi-turn conversations were not used much previously, and I guess people did not notice.
This may or may not affect the currently trained Dolphin model. My not-yet-expert opinion is that it affects it a little bit.
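As a rough illustration of the two changes proposed above (no trailing \n after the separator, and add_eos_token=False), here is a stand-alone sketch in plain Python. render_turn and finish are hypothetical helpers, not FastChat or axolotl APIs, and the real tokenization path may behave differently:

```python
EOS = "<|im_end|>"


def render_turn(role, message, sep=EOS, trailing_newline=True):
    """Render one turn, optionally followed by the extra newline after the separator."""
    text = role + "\n" + message + sep
    return text + "\n" if trailing_newline else text


def finish(text, add_eos_token):
    """Stand-in for the tokenizer step that may append one more eos."""
    return text + EOS if add_eos_token else text


role, message = "<|im_start|>assistant", "Bonjour!"

# Behaviour described in this thread: separator, newline, then an extra eos.
current = finish(render_turn(role, message, trailing_newline=True), add_eos_token=True)

# Proposed behaviour: the separator alone closes the turn.
proposed = finish(render_turn(role, message, trailing_newline=False), add_eos_token=False)

print(repr(current))
print(repr(proposed))
```

Under these assumptions the assistant turn ends with exactly one <|im_end|> in the proposed setup, instead of two separated by a newline.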