Dataset question regarding eos

by chrisgru - opened Nov 14, 2023

Discussion

chrisgru

Nov 14, 2023

•

edited Nov 14, 2023

Hi,
first thank you for everything!
I have a question since I'm experimenting and debugging a lot.
Do you also see double eos tokens being added to the data during training?
You can check using:
python -m axolotl.cli.train your_config.yml --prepare_ds_only --debug --debug_text_only --debug_num_examples 2

For a simple dataset, I have this (eos \n eos again):
<|im_start|>user
Bonjour!<|im_end|>

<|im_start|>assistant
Bonjour!<|im_end|>

<|im_end|><|im_start|>user
Salutations!<|im_end|>

<|im_start|>assistant
Salutations!<|im_end|>

<|im_end|>

The dataset is like this:
{"conversations": [{"from": "human", "value": "Bonjour!"}, {"from": "gpt", "value": "Bonjour!"}]}
{"conversations": [{"from": "human", "value": "Salutations!"}, {"from": "gpt", "value": "Salutations!"}]}

I assume since these are 2 eos tokens separated by \n, it doesn't matter for performance in the end, but I just wanted to run this by you.
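
For anyone who wants to check their own debug output for the same pattern, here is a minimal sketch. It assumes the prepared text has been copied into a Python string; the helper name `count_double_eos` and the sample string are made up for illustration and are not part of axolotl.

```python
import re

EOS = "<|im_end|>"
# Two EOS markers separated only by whitespace (for example a newline).
DOUBLE_EOS = re.compile(re.escape(EOS) + r"\s*" + re.escape(EOS))


def count_double_eos(text: str) -> int:
    """Count places where one EOS marker is directly followed by another."""
    return len(DOUBLE_EOS.findall(text))


# Hypothetical sample that mimics the debug output shown above.
sample = (
    "<|im_start|>user\nBonjour!<|im_end|>\n"
    "<|im_start|>assistant\nBonjour!<|im_end|>\n<|im_end|>"
    "<|im_start|>user\nSalutations!<|im_end|>\n"
    "<|im_start|>assistant\nSalutations!<|im_end|>\n<|im_end|>"
)

print(count_double_eos(sample))  # 2, one per assistant turn
```

This only inspects the decoded text, so it cannot tell whether the two markers map to the same token id; it is just a quick way to spot the pattern.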

ehartford

Dolphin org Nov 15, 2023

this is really interesting. I will examine this.

chrisgru changed discussion status to closed Nov 15, 2023

chrisgru changed discussion status to open Nov 15, 2023

Tom9000

Nov 15, 2023

•

edited Nov 15, 2023

I spent half a day trying, and failing, to figure out why my oobabooga web UI broke completely the moment I loaded this model: it couldn't follow the ChatML format anymore.
The main problem wasn't just this model. Every other ChatML model stopped functioning no matter what I tried (short of a complete web UI purge and reinstall), even though it had been working flawlessly with every ChatML model all day until I attempted to load this one.
I still think it's largely due to bugs in the web UI, which is usually riddled with them, but maybe some tokenizer issues played a part too.
I'm re-downloading bloke's quants once more to test.

chrisgru

Nov 16, 2023

•

edited Nov 16, 2023

The 2 eos tokens that appear here should not bother Ooba. Test the model in another way if you can.
The 2 eos tokens are added as follows:

First, FastChat adds one as a separator: https://github.com/lm-sys/FastChat/blob/main/fastchat/conversation.py#L163
    ret += role + "\n" + message + self.sep + "\n"
Then axolotl adds another eos:
https://github.com/OpenAccess-AI-Collective/axolotl/blob/1a6309c8a633a6fe17b2ffebbbc0353565f376e5/src/axolotl/prompt_tokenizers.py#L392

    # this should be the assistant response, should end with an eos token
    .....
    res = self._tokenize(
        turn,
        add_eos_token=True,
        strip_bos_token=True,
    )

And as such we have at the end of each conversation:
<|im_end|>\n<|im_end|>
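
To make the two steps concrete, here is a text-level sketch. It is not the FastChat or axolotl code: `render_turn` only mirrors the concatenation line quoted above, and `tokenize_text` is a made-up stand-in that appends the eos string the way a tokenizer call with add_eos_token=True would append the eos token.

```python
EOS = "<|im_end|>"


def render_turn(role: str, message: str, sep: str = EOS) -> str:
    # Mirrors the quoted FastChat line: role + "\n" + message + sep + "\n"
    return role + "\n" + message + sep + "\n"


def tokenize_text(turn: str, add_eos_token: bool) -> str:
    # Hypothetical stand-in for tokenization: works on text, not token ids.
    return turn + EOS if add_eos_token else turn


assistant = render_turn("<|im_start|>assistant", "Bonjour!")
result = tokenize_text(assistant, add_eos_token=True)
print(repr(result))
# '<|im_start|>assistant\nBonjour!<|im_end|>\n<|im_end|>'
```

The separator contributes the first `<|im_end|>`, the trailing "\n" comes from the template, and the second `<|im_end|>` comes from the eos appended afterwards.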

ehartford

Dolphin org Nov 18, 2023

is there something I need to change?

ehartford

Dolphin org Nov 18, 2023

when I train dolphin 3.0 I can modify the axolotl config to train it differently

chrisgru

Nov 18, 2023

•

edited Nov 18, 2023

Hi Eric,

There is not much we can change at this point. We could submit a PR to axolotl that changes the self._tokenize(...) call above to set add_eos_token to False. Will this affect other templates that use the ShareGPTPromptTokenizingStrategy class? Maybe, which is why it needs some care. I'll try, but my time is limited, which is why there is no PR from me yet.
We also need to create a new FastChat conversation template to remove the \n at the end, like this:

register_conv_template(
    Conversation(
        name="chatml2",
        system_template="<|im_start|>system\n{system_message}",
        system_message="You are a helpful assistant.",
        roles=["<|im_start|>user", "<|im_start|>assistant"],
        sep_style=SeparatorStyle.CHATML,
        sep="<|im_end|>",
    )
)

Note: the original chatml conversation has sep="<|im_end|>\n", and as such FastChat and axolotl both added a \n at the end.

Once this is done, the template will be correct. All of this happens because multi-turn conversations were not used much previously, and I guess people did not notice.
This may or may not affect the currently trained Dolphin model. In my not-yet-expert opinion, it affects it a little.
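
As a rough way to compare the current behavior with the add_eos_token=False variant, here is a text-level sketch. `render_conversation` is a hypothetical helper, not axolotl or FastChat code, and it leaves the trailing "\n" question aside.

```python
EOS = "<|im_start|>".replace("start", "end")  # "<|im_end|>"


def render_conversation(turns, add_eos_token: bool) -> str:
    """Each turn is rendered as <|im_start|>role + "\n" + message + EOS + "\n".
    The assistant turn optionally gets one more EOS appended afterwards."""
    out = ""
    for role, message in turns:
        out += f"<|im_start|>{role}\n{message}{EOS}\n"
        if role == "assistant" and add_eos_token:
            out += EOS
    return out


turns = [("user", "Bonjour!"), ("assistant", "Bonjour!")]
for flag in (True, False):
    text = render_conversation(turns, add_eos_token=flag)
    print(flag, text.count(EOS))
# True 3
# False 2
```

With the extra eos disabled, each turn ends with exactly one `<|im_end|>`, which is what the ChatML format expects.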
