Model Details

Model Description

Bonsai is a small, 500-million-parameter language model with ternary weights, trained by deepgrove. Bonsai adopts the Llama architecture and the Mistral tokenizer, following Danube 3, with modified linear layers to support ternary weights. The model was trained primarily on DCLM-Pro and Fineweb-Edu. Bonsai is notably data-efficient: it was trained on fewer than 5 billion tokens.
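The description above does not spell out how the modified linear layers produce ternary weights. As a rough illustration only, the sketch below uses absmean quantization (the scheme described for BitNet b1.58), which is an assumption and may differ from Bonsai's actual layers. The names `ternarize` and `ternary_linear` are hypothetical, and the code uses plain Python lists rather than tensors to stay self-contained.

```python
def ternarize(rows, eps=1e-8):
    """Quantize a weight matrix (list of rows) to {-1, 0, +1} with one shared scale.

    Assumption: absmean scaling, not confirmed as Bonsai's method.
    """
    flat = [w for row in rows for w in row]
    scale = sum(abs(w) for w in flat) / len(flat)  # mean absolute weight
    ternary = [
        [max(-1, min(1, round(w / (scale + eps)))) for w in row]
        for row in rows
    ]
    return ternary, scale


def ternary_linear(x, ternary, scale):
    """Compute y = scale * (W_t @ x) using only additions and subtractions."""
    out = []
    for row in ternary:
        acc = 0.0
        for t, xi in zip(row, x):
            if t == 1:
                acc += xi
            elif t == -1:
                acc -= xi
            # t == 0 contributes nothing
        out.append(scale * acc)
    return out


weights = [[0.9, -0.05, -1.2], [0.1, 0.6, -0.7]]
ternary, scale = ternarize(weights)
y = ternary_linear([1.0, 2.0, 3.0], ternary, scale)
print(ternary)  # [[1, 0, -1], [0, 1, -1]]
```

Because every weight is -1, 0, or +1, the matrix product needs no multiplications apart from the final scale, which is the property custom kernels can exploit.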

Developed by: deepgrove
Language(s) (NLP): English
License: Apache 2.0
Repository: https://github.com/deepgrove-ai/Bonsai
Paper: https://github.com/deepgrove-ai/Bonsai/tree/main/paper/Bonsai.pdf

Usage

Bonsai can be used through the Hugging Face Transformers library. Note, however, that all operations are currently performed in 16-bit precision; we are working on integrating the model design with custom mixed-precision kernels. A quick example follows:

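from transformers import AutoTokenizer, AutoModelForCausalLM

# trust_remote_code=True lets Transformers load the custom model code shipped with the repository.
tokenizer = AutoTokenizer.from_pretrained("deepgrove/Bonsai", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("deepgrove/Bonsai", trust_remote_code=True)

# Tokenize a prompt and return PyTorch tensors.
text = "What is the capital of France?"
inputs = tokenizer(text, return_tensors="pt")

# Generate a continuation; max_length counts prompt tokens plus generated tokens.
outputs = model.generate(**inputs, max_length=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))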
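Since all operations currently run in 16-bit precision, the weights take the same memory as any 16-bit model of this size. The back-of-the-envelope sketch below compares that with two hypothetical ternary storage formats. It assumes exactly 500 million parameters, all of them ternary; in practice some tensors (for example embeddings) may stay in higher precision, so treat the numbers as rough bounds rather than measurements.

```python
import math


def weight_memory_bytes(num_params, bits_per_weight):
    """Bytes needed to store num_params weights at the given bit width."""
    return num_params * bits_per_weight / 8


NUM_PARAMS = 500_000_000  # rounded parameter count; an assumption

bf16_bytes = weight_memory_bytes(NUM_PARAMS, 16)             # current 16-bit storage
packed_bytes = weight_memory_bytes(NUM_PARAMS, 2)            # simple 2-bits-per-weight packing
ideal_bytes = weight_memory_bytes(NUM_PARAMS, math.log2(3))  # information-theoretic limit

for label, size in [("16-bit", bf16_bytes), ("2-bit packed", packed_bytes), ("ideal ternary", ideal_bytes)]:
    print(f"{label:>14}: {size / 1e6:8.1f} MB")
```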

Bonsai is not instruction-tuned; we strongly recommend fine-tuning the model before using it in a downstream task.

Evaluation

Bonsai achieves competitive performance among its peers, making it one of the first ternary models to do so. Evaluation results are below; for more detailed results and comparisons to other ternary models, please see the accompanying paper linked above. We use lm-eval for all benchmarks except MMLU, and lighteval's cloze formulation for MMLU.

Model	ARC-c	ARC-e	HellaSwag	OBQA	PIQA	WinoGrande	MMLU	Avg
MobiLlama 0.5B	26.62	46.68	51.66	30.00	71.65	54.50	28.61	44.25
Qwen 2 0.5B	28.84	50.29	49.12	33.00	69.26	56.99	31.78	45.61
MobileLLM 600M	29.01	56.65	55.35	34.00	71.65	59.75	31.40	48.13
Qwen 2.5 0.5B	32.25	58.29	52.18	35.40	69.91	56.12	33.40	48.22
Bonsai	33.36	57.95	48.04	34.00	70.24	54.85	30.28	46.96
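The Avg column appears to be the unweighted mean of the seven benchmark scores. The short check below recomputes it from the table values; the variable and function names are illustrative. Under that assumption, every row reproduces the listed average except MobileLLM 600M, which recomputes to 48.26 against the listed 48.13, so one value in that row may have been transcribed differently from its source.

```python
# Per-model benchmark scores and the reported average, copied from the table above.
REPORTED = {
    "MobiLlama 0.5B": ([26.62, 46.68, 51.66, 30.00, 71.65, 54.50, 28.61], 44.25),
    "Qwen 2 0.5B": ([28.84, 50.29, 49.12, 33.00, 69.26, 56.99, 31.78], 45.61),
    "MobileLLM 600M": ([29.01, 56.65, 55.35, 34.00, 71.65, 59.75, 31.40], 48.13),
    "Qwen 2.5 0.5B": ([32.25, 58.29, 52.18, 35.40, 69.91, 56.12, 33.40], 48.22),
    "Bonsai": ([33.36, 57.95, 48.04, 34.00, 70.24, 54.85, 30.28], 46.96),
}


def mean_score(scores):
    """Unweighted mean, rounded to two decimals like the table."""
    return round(sum(scores) / len(scores), 2)


recomputed = {name: mean_score(scores) for name, (scores, _) in REPORTED.items()}

# Collect rows whose listed average differs from the recomputed one.
mismatches = {
    name: (listed, recomputed[name])
    for name, (_, listed) in REPORTED.items()
    if abs(listed - recomputed[name]) > 0.005
}

for name, avg in recomputed.items():
    print(f"{name:<16} {avg:.2f}")
print("Mismatched rows:", mismatches)
```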

Related paper: H2O-Danube3 Technical Report, arXiv:2407.09276 (published Jul 12, 2024)