deepnight-research/lil-c3po

Model Details:

lil-c3po is an open-source large language model (LLM) produced by a linear merge of two fine-tuned Mistral-7B models, internally referred to as c3-1 and c3-2. Both models were developed in-house, and the merge is intended to combine their individual characteristics for better performance and utility.
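
As an illustration of what a linear merge means, the sketch below averages two checkpoints parameter by parameter. It is a toy example, not the actual merge procedure: the function name `linear_merge`, the equal 0.5/0.5 weighting, and the tiny stand-in parameter dictionaries are all assumptions, since the card does not state the merge weights or tooling.

```python
from typing import Dict, List

# A toy "state dict": parameter name -> flat list of values.
StateDict = Dict[str, List[float]]


def linear_merge(a: StateDict, b: StateDict, weight_a: float = 0.5) -> StateDict:
    """Return weight_a * a + (1 - weight_a) * b for every parameter.

    The default of 0.5 is an assumption; the real merge weights are not
    given in the model card.
    """
    if a.keys() != b.keys():
        raise ValueError("both checkpoints must share the same parameter names")
    merged: StateDict = {}
    for name, values_a in a.items():
        values_b = b[name]
        if len(values_a) != len(values_b):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        merged[name] = [
            weight_a * x + (1.0 - weight_a) * y
            for x, y in zip(values_a, values_b)
        ]
    return merged


# Hypothetical stand-ins for the c3-1 and c3-2 weights.
c3_1 = {"layer.weight": [1.0, 2.0], "layer.bias": [0.0]}
c3_2 = {"layer.weight": [3.0, 4.0], "layer.bias": [1.0]}

merged = linear_merge(c3_1, c3_2)
print(merged)
```

A linear merge of this kind only works when both checkpoints share the same architecture and parameter names, which holds here because both source models are fine-tunes of Mistral-7B.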

Model Architecture:

lil-c3po inherits the Mistral-7B architecture shared by c3-1 and c3-2, including Grouped-Query Attention, Sliding-Window Attention, and a byte-fallback BPE tokenizer. The merge aims to combine the strengths of both models for language understanding and generation.

Training Details:

The first model, c3-1, is a 7B-parameter large language model fine-tuned on the Intel Gaudi 2 processor. It was trained with the Direct Preference Optimization (DPO) method and is designed to perform well across a variety of language-related tasks.
The second model, c3-2, is an instruction fine-tuned version of Mistral-7B. Its instruction tuning is intended to improve language understanding in instructional contexts.

License:

lil-c3po is released under the MIT license, fostering open-source collaboration and innovation.

Intended Use:

This merged model is suitable for a broad range of language-related tasks, inheriting the capabilities of the fine-tuned c3-1 and c3-2 models.

Out-of-Scope Uses:

lil-c3po is versatile, but in most cases fine-tuning may be necessary for specific tasks. The model should not be used to intentionally create hostile or alienating environments for people.

Open LLM Leaderboard Evaluation Results

Detailed results can be found here

Metric	Value
Avg.	68.03
AI2 Reasoning Challenge (25-Shot)	65.02
HellaSwag (10-Shot)	84.45
MMLU (5-Shot)	62.36
TruthfulQA (0-shot)	68.73
Winogrande (5-shot)	79.16
GSM8k (5-shot)	48.45
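
The reported average can be reproduced from the six benchmark scores above. The short check below assumes the average is a plain unweighted mean rounded to two decimals; the dictionary keys are just labels chosen for this example.

```python
# Scores copied from the table above.
scores = {
    "AI2 Reasoning Challenge (25-Shot)": 65.02,
    "HellaSwag (10-Shot)": 84.45,
    "MMLU (5-Shot)": 62.36,
    "TruthfulQA (0-shot)": 68.73,
    "Winogrande (5-shot)": 79.16,
    "GSM8k (5-shot)": 48.45,
}

# Unweighted mean over the six benchmarks (assumed averaging scheme).
average = sum(scores.values()) / len(scores)
rounded = round(average, 2)
print(rounded)  # 68.03
```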

Safetensors

Model size: 7B params
Tensor type: F32
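
For a rough sense of what these two figures imply for storage, the sketch below estimates the size of the weights alone. It is a back-of-the-envelope calculation: it treats "7B" as exactly 7e9 parameters (the true count is likely somewhat different), and the helper name `weight_footprint_gb` is made up for this example. Activations, KV cache, and framework overhead are not included.

```python
# Bytes used per parameter for common tensor types.
BYTES_PER_PARAM = {"F32": 4, "F16": 2, "BF16": 2}


def weight_footprint_gb(num_params: float, tensor_type: str) -> float:
    """Approximate size of the weights in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[tensor_type] / 1e9


# Assumption: "7B params" is taken as exactly 7e9 parameters.
approx_gb = weight_footprint_gb(7e9, "F32")
print(approx_gb)  # 28.0
```

Under the same assumption, loading the weights in a 16-bit type would need about half as much space.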

Evaluation results

Source: Open LLM Leaderboard

Benchmark	Metric	Split	Value
AI2 Reasoning Challenge (25-Shot)	normalized accuracy	test	65.020
HellaSwag (10-Shot)	normalized accuracy	validation	84.450
MMLU (5-Shot)	accuracy	test	62.360
TruthfulQA (0-shot)	mc2	validation	68.730
Winogrande (5-shot)	accuracy	validation	79.160
GSM8k (5-shot)	accuracy	test	48.450