Built with Axolotl

See axolotl config

axolotl version: 0.13.0.dev0

base_model: deepseek-ai/deepseek-coder-6.7b-instruct
hub_model_id: darwinkernelpanic/deepseek-coder-6.7b-instruct-luau
hub_strategy: end
trust_remote_code: true

load_in_8bit: false
load_in_4bit: true

datasets:
  - path: darwinkernelpanic/luau_corpus_axolotl
    type: completion
    field_instruction: prompt
    field_output: completion

dataset_prepared_path:
val_set_size: 0.05
output_dir: ./outputs/deepseek-luau-finetune

sequence_len: 3072
sample_packing: true
eval_sample_packing: true

adapter: qlora
lora_model_dir:
lora_r: 32
lora_alpha: 32
lora_dropout: 0.05
lora_target_linear: true

wandb_project: deepseek-luau-finetune
wandb_entity:
wandb_watch:
wandb_name: deepseek-coder-6.7b-luau
wandb_log_model:

gradient_accumulation_steps: 2
micro_batch_size: 6
num_epochs: 3
optimizer: adamw_torch_fused
lr_scheduler: cosine
learning_rate: 0.0002
bf16: auto
tf32: true

gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: false

resume_from_checkpoint:
logging_steps: 10
flash_attention: true
warmup_ratio: 0.1
evals_per_epoch: 4
saves_per_epoch: 1
weight_decay: 0.01

fsdp: []
fsdp_config: {}

special_tokens:
  pad_token: "<|EOT|>"

deepseek-coder-6.7b-instruct-luau

This model is a fine-tuned version of deepseek-ai/deepseek-coder-6.7b-instruct on the darwinkernelpanic/luau_corpus_axolotl dataset. It achieves the following results on the evaluation set:

  • Loss: 1.6346
  • Ppl: 5.1272
  • Memory/max active (GiB): 10.65
  • Memory/max allocated (GiB): 10.65
  • Memory/device reserved (GiB): 11.93
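The reported Ppl is consistent with the exponential of the evaluation loss; a quick check (plain arithmetic, nothing beyond the numbers above):

```python
import math

eval_loss = 1.6346
print(math.exp(eval_loss))  # ~5.127, matching the reported Ppl of 5.1272
```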

Model description

The model was fine-tuned on the Roblox/luau_corpus dataset, converted so that the "prompt" column is replaced by "text" for compatibility reasons. It was fine-tuned to improve knowledge of and performance on Luau code (Roblox's Lua dialect, see luau.org), which should improve generated code quality for Luau and Roblox projects.
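A conversion of that kind is a one-liner with the datasets library. The sketch below is illustrative only: the source column name and the target repo id are taken from the description above and should be verified against the actual datasets before use.

```python
# Illustrative sketch of the column rename described above; column names and
# repo ids are assumptions taken from the description, not verified facts.
from datasets import load_dataset

ds = load_dataset("Roblox/luau_corpus", split="train")
ds = ds.rename_column("prompt", "text")
ds.push_to_hub("darwinkernelpanic/luau_corpus_axolotl")
```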

Intended uses & limitations

This model is intended for use within applications that use the Luau programming language (see the loading sketch below this list), including but not limited to:

  • Roblox projects
  • Standalone Luau projects (e.g., Lune)
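
A minimal loading sketch with transformers and PEFT follows; the dtype, device_map, and generation settings are assumptions chosen for illustration, not settings taken from this card.

```python
# Minimal sketch: load the base model and apply this LoRA adapter on top.
# Assumes a CUDA-capable GPU; dtype/device_map/generation settings are illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "deepseek-ai/deepseek-coder-6.7b-instruct"
adapter_id = "darwinkernelpanic/deepseek-coder-6.7b-instruct-luau"

tokenizer = AutoTokenizer.from_pretrained(base_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    base_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)
model = PeftModel.from_pretrained(model, adapter_id)

prompt = "-- Luau: return the sum of all numbers in an array-like table\nlocal function sum(t)"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```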

It may be of limited use for projects that:

  • Use other programming languages
  • Use standard Lua rather than Luau
  • Are not programming-related

Training and evaluation data

N/A

Training procedure

Trained on 1x NVIDIA RTX 6000 Ada.

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0002
  • train_batch_size: 6
  • eval_batch_size: 6
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 12
  • optimizer: adamw_torch_fused with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 16
  • training_steps: 162
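
The derived values above follow from the config; a quick arithmetic check (assuming warmup steps are the integer part of warmup_ratio × training_steps):

```python
micro_batch_size = 6
gradient_accumulation_steps = 2
training_steps = 162
warmup_ratio = 0.1

total_train_batch_size = micro_batch_size * gradient_accumulation_steps  # 12
warmup_steps = int(training_steps * warmup_ratio)                        # 16
print(total_train_batch_size, warmup_steps)
```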

Training results

Training Loss | Epoch  | Step | Validation Loss | Ppl     | Active (GiB) | Allocated (GiB) | Reserved (GiB)
No log        | 0      | 0    | 3.8515          | 47.0637 | 7.0          | 7.0             | 7.26
3.2644        | 0.2593 | 14   | 2.8645          | 17.5407 | 10.65        | 10.65           | 12.22
2.6242        | 0.5185 | 28   | 2.2633          | 9.6147  | 12.27        | 12.27           | 14.58
2.0431        | 0.7778 | 42   | 2.0479          | 7.7515  | 10.65        | 10.65           | 13.92
1.9054        | 1.0370 | 56   | 1.9163          | 6.796   | 10.65        | 10.65           | 14.72
1.7318        | 1.2963 | 70   | 1.8184          | 6.1622  | 7.61         | 7.61            | 13.92
1.6119        | 1.5556 | 84   | 1.7550          | 5.7836  | 12.27        | 12.27           | 14.54
1.6022        | 1.8148 | 98   | 1.7048          | 5.5006  | 10.65        | 10.65           | 14.23
1.6249        | 2.0741 | 112  | 1.6723          | 5.3242  | 10.65        | 10.65           | 13.99
1.4995        | 2.3333 | 126  | 1.6503          | 5.2088  | 10.65        | 10.65           | 11.93
1.4803        | 2.5926 | 140  | 1.6381          | 5.1452  | 7.61         | 7.61            | 14.58
1.4872        | 2.8519 | 154  | 1.6346          | 5.1272  | 10.65        | 10.65           | 11.93

Framework versions

  • PEFT 0.18.0
  • Transformers 4.57.1
  • Pytorch 2.8.0+cu128
  • Datasets 4.4.1
  • Tokenizers 0.22.1