defog-orpo-model-v5-1epoch

Files changed (4)

README.md +19 -19
adapter_config.json +4 -4
adapter_model.safetensors +1 -1
training_args.bin +1 -1

README.md CHANGED

@@ -14,23 +14,23 @@ model-index:
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
 should probably proofread and complete it, then remove this comment. -->
-[<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="200" height="32"/>](https://wandb.ai/sert121/huggingface/runs/hdobvgjx)
 # results
 This model is a fine-tuned version of [defog/llama-3-sqlcoder-8b](https://huggingface.co/defog/llama-3-sqlcoder-8b) on an unknown dataset.
 It achieves the following results on the evaluation set:
-- Loss: 0.1487
-- Rewards/chosen: -0.0059
-- Rewards/rejected: -0.0256
-- Rewards/accuracies: 0.9037
-- Rewards/margins: 0.0197
-- Logps/rejected: -0.2555
-- Logps/chosen: -0.0585
-- Logits/rejected: 0.2408
-- Logits/chosen: 0.2329
-- Nll Loss: 0.1244
-- Log Odds Ratio: -0.2414
-- Log Odds Chosen: 1.5632
 ## Model description
@@ -58,17 +58,17 @@ The following hyperparameters were used during training:
 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
 - lr_scheduler_type: linear
 - lr_scheduler_warmup_steps: 10
-- num_epochs: 2
 ### Training results
 | Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen | Nll Loss | Log Odds Ratio | Log Odds Chosen |
 |:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|:--------:|:--------------:|:---------------:|
-| 0.6997        | 0.4   | 144  | 0.6765          | -0.0517        | -0.0564          | 0.8447             | 0.0047          | -0.5638        | -0.5169      | -0.1910         | -0.1941       | 0.6134   | -0.6250        | 0.1486          |
-| 0.206         | 0.8   | 288  | 0.1943          | -0.0081        | -0.0186          | 0.8975             | 0.0105          | -0.1858        | -0.0809      | 0.0507          | 0.0486        | 0.1574   | -0.3672        | 0.9122          |
-| 0.1531        | 1.2   | 432  | 0.1592          | -0.0064        | -0.0245          | 0.9068             | 0.0182          | -0.2452        | -0.0637      | 0.2239          | 0.2196        | 0.1331   | -0.2599        | 1.4386          |
-| 0.1424        | 1.6   | 576  | 0.1510          | -0.0060        | -0.0257          | 0.8975             | 0.0197          | -0.2569        | -0.0597      | 0.2172          | 0.2093        | 0.1265   | -0.2436        | 1.5494          |
-| 0.1291        | 2.0   | 720  | 0.1487          | -0.0059        | -0.0256          | 0.9037             | 0.0197          | -0.2555        | -0.0585      | 0.2408          | 0.2329        | 0.1244   | -0.2414        | 1.5632          |
 ### Framework versions

 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
 should probably proofread and complete it, then remove this comment. -->
+[<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="200" height="32"/>](https://wandb.ai/sert121/huggingface/runs/ubrsk8hu)
 # results
 This model is a fine-tuned version of [defog/llama-3-sqlcoder-8b](https://huggingface.co/defog/llama-3-sqlcoder-8b) on an unknown dataset.
 It achieves the following results on the evaluation set:
+- Loss: 0.2568
+- Rewards/chosen: -0.0126
+- Rewards/rejected: -0.0217
+- Rewards/accuracies: 0.8944
+- Rewards/margins: 0.0091
+- Logps/rejected: -0.2167
+- Logps/chosen: -0.1258
+- Logits/rejected: 0.1307
+- Logits/chosen: 0.1283
+- Nll Loss: 0.2132
+- Log Odds Ratio: -0.4354
+- Log Odds Chosen: 0.6768
 ## Model description
 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
 - lr_scheduler_type: linear
 - lr_scheduler_warmup_steps: 10
+- num_epochs: 1
 ### Training results
 | Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen | Nll Loss | Log Odds Ratio | Log Odds Chosen |
 |:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|:--------:|:--------------:|:---------------:|
+| 0.9481        | 0.2   | 72   | 0.9541          | -0.0776        | -0.0797          | 0.7143             | 0.0021          | -0.7975        | -0.7765      | -0.3031         | -0.3167       | 0.8875   | -0.6703        | 0.0480          |
+| 0.7313        | 0.4   | 144  | 0.7089          | -0.0551        | -0.0596          | 0.8292             | 0.0045          | -0.5962        | -0.5513      | -0.1005         | -0.1135       | 0.6459   | -0.6312        | 0.1330          |
+| 0.547         | 0.6   | 216  | 0.4407          | -0.0292        | -0.0367          | 0.8882             | 0.0075          | -0.3670        | -0.2924      | -0.0064         | -0.0109       | 0.3866   | -0.5408        | 0.3609          |
+| 0.2547        | 0.8   | 288  | 0.3018          | -0.0164        | -0.0250          | 0.8882             | 0.0085          | -0.2498        | -0.1644      | 0.0633          | 0.0592        | 0.2551   | -0.4664        | 0.5805          |
+| 0.3407        | 1.0   | 360  | 0.2568          | -0.0126        | -0.0217          | 0.8944             | 0.0091          | -0.2167        | -0.1258      | 0.1307          | 0.1283        | 0.2132   | -0.4354        | 0.6768          |
 ### Framework versions
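
The summary metrics on both sides of this README diff appear to be tied together by a few simple identities. The sketch below checks them on the final evaluation row of each run. It assumes the usual ORPO bookkeeping (reward = beta × log-prob, loss = NLL loss − beta × log odds ratio) and a beta of 0.1. Neither is stated in the diff. The beta value is inferred from the ratio of the Rewards and Logps columns, so treat it as an assumption.

```python
import math

# Final-row evaluation metrics copied from the two sides of the diff.
runs = {
    "num_epochs: 2 (removed)": dict(
        loss=0.1487, nll=0.1244, log_odds_ratio=-0.2414,
        r_chosen=-0.0059, r_rejected=-0.0256, margin=0.0197,
        logp_chosen=-0.0585, logp_rejected=-0.2555,
    ),
    "num_epochs: 1 (added)": dict(
        loss=0.2568, nll=0.2132, log_odds_ratio=-0.4354,
        r_chosen=-0.0126, r_rejected=-0.0217, margin=0.0091,
        logp_chosen=-0.1258, logp_rejected=-0.2167,
    ),
}

BETA = 0.1  # assumption: not in the diff, inferred from Rewards / Logps


def check(m, tol=5e-4):
    """Return which identities hold within rounding tolerance `tol`."""
    def close(a, b):
        return math.isclose(a, b, abs_tol=tol)

    return {
        "margin == chosen - rejected": close(m["r_chosen"] - m["r_rejected"], m["margin"]),
        "r_chosen == beta * logp_chosen": close(BETA * m["logp_chosen"], m["r_chosen"]),
        "r_rejected == beta * logp_rejected": close(BETA * m["logp_rejected"], m["r_rejected"]),
        "loss == nll - beta * log_odds_ratio": close(m["nll"] - BETA * m["log_odds_ratio"], m["loss"]),
    }


for name, metrics in runs.items():
    print(name, check(metrics))
```

Under these assumptions every check passes for both runs. That suggests the higher loss of the one-epoch run comes mostly from the larger NLL term (0.2132 vs 0.1244) rather than from the odds-ratio penalty.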

adapter_config.json CHANGED

@@ -20,13 +20,13 @@
   "rank_pattern": {},
   "revision": null,
   "target_modules": [
-    "down_proj",
     "q_proj",
     "v_proj",
-    "k_proj",
-    "up_proj",
     "gate_proj",
-    "o_proj"
   ],
   "task_type": "CAUSAL_LM",
   "use_dora": false,

   "rank_pattern": {},
   "revision": null,
   "target_modules": [
+    "k_proj",
     "q_proj",
+    "down_proj",
     "v_proj",
     "gate_proj",
+    "o_proj",
+    "up_proj"
   ],
   "task_type": "CAUSAL_LM",
   "use_dora": false,
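
The `target_modules` hunk shows four removed and four added lines, but the entries themselves look the same on both sides. A minimal sketch to confirm that, with the two lists copied from the diff (the helper name `diff_modules` is hypothetical):

```python
# target_modules as listed on the old and new sides of the diff.
old_modules = ["down_proj", "q_proj", "v_proj", "k_proj", "up_proj", "gate_proj", "o_proj"]
new_modules = ["k_proj", "q_proj", "down_proj", "v_proj", "gate_proj", "o_proj", "up_proj"]


def diff_modules(old, new):
    """Compare two module lists as sets and report whether only order changed."""
    old_set, new_set = set(old), set(new)
    return {
        "added": sorted(new_set - old_set),
        "removed": sorted(old_set - new_set),
        "reordered_only": old_set == new_set and old != new,
    }


print(diff_modules(old_modules, new_modules))
```

Both sides contain the same seven projection modules, so this part of the change is a reordering only. One plausible explanation, which this diff does not confirm, is that the list was serialized from an unordered set.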

adapter_model.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:185b7846ec1099ee2e4c06a87708f963f7f99a0d5f886c9caf7a01c23e27212c
 size 4370592096

 version https://git-lfs.github.com/spec/v1
+oid sha256:8bf2d06acf02dac3fec00d31e0d2324d28dc5dec563a8feaf13a1699db84d9cb
 size 4370592096

training_args.bin CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:7acb66cd4720a5d06a43a6841aa6397ab9f76e3d4090ec18cdd61e682cfccdf7
 size 5432

 version https://git-lfs.github.com/spec/v1
+oid sha256:bf5df8b0118c094a6e475d4cc0489b80b06a79e6167bd5cb3cac1d509d7f970e
 size 5432