# Phi4-5.6B-transformers-ex1

This model is a fine-tuned version of [microsoft/Phi-4-multimodal-instruct](https://huggingface.co/microsoft/Phi-4-multimodal-instruct) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 0.4529 (final logged evaluation, step 2500)
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.0002
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 4
- optimizer: PAGED_ADAMW_8BIT (paged 8-bit AdamW) with betas=(0.9, 0.95) and epsilon=1e-07; no additional optimizer arguments
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 50
- num_epochs: 10
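As a sanity check on how these hyperparameters combine, the sketch below (not the training script) computes the effective batch size from gradient accumulation and the learning rate produced by a linear scheduler with warmup. The total of 2500 optimizer steps is taken from the results table; everything else mirrors the values listed above.

```python
def effective_batch_size(per_device: int, accum: int, n_devices: int = 1) -> int:
    """Gradients are accumulated over `accum` micro-batches before each update."""
    return per_device * accum * n_devices


def linear_schedule_lr(step: int, base_lr: float = 2e-4,
                       warmup: int = 50, total: int = 2500) -> float:
    """Linear warmup for `warmup` steps, then linear decay to 0 at `total`."""
    if step < warmup:
        return base_lr * step / warmup
    return base_lr * max(0.0, (total - step) / (total - warmup))


print(effective_batch_size(1, 4))  # 4, matching total_train_batch_size above
print(linear_schedule_lr(25))      # halfway through warmup: half the peak rate
print(linear_schedule_lr(50))      # peak learning rate, 2e-4
print(linear_schedule_lr(2500))    # decayed to zero at the final step
```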
### Training results
| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| 0.1653 | 0.0799 | 20 | 0.1542 |
| 0.1324 | 0.1598 | 40 | 0.1429 |
| 0.2598 | 0.2398 | 60 | 0.3326 |
| 0.1638 | 0.3197 | 80 | 0.1500 |
| 0.1499 | 0.3996 | 100 | 0.4031 |
| 0.15 | 0.4795 | 120 | 0.3213 |
| 0.1679 | 0.5594 | 140 | 0.1489 |
| 0.1431 | 0.6394 | 160 | 0.1531 |
| 0.1462 | 0.7193 | 180 | 0.1488 |
| 0.1464 | 0.7992 | 200 | 0.1485 |
| 0.1379 | 0.8791 | 220 | 0.1482 |
| 0.1414 | 0.9590 | 240 | 0.1567 |
| 0.1328 | 1.0360 | 260 | 0.1472 |
| 0.134 | 1.1159 | 280 | 0.1466 |
| 0.1415 | 1.1958 | 300 | 0.1447 |
| 0.141 | 1.2757 | 320 | 0.1470 |
| 0.1378 | 1.3556 | 340 | 0.1685 |
| 0.1425 | 1.4356 | 360 | 0.1560 |
| 0.1405 | 1.5155 | 380 | 0.1412 |
| 0.135 | 1.5954 | 400 | 0.1512 |
| 0.1359 | 1.6753 | 420 | 0.1410 |
| 0.1336 | 1.7552 | 440 | 0.1394 |
| 0.1317 | 1.8352 | 460 | 0.1408 |
| 0.1323 | 1.9151 | 480 | 0.1497 |
| 0.1349 | 1.9950 | 500 | 0.1387 |
| 0.1204 | 2.0719 | 520 | 0.1407 |
| 0.1286 | 2.1518 | 540 | 0.1399 |
| 0.1333 | 2.2318 | 560 | 0.1414 |
| 0.1315 | 2.3117 | 580 | 0.1398 |
| 0.1313 | 2.3916 | 600 | 0.1455 |
| 0.1308 | 2.4715 | 620 | 0.1377 |
| 0.1327 | 2.5514 | 640 | 0.1400 |
| 0.1324 | 2.6314 | 660 | 0.1370 |
| 0.1309 | 2.7113 | 680 | 0.1343 |
| 0.1274 | 2.7912 | 700 | 0.1384 |
| 0.1287 | 2.8711 | 720 | 0.1353 |
| 0.1285 | 2.9510 | 740 | 0.1341 |
| 0.1256 | 3.0280 | 760 | 0.1380 |
| 0.1256 | 3.1079 | 780 | 0.1340 |
| 0.1224 | 3.1878 | 800 | 0.1372 |
| 0.1244 | 3.2677 | 820 | 0.1358 |
| 0.1256 | 3.3477 | 840 | 0.1337 |
| 0.1229 | 3.4276 | 860 | 0.1336 |
| 0.1252 | 3.5075 | 880 | 0.1333 |
| 0.1234 | 3.5874 | 900 | 0.1360 |
| 0.1276 | 3.6673 | 920 | 0.1344 |
| 0.1258 | 3.7473 | 940 | 0.1327 |
| 0.1249 | 3.8272 | 960 | 0.1357 |
| 0.1273 | 3.9071 | 980 | 0.1346 |
| 0.1266 | 3.9870 | 1000 | 0.1356 |
| 0.1172 | 4.0639 | 1020 | 0.1413 |
| 0.1236 | 4.1439 | 1040 | 0.1396 |
| 0.1219 | 4.2238 | 1060 | 0.1368 |
| 0.1187 | 4.3037 | 1080 | 0.1399 |
| 0.1225 | 4.3836 | 1100 | 0.1387 |
| 0.1243 | 4.4635 | 1120 | 0.1370 |
| 0.1218 | 4.5435 | 1140 | 0.1360 |
| 0.1189 | 4.6234 | 1160 | 0.1325 |
| 0.1185 | 4.7033 | 1180 | 0.1373 |
| 0.1251 | 4.7832 | 1200 | 0.1352 |
| 0.1214 | 4.8631 | 1220 | 0.1333 |
| 0.1225 | 4.9431 | 1240 | 0.1339 |
| 0.1138 | 5.0200 | 1260 | 0.1348 |
| 0.1205 | 5.0999 | 1280 | 0.1415 |
| 0.1208 | 5.1798 | 1300 | 0.1434 |
| 0.1165 | 5.2597 | 1320 | 0.1415 |
| 0.1154 | 5.3397 | 1340 | 0.1392 |
| 0.1143 | 5.4196 | 1360 | 0.1442 |
| 0.1165 | 5.4995 | 1380 | 0.1397 |
| 0.1162 | 5.5794 | 1400 | 0.1414 |
| 0.1148 | 5.6593 | 1420 | 0.1389 |
| 0.1133 | 5.7393 | 1440 | 0.1391 |
| 0.1145 | 5.8192 | 1460 | 0.1393 |
| 0.1152 | 5.8991 | 1480 | 0.1397 |
| 0.113 | 5.9790 | 1500 | 0.1407 |
| 0.0993 | 6.0559 | 1520 | 0.1625 |
| 0.0962 | 6.1359 | 1540 | 0.1609 |
| 0.0995 | 6.2158 | 1560 | 0.1573 |
| 0.1028 | 6.2957 | 1580 | 0.1582 |
| 0.0983 | 6.3756 | 1600 | 0.1620 |
| 0.0989 | 6.4555 | 1620 | 0.1572 |
| 0.0987 | 6.5355 | 1640 | 0.1602 |
| 0.0992 | 6.6154 | 1660 | 0.1593 |
| 0.0997 | 6.6953 | 1680 | 0.1644 |
| 0.0967 | 6.7752 | 1700 | 0.1630 |
| 0.0988 | 6.8551 | 1720 | 0.1596 |
| 0.098 | 6.9351 | 1740 | 0.1605 |
| 0.0915 | 7.0120 | 1760 | 0.1662 |
| 0.0666 | 7.0919 | 1780 | 0.2258 |
| 0.0638 | 7.1718 | 1800 | 0.2135 |
| 0.0581 | 7.2517 | 1820 | 0.2290 |
| 0.065 | 7.3317 | 1840 | 0.2115 |
| 0.0611 | 7.4116 | 1860 | 0.2396 |
| 0.059 | 7.4915 | 1880 | 0.2205 |
| 0.0598 | 7.5714 | 1900 | 0.2314 |
| 0.0608 | 7.6513 | 1920 | 0.2309 |
| 0.063 | 7.7313 | 1940 | 0.2383 |
| 0.0621 | 7.8112 | 1960 | 0.2304 |
| 0.0586 | 7.8911 | 1980 | 0.2433 |
| 0.0622 | 7.9710 | 2000 | 0.2354 |
| 0.0369 | 8.0480 | 2020 | 0.3233 |
| 0.0246 | 8.1279 | 2040 | 0.3437 |
| 0.022 | 8.2078 | 2060 | 0.3361 |
| 0.0243 | 8.2877 | 2080 | 0.3413 |
| 0.0235 | 8.3676 | 2100 | 0.3458 |
| 0.0229 | 8.4476 | 2120 | 0.3473 |
| 0.0218 | 8.5275 | 2140 | 0.3523 |
| 0.0234 | 8.6074 | 2160 | 0.3610 |
| 0.0228 | 8.6873 | 2180 | 0.3496 |
| 0.0221 | 8.7672 | 2200 | 0.3519 |
| 0.0223 | 8.8472 | 2220 | 0.3515 |
| 0.0224 | 8.9271 | 2240 | 0.3514 |
| 0.0193 | 9.0040 | 2260 | 0.3542 |
| 0.0081 | 9.0839 | 2280 | 0.4155 |
| 0.0071 | 9.1638 | 2300 | 0.4363 |
| 0.0065 | 9.2438 | 2320 | 0.4446 |
| 0.0057 | 9.3237 | 2340 | 0.4485 |
| 0.0064 | 9.4036 | 2360 | 0.4495 |
| 0.0071 | 9.4835 | 2380 | 0.4502 |
| 0.0058 | 9.5634 | 2400 | 0.4518 |
| 0.0066 | 9.6434 | 2420 | 0.4530 |
| 0.0072 | 9.7233 | 2440 | 0.4535 |
| 0.0064 | 9.8032 | 2460 | 0.4532 |
| 0.0076 | 9.8831 | 2480 | 0.4533 |
| 0.0063 | 9.9630 | 2500 | 0.4529 |
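Note that validation loss bottoms out at 0.1325 (step 1160) and then climbs steadily while training loss keeps falling, so the final checkpoint is not the best one. A small helper like the following (a sketch, not part of the training code) picks the best checkpoint from logged `(step, eval_loss)` pairs; the pairs shown are a subset of the table above.

```python
# Select the checkpoint with the lowest validation loss from an eval log.
# These pairs are sampled from the logged results; a real run would use
# the full history (e.g. from trainer_state.json).
eval_log = [
    (1160, 0.1325),
    (1260, 0.1348),
    (1500, 0.1407),
    (2000, 0.2354),
    (2500, 0.4529),
]

best_step, best_loss = min(eval_log, key=lambda pair: pair[1])
print(best_step, best_loss)  # 1160 0.1325
```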
### Framework versions
- Transformers 4.48.2
- Pytorch 2.6.0+cu124
- Datasets 3.4.1
- Tokenizers 0.21.1