{
  "best_global_step": 2000,
  "best_metric": 0.2992326021194458,
  "best_model_checkpoint": "./outputs/checkpoints/checkpoint-2000",
  "epoch": 3.0,
  "eval_steps": 250,
  "global_step": 3117,
  "is_hyper_param_search": false,
  "is_local_process_zero": true,
  "is_world_process_zero": true,
  "log_history": [
    {
      "epoch": 0.04814636494944632,
      "grad_norm": 0.49752116203308105,
      "learning_rate": 6.32258064516129e-05,
      "loss": 1.477303009033203,
      "step": 50
    },
    {
      "epoch": 0.09629272989889263,
      "grad_norm": 0.3652721643447876,
      "learning_rate": 0.00012774193548387096,
      "loss": 0.46554481506347656,
      "step": 100
    },
    {
      "epoch": 0.14443909484833894,
      "grad_norm": 0.32378166913986206,
      "learning_rate": 0.00019225806451612904,
      "loss": 0.4035662078857422,
      "step": 150
    },
    {
      "epoch": 0.19258545979778527,
      "grad_norm": 0.3265906870365143,
      "learning_rate": 0.0001998911255002373,
      "loss": 0.38129932403564454,
      "step": 200
    },
    {
      "epoch": 0.2407318247472316,
      "grad_norm": 0.2637921869754791,
      "learning_rate": 0.00019950341273368075,
      "loss": 0.36489856719970704,
      "step": 250
    },
    {
      "epoch": 0.2407318247472316,
      "eval_loss": 0.35290107131004333,
      "eval_runtime": 694.8697,
      "eval_samples_per_second": 2.658,
      "eval_steps_per_second": 0.665,
      "step": 250
    },
    {
      "epoch": 0.2888781896966779,
      "grad_norm": 0.23706986010074615,
      "learning_rate": 0.00019883592694304428,
      "loss": 0.3474379348754883,
      "step": 300
    },
    {
      "epoch": 0.33702455464612424,
      "grad_norm": 0.2362891435623169,
      "learning_rate": 0.00019789054489328546,
      "loss": 0.35004573822021484,
      "step": 350
    },
    {
      "epoch": 0.38517091959557054,
      "grad_norm": 0.2134203463792801,
      "learning_rate": 0.00019666992470825862,
      "loss": 0.33882694244384765,
      "step": 400
    },
    {
      "epoch": 0.43331728454501683,
      "grad_norm": 0.23494897782802582,
      "learning_rate": 0.0001951774983968872,
      "loss": 0.35397193908691404,
      "step": 450
    },
    {
      "epoch": 0.4814636494944632,
      "grad_norm": 0.20449194312095642,
      "learning_rate": 0.00019341746220340924,
      "loss": 0.3447082901000977,
      "step": 500
    },
    {
      "epoch": 0.4814636494944632,
      "eval_loss": 0.3319753110408783,
      "eval_runtime": 695.9468,
      "eval_samples_per_second": 2.654,
      "eval_steps_per_second": 0.664,
      "step": 500
    },
    {
      "epoch": 0.5296100144439095,
      "grad_norm": 0.19552361965179443,
      "learning_rate": 0.00019139476480882785,
      "loss": 0.34256481170654296,
      "step": 550
    },
    {
      "epoch": 0.5777563793933558,
      "grad_norm": 0.19304735958576202,
      "learning_rate": 0.00018911509341674071,
      "loss": 0.3311490631103516,
      "step": 600
    },
    {
      "epoch": 0.6259027443428021,
      "grad_norm": 0.22191040217876434,
      "learning_rate": 0.00018658485776267095,
      "loss": 0.32927177429199217,
      "step": 650
    },
    {
      "epoch": 0.6740491092922485,
      "grad_norm": 0.19663718342781067,
      "learning_rate": 0.00018381117209186033,
      "loss": 0.32792991638183594,
      "step": 700
    },
    {
      "epoch": 0.7221954742416947,
      "grad_norm": 0.20722714066505432,
      "learning_rate": 0.00018080183515619726,
      "loss": 0.3278979873657227,
      "step": 750
    },
    {
      "epoch": 0.7221954742416947,
      "eval_loss": 0.3201846480369568,
      "eval_runtime": 698.1861,
      "eval_samples_per_second": 2.645,
      "eval_steps_per_second": 0.662,
      "step": 750
    },
    {
      "epoch": 0.7703418391911411,
      "grad_norm": 0.18782874941825867,
      "learning_rate": 0.000177565308286523,
      "loss": 0.32439987182617186,
      "step": 800
    },
    {
      "epoch": 0.8184882041405874,
      "grad_norm": 0.19059380888938904,
      "learning_rate": 0.0001741106916019688,
      "loss": 0.32634170532226564,
      "step": 850
    },
    {
      "epoch": 0.8666345690900337,
      "grad_norm": 0.19336679577827454,
      "learning_rate": 0.00017044769842321705,
      "loss": 0.3187075614929199,
      "step": 900
    },
    {
      "epoch": 0.91478093403948,
      "grad_norm": 0.19474540650844574,
      "learning_rate": 0.00016658662796162794,
      "loss": 0.3124076271057129,
      "step": 950
    },
    {
      "epoch": 0.9629272989889264,
      "grad_norm": 0.19545841217041016,
      "learning_rate": 0.0001625383363610215,
      "loss": 0.32053829193115235,
      "step": 1000
    },
    {
      "epoch": 0.9629272989889264,
      "eval_loss": 0.3141631782054901,
      "eval_runtime": 696.8814,
      "eval_samples_per_second": 2.65,
      "eval_steps_per_second": 0.663,
      "step": 1000
    },
    {
      "epoch": 1.010592200288878,
      "grad_norm": 0.23214930295944214,
      "learning_rate": 0.00015831420617353685,
      "loss": 0.3157562255859375,
      "step": 1050
    },
    {
      "epoch": 1.0587385652383245,
      "grad_norm": 0.19467434287071228,
      "learning_rate": 0.0001539261143553928,
      "loss": 0.29116039276123046,
      "step": 1100
    },
    {
      "epoch": 1.1068849301877708,
      "grad_norm": 0.2111353874206543,
      "learning_rate": 0.0001493863988725362,
      "loss": 0.29466302871704103,
      "step": 1150
    },
    {
      "epoch": 1.155031295137217,
      "grad_norm": 0.21094423532485962,
      "learning_rate": 0.0001447078240100725,
      "loss": 0.2904193878173828,
      "step": 1200
    },
    {
      "epoch": 1.2031776600866635,
      "grad_norm": 0.20865824818611145,
      "learning_rate": 0.00013990354448301798,
      "loss": 0.29510746002197263,
      "step": 1250
    },
    {
      "epoch": 1.2031776600866635,
      "eval_loss": 0.3110591769218445,
      "eval_runtime": 694.8335,
      "eval_samples_per_second": 2.658,
      "eval_steps_per_second": 0.665,
      "step": 1250
    },
    {
      "epoch": 1.2513240250361097,
      "grad_norm": 0.20231032371520996,
      "learning_rate": 0.00013498706844928282,
      "loss": 0.29798513412475586,
      "step": 1300
    },
    {
      "epoch": 1.2994703899855562,
      "grad_norm": 0.23587271571159363,
      "learning_rate": 0.00012997221952888146,
      "loss": 0.29837047576904296,
      "step": 1350
    },
    {
      "epoch": 1.3476167549350024,
      "grad_norm": 0.19505874812602997,
      "learning_rate": 0.0001248730979361606,
      "loss": 0.2953862953186035,
      "step": 1400
    },
    {
      "epoch": 1.3957631198844487,
      "grad_norm": 0.18162201344966888,
      "learning_rate": 0.00011970404083432846,
      "loss": 0.28933364868164063,
      "step": 1450
    },
    {
      "epoch": 1.443909484833895,
      "grad_norm": 0.18973521888256073,
      "learning_rate": 0.0001144795820237573,
      "loss": 0.29408544540405274,
      "step": 1500
    },
    {
      "epoch": 1.443909484833895,
      "eval_loss": 0.30712059140205383,
      "eval_runtime": 694.8078,
      "eval_samples_per_second": 2.658,
      "eval_steps_per_second": 0.665,
      "step": 1500
    },
    {
      "epoch": 1.4920558497833414,
      "grad_norm": 0.18828196823596954,
      "learning_rate": 0.00010921441107740198,
      "loss": 0.28817583084106446,
      "step": 1550
    },
    {
      "epoch": 1.5402022147327876,
      "grad_norm": 0.19784030318260193,
      "learning_rate": 0.0001039233320382344,
      "loss": 0.292324333190918,
      "step": 1600
    },
    {
      "epoch": 1.588348579682234,
      "grad_norm": 0.21540944278240204,
      "learning_rate": 9.862122179482317e-05,
      "loss": 0.2901228141784668,
      "step": 1650
    },
    {
      "epoch": 1.6364949446316803,
      "grad_norm": 0.22679032385349274,
      "learning_rate": 9.332298825209385e-05,
      "loss": 0.28837751388549804,
      "step": 1700
    },
    {
      "epoch": 1.6846413095811266,
      "grad_norm": 0.19287723302841187,
      "learning_rate": 8.80435284148808e-05,
      "loss": 0.28883354187011717,
      "step": 1750
    },
    {
      "epoch": 1.6846413095811266,
      "eval_loss": 0.30308136343955994,
      "eval_runtime": 695.3246,
      "eval_samples_per_second": 2.656,
      "eval_steps_per_second": 0.664,
      "step": 1750
    },
    {
      "epoch": 1.7327876745305728,
      "grad_norm": 0.18742942810058594,
      "learning_rate": 8.279768650212679e-05,
      "loss": 0.2922953224182129,
      "step": 1800
    },
    {
      "epoch": 1.7809340394800193,
      "grad_norm": 0.17063775658607483,
      "learning_rate": 7.760021220950004e-05,
      "loss": 0.29046398162841797,
      "step": 1850
    },
    {
      "epoch": 1.8290804044294657,
      "grad_norm": 0.1963094025850296,
      "learning_rate": 7.246571923778208e-05,
      "loss": 0.28941730499267576,
      "step": 1900
    },
    {
      "epoch": 1.877226769378912,
      "grad_norm": 0.186972513794899,
      "learning_rate": 6.740864420363105e-05,
      "loss": 0.28976251602172853,
      "step": 1950
    },
    {
      "epoch": 1.9253731343283582,
      "grad_norm": 0.18225796520709991,
      "learning_rate": 6.244320604825131e-05,
      "loss": 0.2877297782897949,
      "step": 2000
    },
    {
      "epoch": 1.9253731343283582,
      "eval_loss": 0.2992326021194458,
      "eval_runtime": 695.681,
      "eval_samples_per_second": 2.655,
      "eval_steps_per_second": 0.664,
      "step": 2000
    },
    {
      "epoch": 1.9735194992778045,
      "grad_norm": 0.18147684633731842,
      "learning_rate": 5.7583366058099916e-05,
      "loss": 0.2825794219970703,
      "step": 2050
    },
    {
      "epoch": 2.021184400577756,
      "grad_norm": 0.21326377987861633,
      "learning_rate": 5.284278861003815e-05,
      "loss": 0.271549129486084,
      "step": 2100
    },
    {
      "epoch": 2.069330765527203,
      "grad_norm": 0.2095552235841751,
      "learning_rate": 4.823480275130263e-05,
      "loss": 0.24787240982055664,
      "step": 2150
    },
    {
      "epoch": 2.117477130476649,
      "grad_norm": 0.18589918315410614,
      "learning_rate": 4.3772364722319715e-05,
      "loss": 0.2502709770202637,
      "step": 2200
    },
    {
      "epoch": 2.1656234954260953,
      "grad_norm": 0.2025139182806015,
      "learning_rate": 3.946802152773811e-05,
      "loss": 0.25036895751953125,
      "step": 2250
    },
    {
      "epoch": 2.1656234954260953,
      "eval_loss": 0.3054317533969879,
      "eval_runtime": 695.5773,
      "eval_samples_per_second": 2.655,
      "eval_steps_per_second": 0.664,
      "step": 2250
    },
    {
      "epoch": 2.2137698603755416,
      "grad_norm": 0.23237130045890808,
      "learning_rate": 3.533387565810706e-05,
      "loss": 0.24967620849609376,
      "step": 2300
    },
    {
      "epoch": 2.261916225324988,
      "grad_norm": 0.21668598055839539,
      "learning_rate": 3.1381551061391366e-05,
      "loss": 0.2472244071960449,
      "step": 2350
    },
    {
      "epoch": 2.310062590274434,
      "grad_norm": 0.21904581785202026,
      "learning_rate": 2.7622160460001166e-05,
      "loss": 0.24260114669799804,
      "step": 2400
    },
    {
      "epoch": 2.3582089552238807,
      "grad_norm": 0.2477143406867981,
      "learning_rate": 2.406627410523087e-05,
      "loss": 0.24558507919311523,
      "step": 2450
    },
    {
      "epoch": 2.406355320173327,
      "grad_norm": 0.26366570591926575,
      "learning_rate": 2.0723890056960227e-05,
      "loss": 0.24778993606567382,
      "step": 2500
    },
    {
      "epoch": 2.406355320173327,
      "eval_loss": 0.30330467224121094,
      "eval_runtime": 695.5219,
      "eval_samples_per_second": 2.656,
      "eval_steps_per_second": 0.664,
      "step": 2500
    },
    {
      "epoch": 2.4545016851227732,
      "grad_norm": 0.25673332810401917,
      "learning_rate": 1.7604406072182122e-05,
      "loss": 0.24701383590698242,
      "step": 2550
    },
    {
      "epoch": 2.5026480500722195,
      "grad_norm": 0.2344675064086914,
      "learning_rate": 1.4716593181396964e-05,
      "loss": 0.23919960021972655,
      "step": 2600
    },
    {
      "epoch": 2.5507944150216657,
      "grad_norm": 0.24205727875232697,
      "learning_rate": 1.2068571027170161e-05,
      "loss": 0.2499522399902344,
      "step": 2650
    },
    {
      "epoch": 2.5989407799711124,
      "grad_norm": 0.18950000405311584,
      "learning_rate": 9.667785034191811e-06,
      "loss": 0.24822561264038087,
      "step": 2700
    },
    {
      "epoch": 2.6470871449205586,
      "grad_norm": 0.23864181339740753,
      "learning_rate": 7.5209854750301844e-06,
      "loss": 0.24670083999633788,
      "step": 2750
    },
    {
      "epoch": 2.6470871449205586,
      "eval_loss": 0.30303388833999634,
      "eval_runtime": 696.3045,
      "eval_samples_per_second": 2.653,
      "eval_steps_per_second": 0.664,
      "step": 2750
    },
    {
      "epoch": 2.695233509870005,
      "grad_norm": 0.2515101730823517,
      "learning_rate": 5.634208490439252e-06,
      "loss": 0.24807355880737306,
      "step": 2800
    },
    {
      "epoch": 2.743379874819451,
      "grad_norm": 0.22556646168231964,
      "learning_rate": 4.012759117585463e-06,
      "loss": 0.24218505859375,
      "step": 2850
    },
    {
      "epoch": 2.7915262397688974,
      "grad_norm": 0.21387307345867157,
      "learning_rate": 2.661196373913288e-06,
      "loss": 0.24130987167358398,
      "step": 2900
    },
    {
      "epoch": 2.839672604718344,
      "grad_norm": 0.20219890773296356,
      "learning_rate": 1.5833204385888867e-06,
      "loss": 0.24288219451904297,
      "step": 2950
    },
    {
      "epoch": 2.88781896966779,
      "grad_norm": 0.24099305272102356,
      "learning_rate": 7.821619675638658e-07,
      "loss": 0.25014289855957034,
      "step": 3000
    },
    {
      "epoch": 2.88781896966779,
      "eval_loss": 0.3028092682361603,
      "eval_runtime": 696.6975,
      "eval_samples_per_second": 2.651,
      "eval_steps_per_second": 0.663,
      "step": 3000
    },
    {
      "epoch": 2.9359653346172365,
      "grad_norm": 0.2587520182132721,
      "learning_rate": 2.5997357230198583e-07,
      "loss": 0.24698244094848631,
      "step": 3050
    },
    {
      "epoch": 2.9841116995666828,
      "grad_norm": 0.19957926869392395,
      "learning_rate": 1.8223486127799673e-08,
      "loss": 0.24292854309082032,
      "step": 3100
    }
  ],
  "logging_steps": 50,
  "max_steps": 3117,
  "num_input_tokens_seen": 0,
  "num_train_epochs": 3,
  "save_steps": 250,
  "stateful_callbacks": {
    "TrainerControl": {
      "args": {
        "should_epoch_stop": false,
        "should_evaluate": false,
        "should_log": false,
        "should_save": true,
        "should_training_stop": true
      },
      "attributes": {}
    }
  },
  "total_flos": 2.0828994331856486e+18,
  "train_batch_size": 4,
  "trial_name": null,
  "trial_params": null
}
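The `best_global_step` and `best_metric` fields can be cross-checked against the evaluation entries in `log_history`. The sketch below is illustrative only: it copies the twelve `eval_loss` values from the state above into a list, and it assumes that a lower `eval_loss` is better (the file does not state which metric or direction was used to pick the best checkpoint). The helper name `best_eval_entry` is hypothetical.

```python
# Evaluation entries copied from log_history above (only step and eval_loss kept).
EVAL_LOG = [
    {"step": 250, "eval_loss": 0.35290107131004333},
    {"step": 500, "eval_loss": 0.3319753110408783},
    {"step": 750, "eval_loss": 0.3201846480369568},
    {"step": 1000, "eval_loss": 0.3141631782054901},
    {"step": 1250, "eval_loss": 0.3110591769218445},
    {"step": 1500, "eval_loss": 0.30712059140205383},
    {"step": 1750, "eval_loss": 0.30308136343955994},
    {"step": 2000, "eval_loss": 0.2992326021194458},
    {"step": 2250, "eval_loss": 0.3054317533969879},
    {"step": 2500, "eval_loss": 0.30330467224121094},
    {"step": 2750, "eval_loss": 0.30303388833999634},
    {"step": 3000, "eval_loss": 0.3028092682361603},
]


def best_eval_entry(log_history):
    """Return the entry with the lowest eval_loss.

    Hypothetical helper; assumes lower eval_loss is better and skips
    training-only entries that carry no eval_loss key.
    """
    evals = [entry for entry in log_history if "eval_loss" in entry]
    return min(evals, key=lambda entry: entry["eval_loss"])


best = best_eval_entry(EVAL_LOG)
print(best["step"], best["eval_loss"])
```

Under that assumption the minimum falls at step 2000, which agrees with the recorded `best_global_step` and `best_metric`. The later evaluations (steps 2250 to 3000) stay slightly above it while the training loss keeps falling.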
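The logged `learning_rate` values look like a linear warmup followed by a cosine decay to zero. The sketch below tries to reproduce a few of them. Every constant except `MAX_STEPS` is an assumption inferred from the numbers, not something the file states: a peak rate of 2e-4, 155 warmup steps, and a one-step offset between the logged global step and the scheduler step. The function name `lr_at` is hypothetical.

```python
import math

PEAK_LR = 2e-4       # assumption: inferred from the logged values
WARMUP_STEPS = 155   # assumption: inferred from the slope of the first three entries
MAX_STEPS = 3117     # "max_steps" in the state above


def lr_at(scheduler_step):
    """Linear warmup to PEAK_LR, then half-cosine decay to zero (hypothetical helper)."""
    if scheduler_step < WARMUP_STEPS:
        return PEAK_LR * scheduler_step / WARMUP_STEPS
    progress = (scheduler_step - WARMUP_STEPS) / (MAX_STEPS - WARMUP_STEPS)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


# A few learning_rate values copied from log_history, keyed by global step.
LOGGED = {
    50: 6.32258064516129e-05,
    200: 0.0001998911255002373,
    2000: 6.244320604825131e-05,
    3100: 1.8223486127799673e-08,
}

for step, logged in LOGGED.items():
    # Assumption: the value logged at global step s matches the schedule at s - 1.
    print(step, lr_at(step - 1), logged)
```

If the reconstruction and the logged values agree, that supports the inferred settings but does not confirm them. The actual scheduler configuration would be in the training arguments, which are not part of this file.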