---
base_model:
- NX-AI/xLSTM-7b
library_name: transformers
model_name: xlstm-7b-chatml
tags:
- base_model:adapter:ethicalabs/xLSTM-7b-Instruct-PEFT
- sft
- transformers
- trl
pipeline_tag: text-generation
license: other
datasets:
- HuggingFaceH4/ultrachat_200k
---
# Model Card for xLSTM-7b-Instruct
This model is an experimental fine-tuned version of [NX-AI/xLSTM-7b](https://huggingface.co/NX-AI/xLSTM-7b).
The goal is to add chat-template support (the template is taken from [HuggingFaceTB/SmolLM3-3B](https://huggingface.co/HuggingFaceTB/SmolLM3-3B)), as discussed in https://huggingface.co/NX-AI/xLSTM-7b/discussions/11.
It has been trained using [TRL](https://github.com/huggingface/trl).
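Attaching the chat template to the base tokenizer can be done roughly as follows (a minimal sketch; the exact template and any special-token adjustments shipped with this repository may differ):

```python
from transformers import AutoTokenizer

# Sketch: copy the ChatML-style chat template from SmolLM3-3B onto the xLSTM tokenizer,
# mirroring the idea from the linked discussion. The template actually bundled with this
# model may have been adjusted further.
base_tok = AutoTokenizer.from_pretrained("NX-AI/xLSTM-7b")
smol_tok = AutoTokenizer.from_pretrained("HuggingFaceTB/SmolLM3-3B")

base_tok.chat_template = smol_tok.chat_template  # reuse the Jinja chat template
base_tok.save_pretrained("./xlstm-7b-with-chat-template")
```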

## Quick start
For text generation you currently have to pin specific PyTorch and Triton versions; see [this write-up](https://huggingface.co/datasets/John6666/forum1/blob/main/xlstm_1.md) for details.
```shell
pip install "torch==2.5.1" "torchvision==0.20.1" "torchaudio==2.5.1" --index-url https://download.pytorch.org/whl/cu124
pip install "triton==3.4.0" # >=3.1 is OK; 3.4.0 current as of Sep 2025
pip install "mlstm-kernels==2.0.1" "xlstm==2.0.5"
```
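A quick sanity check that the pinned environment is actually what got installed (the expected values in the comments follow the pins above):

```python
import torch
import triton

# Confirm the pinned versions before trying generation.
print("torch:", torch.__version__)    # expect 2.5.1+cu124
print("triton:", triton.__version__)  # expect >= 3.1
print("CUDA available:", torch.cuda.is_available())
```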
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, AutoConfig

MERGED_MODEL_PATH = "ethicalabs/xLSTM-7b-Instruct"

# We apply a configuration that uses native, hardware-agnostic kernels.
print("Defining a safe, native kernel configuration for compatibility...")
safe_config = AutoConfig.from_pretrained(MERGED_MODEL_PATH, trust_remote_code=True)

# Use the stable, native parallel kernel
safe_config.chunkwise_kernel = "chunkwise--native_autograd"
safe_config.sequence_kernel = "native_sequence__native"
safe_config.step_kernel = "native"

# This flag is still required for the HF implementation to avoid unpacking errors
safe_config.return_last_states = False

# Load the final, merged model with the safe config (no quantization)
print("Loading the final, merged model in bfloat16 (no quantization for compatibility)...")
final_model = AutoModelForCausalLM.from_pretrained(
    MERGED_MODEL_PATH,
    device_map="auto",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    config=safe_config,
)
final_tokenizer = AutoTokenizer.from_pretrained(MERGED_MODEL_PATH)

# The tokenizer needs to know which token to use for padding.
if final_tokenizer.pad_token is None:
    final_tokenizer.pad_token = final_tokenizer.eos_token
    print("Padding token has been set.")

# Set the model to evaluation mode
final_model.eval()

# Run inference
print("Preparing prompt and running generation...")
messages = [{"role": "user", "content": "Please suggest me some mindfulness relaxation techniques to overcome the frustration I have when I deal with Triton kernels."}]
prompt_string = final_tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=False
)
inputs = final_tokenizer(prompt_string, return_tensors="pt").to(final_model.device)

with torch.no_grad():
    outputs = final_model.generate(**inputs, max_new_tokens=512)

output_ids = outputs[0][len(inputs.input_ids[0]):]
response_text = final_tokenizer.decode(output_ids, skip_special_tokens=True)

print("\n--- Generated Response ---")
print(response_text)
```
Example text generation:
```
--- Generated Response ---
1. Mindful breathing: Take a few deep breaths and focus on your breath. Feel the air entering and leaving your body. This can help you calm down and focus on the present moment.
2. Body scan: Lie down or sit comfortably and focus on each part of your body, starting from your toes and moving up to your head. Notice any tension or discomfort and try to release it.
3. Visualization: Close your eyes and imagine yourself in a peaceful place, such as a beach or forest. Focus on the sights, sounds, and smells around you. This can help you relax and reduce stress.
4. Progressive muscle relaxation: Tense and then relax each muscle group in your body, starting from your toes and moving up to your head. This can help you release tension and relax your body.
5. Mantra repetition: Repeat a calming phrase or mantra to yourself, such as "I am calm" or "I am relaxed." This can help you focus your mind and reduce stress.
6. Walking meditation: Take a slow, mindful walk and focus on each step you take. Notice the feeling of your feet touching the ground and the movement of your legs. This can help you reduce stress and increase mindfulness.
7. Yoga: Practice yoga poses that focus on relaxation and mindfulness, such as child's pose, downward facing dog, and savasana. This can help you reduce stress and increase mindfulness.
8. Guided meditation: Listen to a guided meditation or relaxation exercise. This can help you focus your mind and reduce stress.
9. Mindful eating: Take a few deep breaths and focus on the taste, texture, and smell of your food. This can help you reduce stress and increase mindfulness.
10. Nature walk: Take a slow, mindful walk in nature and focus on the sights, sounds, and smells around you. This can help you reduce stress and increase mindfulness.
Remember to practice these techniques regularly to reduce stress and increase mindfulness.
You can also try using apps such as Headspace, Calm, or Insight Timer to help you practice mindfulness and relaxation techniques. These apps offer guided meditations, breathing exercises, and other mindfulness practices.
Remember to be patient and consistent with your practice, and don't be afraid to try different techniques to find what works best for you.
You can also try seeking support from a therapist or counselor if you are struggling with
```
## Training procedure
[<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="150" height="24"/>](https://wandb.ai/ethicalabs-ai/xlstm-finetuning/runs/pfmf34a3)
This model was trained with SFT (supervised fine-tuning) on [HuggingFaceH4/ultrachat_200k](https://huggingface.co/datasets/HuggingFaceH4/ultrachat_200k).
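The training setup was roughly of the following shape (a hedged sketch only: the adapter hyperparameters and trainer settings below are placeholders, not the values actually used):

```python
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import SFTConfig, SFTTrainer

# Chat-formatted SFT data (the dataset listed in the model card metadata).
dataset = load_dataset("HuggingFaceH4/ultrachat_200k", split="train_sft")

model = AutoModelForCausalLM.from_pretrained("NX-AI/xLSTM-7b", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("ethicalabs/xLSTM-7b-Instruct")  # carries the chat template

# Placeholder adapter config; the DoRA citation below suggests use_dora=True may have been used,
# and "all-linear" is just a generic target for a custom architecture.
peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules="all-linear",
    task_type="CAUSAL_LM",
    use_dora=True,
)

trainer = SFTTrainer(
    model=model,
    args=SFTConfig(output_dir="xlstm-7b-chatml"),
    train_dataset=dataset,
    processing_class=tokenizer,
    peft_config=peft_config,
)
trainer.train()
```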
## Evaluation
This model was loaded in 4-bit precision and evaluated with [lighteval](https://github.com/huggingface/lighteval); a hedged sketch of the 4-bit loading setup follows the results table.
| Task |Version| Metric |Value | |Stderr|
|------------------------------------------------------|-------|----------------------------------------------------------------------------------------------------------------------------|-----:|---|-----:|
|all | |acc |0.4450|± |0.1503|
| | |acc:logprob_normalization=LogProbCharNorm(name='norm', ignore_first_space=True) |0.7000|± |0.1528|
| | |acc:logprob_normalization=LogProbCharNorm(name='norm', ignore_first_space=False) |0.8000|± |0.1333|
| | |truthfulqa_mc1 |0.4000|± |0.1633|
| | |truthfulqa_mc2 |0.5256|± |0.1573|
| | |em:normalize_gold=<function gsm8k_normalizer at 0x7d2a8a2a0fe0>&normalize_pred=<function gsm8k_normalizer at 0x7d2a8a2a0fe0>|0.4000|± |0.1633|
|leaderboard:arc:challenge:25 | |acc |0.7000|± |0.1528|
| | |acc:logprob_normalization=LogProbCharNorm(name='norm', ignore_first_space=True) |0.7000|± |0.1528|
|leaderboard:gsm8k:5 | |em:normalize_gold=<function gsm8k_normalizer at 0x7d2a8a2a0fe0>&normalize_pred=<function gsm8k_normalizer at 0x7d2a8a2a0fe0>|0.4000|± |0.1633|
|leaderboard:hellaswag:10 | |acc |0.4000|± |0.1633|
| | |acc:logprob_normalization=LogProbCharNorm(name='norm', ignore_first_space=False) |0.8000|± |0.1333|
|leaderboard:mmlu:_average:5 | |acc |0.4386|± |0.1498|
|leaderboard:mmlu:abstract_algebra:5 | |acc |0.3000|± |0.1528|
|leaderboard:mmlu:anatomy:5 | |acc |0.2000|± |0.1333|
|leaderboard:mmlu:astronomy:5 | |acc |0.3000|± |0.1528|
|leaderboard:mmlu:business_ethics:5 | |acc |0.3000|± |0.1528|
|leaderboard:mmlu:clinical_knowledge:5 | |acc |0.7000|± |0.1528|
|leaderboard:mmlu:college_biology:5 | |acc |0.4000|± |0.1633|
|leaderboard:mmlu:college_chemistry:5 | |acc |0.5000|± |0.1667|
|leaderboard:mmlu:college_computer_science:5 | |acc |0.4000|± |0.1633|
|leaderboard:mmlu:college_mathematics:5 | |acc |0.6000|± |0.1633|
|leaderboard:mmlu:college_medicine:5 | |acc |0.6000|± |0.1633|
|leaderboard:mmlu:college_physics:5 | |acc |0.3000|± |0.1528|
|leaderboard:mmlu:computer_security:5 | |acc |0.5000|± |0.1667|
|leaderboard:mmlu:conceptual_physics:5 | |acc |0.2000|± |0.1333|
|leaderboard:mmlu:econometrics:5 | |acc |0.4000|± |0.1633|
|leaderboard:mmlu:electrical_engineering:5 | |acc |0.7000|± |0.1528|
|leaderboard:mmlu:elementary_mathematics:5 | |acc |0.1000|± |0.1000|
|leaderboard:mmlu:formal_logic:5 | |acc |0.2000|± |0.1333|
|leaderboard:mmlu:global_facts:5 | |acc |0.6000|± |0.1633|
|leaderboard:mmlu:high_school_biology:5 | |acc |0.3000|± |0.1528|
|leaderboard:mmlu:high_school_chemistry:5 | |acc |0.4000|± |0.1633|
|leaderboard:mmlu:high_school_computer_science:5 | |acc |0.4000|± |0.1633|
|leaderboard:mmlu:high_school_european_history:5 | |acc |0.4000|± |0.1633|
|leaderboard:mmlu:high_school_geography:5 | |acc |0.8000|± |0.1333|
|leaderboard:mmlu:high_school_government_and_politics:5| |acc |0.7000|± |0.1528|
|leaderboard:mmlu:high_school_macroeconomics:5 | |acc |0.4000|± |0.1633|
|leaderboard:mmlu:high_school_mathematics:5 | |acc |0.1000|± |0.1000|
|leaderboard:mmlu:high_school_microeconomics:5 | |acc |0.6000|± |0.1633|
|leaderboard:mmlu:high_school_physics:5 | |acc |0.2000|± |0.1333|
|leaderboard:mmlu:high_school_psychology:5 | |acc |0.7000|± |0.1528|
|leaderboard:mmlu:high_school_statistics:5 | |acc |0.5000|± |0.1667|
|leaderboard:mmlu:high_school_us_history:5 | |acc |0.4000|± |0.1633|
|leaderboard:mmlu:high_school_world_history:5 | |acc |0.9000|± |0.1000|
|leaderboard:mmlu:human_aging:5 | |acc |0.4000|± |0.1633|
|leaderboard:mmlu:human_sexuality:5 | |acc |0.4000|± |0.1633|
|leaderboard:mmlu:international_law:5 | |acc |0.5000|± |0.1667|
|leaderboard:mmlu:jurisprudence:5 | |acc |0.4000|± |0.1633|
|leaderboard:mmlu:logical_fallacies:5 | |acc |0.4000|± |0.1633|
|leaderboard:mmlu:machine_learning:5 | |acc |0.4000|± |0.1633|
|leaderboard:mmlu:management:5 | |acc |0.6000|± |0.1633|
|leaderboard:mmlu:marketing:5 | |acc |0.5000|± |0.1667|
|leaderboard:mmlu:medical_genetics:5 | |acc |0.7000|± |0.1528|
|leaderboard:mmlu:miscellaneous:5 | |acc |0.4000|± |0.1633|
|leaderboard:mmlu:moral_disputes:5 | |acc |0.4000|± |0.1633|
|leaderboard:mmlu:moral_scenarios:5 | |acc |0.0000|± |0.0000|
|leaderboard:mmlu:nutrition:5 | |acc |0.8000|± |0.1333|
|leaderboard:mmlu:philosophy:5 | |acc |0.3000|± |0.1528|
|leaderboard:mmlu:prehistory:5 | |acc |0.4000|± |0.1633|
|leaderboard:mmlu:professional_accounting:5 | |acc |0.1000|± |0.1000|
|leaderboard:mmlu:professional_law:5 | |acc |0.4000|± |0.1633|
|leaderboard:mmlu:professional_medicine:5 | |acc |0.5000|± |0.1667|
|leaderboard:mmlu:professional_psychology:5 | |acc |0.1000|± |0.1000|
|leaderboard:mmlu:public_relations:5 | |acc |0.5000|± |0.1667|
|leaderboard:mmlu:security_studies:5 | |acc |0.4000|± |0.1633|
|leaderboard:mmlu:sociology:5 | |acc |0.7000|± |0.1528|
|leaderboard:mmlu:us_foreign_policy:5 | |acc |0.4000|± |0.1633|
|leaderboard:mmlu:virology:5 | |acc |0.5000|± |0.1667|
|leaderboard:mmlu:world_religions:5 | |acc |0.7000|± |0.1528|
|leaderboard:truthfulqa:mc:0 | |truthfulqa_mc1 |0.4000|± |0.1633|
| | |truthfulqa_mc2 |0.5256|± |0.1573|
|leaderboard:winogrande:5 | |acc |0.6000|± |0.1633|
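The 4-bit loading for evaluation might have looked like this (a minimal sketch with `bitsandbytes`; the exact quantization options and lighteval invocation are assumptions, not copied from the evaluation run):

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Assumed NF4 settings; the actual evaluation run may have used different options.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model_4bit = AutoModelForCausalLM.from_pretrained(
    "ethicalabs/xLSTM-7b-Instruct",
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)
```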
### Framework versions
- PEFT: 0.17.1
- TRL: 0.23.1
- Transformers: 4.57.0
- Pytorch: 2.8.0+cu128
- Datasets: 4.2.0
- Tokenizers: 0.22.1
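To approximate the training environment, pins matching the versions above would be along these lines (untested as a combined install; treat it as a starting point):

```shell
pip install "peft==0.17.1" "trl==0.23.1" "transformers==4.57.0" "datasets==4.2.0" "tokenizers==0.22.1"
pip install "torch==2.8.0" --index-url https://download.pytorch.org/whl/cu128
```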
## Citations
```bibtex
@misc{beck2024xlstmextendedlongshortterm,
title={xLSTM: Extended Long Short-Term Memory},
author={Maximilian Beck and Korbinian Pöppel and Markus Spanring and Andreas Auer and Oleksandra Prudnikova and Michael Kopp and Günter Klambauer and Johannes Brandstetter and Sepp Hochreiter},
year={2024},
eprint={2405.04517},
archivePrefix={arXiv},
primaryClass={cs.LG},
url={https://arxiv.org/abs/2405.04517},
}
```
```bibtex
@misc{han2024parameterefficientfinetuninglargemodels,
title={Parameter-Efficient Fine-Tuning for Large Models: A Comprehensive Survey},
author={Zeyu Han and Chao Gao and Jinyang Liu and Jeff Zhang and Sai Qian Zhang},
year={2024},
eprint={2403.14608},
archivePrefix={arXiv},
primaryClass={cs.LG},
url={https://arxiv.org/abs/2403.14608},
}
```
```bibtex
@misc{liu2024doraweightdecomposedlowrankadaptation,
title={DoRA: Weight-Decomposed Low-Rank Adaptation},
author={Shih-Yang Liu and Chien-Yi Wang and Hongxu Yin and Pavlo Molchanov and Yu-Chiang Frank Wang and Kwang-Ting Cheng and Min-Hung Chen},
year={2024},
eprint={2402.09353},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2402.09353},
}
```
```bibtex
@misc{vonwerra2022trl,
title = {{TRL: Transformer Reinforcement Learning}},
author = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallou{\'e}dec},
year = 2020,
journal = {GitHub repository},
publisher = {GitHub},
howpublished = {\url{https://github.com/huggingface/trl}}
}
```
```bibtex
@misc{lighteval,
author = {Habib, Nathan and Fourrier, Clémentine and Kydlíček, Hynek and Wolf, Thomas and Tunstall, Lewis},
title = {LightEval: A lightweight framework for LLM evaluation},
year = {2023},
version = {0.11.0},
url = {https://github.com/huggingface/lighteval}
}
```