https://huggingface.co/Josephgflowers/FinR1-llama-8b-multi-language-thinking
It's queued! :D
You can check for progress at http://hf.tst.eu/status.html or regularly check the model summary page at https://hf.tst.eu/model#FinR1-llama-8b-multi-language-thinking-GGUF for quants to appear.
While the model successfully converted to a GGUF, the resulting GGUF unfortunately failed to load in llama.cpp. I will soon provide more information about why it failed to load. The worker that processed your model is currently unreachable, but I will check the logs once it is available again.
So the error is that the model is missing a tensor named token_embd.weight, which llama.cpp expects for the LlamaForCausalLM architecture as it is implemented in llama.cpp. LlamaForCausalLM is often used for custom architectures that llama.cpp does not support, which might be the case here as well.
```
print_info: max token length = 256
load_tensors: loading model tensors, this can take a while... (mmap = false)
llama_model_load: error loading model: missing tensor 'token_embd.weight'
llama_model_load_from_file_impl: failed to load model
common_init_from_params: failed to load model 'FinR1-llama-8b-multi-language-thinking.gguf~', try reducing --n-gpu-layers if you're running out of VRAM
main: error: unable to load model
```
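For anyone who wants to verify this on their side: the `gguf` pip package (maintained in the llama.cpp repo) can list the tensors a converted file actually contains. A minimal sketch, with the file path illustrative:

```python
# Minimal sketch: list the tensor names inside a GGUF to confirm whether
# token_embd.weight made it into the file. Requires `pip install gguf`.
from gguf import GGUFReader

reader = GGUFReader("FinR1-llama-8b-multi-language-thinking.gguf")  # illustrative path
names = [t.name for t in reader.tensors]
print("token_embd.weight" in names)
```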
Thank you for the details. I will look into this and see if I can convert it to a format usable by llama.cpp, or whether I can patch a local llama.cpp version to fix this.
Again, thank you for trying and thank you for the details.
I fixed the issue with the model. It had mixed bf16 and fp16 weights; I converted everything to fp16.
Any chance I can get a retry? Will it work with the same name, or do I need to rename or re-upload it first?
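For reference, a minimal sketch of that kind of dtype normalization, assuming a standard transformers checkpoint (the paths are illustrative, not the exact steps used here):

```python
# Minimal sketch: load the checkpoint with every weight cast to fp16,
# then save it back out. torch_dtype=torch.float16 casts bf16 (and fp32)
# tensors alike at load time.
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "FinR1-llama-8b-multi-language-thinking",  # illustrative local path
    torch_dtype=torch.float16,
)
model.save_pretrained("FinR1-llama-8b-multi-language-thinking-fp16")
```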
At least I didn't have to write a custom convert script (flashbacks from phi4).
Never mind, that made it a little wonky. I think it needs some healing.
hmm, interesting. mixed bf16/f16 shouldn't cause an issue, but a mismatched model architecture does - llama.cpp couldn't identify which tensor to use for token_embd.weight
anyway, sure, we can retry each time you fix something. as long as the model doesn't work, i don't think a new repo is useful. a new repo is mostly useful when a model is already widely used and working, and a new version is then published.
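If it helps with debugging: for a standard Llama checkpoint, convert_hf_to_gguf.py derives token_embd.weight from the model.embed_tokens.weight tensor, so one quick sanity check is whether the source repo exposes that name at all. A minimal sketch, assuming a sharded safetensors checkpoint with an index file:

```python
# Minimal sketch: check the safetensors index for the HF-side embedding
# tensor that the llama.cpp converter maps to token_embd.weight.
import json

with open("model.safetensors.index.json") as f:  # lives in the checkpoint dir
    index = json.load(f)

print("model.embed_tokens.weight" in index["weight_map"])
```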
I re-uploaded the model after the conversion and a little more training. It is working and stable now. It should have no issues with llama.cpp now.
Can you please retry?
Sure. It's queued! :D
You can check for progress at http://hf.tst.eu/status.html or regularly check the model summary page at https://hf.tst.eu/model#FinR1-llama-8b-multi-language-thinking-GGUF for quants to appear.
> It is working and stable now. It should have no issues with llama.cpp now.
Let's hope for the best. I will let you know how it goes.
Thank you!