https://huggingface.co/Josephgflowers/FinR1-llama-8b-multi-language-thinking
It's queued! :D
You can check for progress at http://hf.tst.eu/status.html or regularly check the model summary page at https://hf.tst.eu/model#FinR1-llama-8b-multi-language-thinking-GGUF for quants to appear.
While the model successfully converted to a GGUF, the resulting GGUF unfortunately failed to load in llama.cpp. I will soon provide more information about why it failed to load. The worker that processed your model is currently unreachable, but I will check the logs once it is available again.
So the error is that the model is missing a tensor named token_embd.weight, which llama.cpp expects for the LlamaForCausalLM architecture as it is implemented in llama.cpp. LlamaForCausalLM is often used for custom architectures that llama.cpp does not support, which might be the case here as well.
```
print_info: max token length = 256
load_tensors: loading model tensors, this can take a while... (mmap = false)
llama_model_load: error loading model: missing tensor 'token_embd.weight'
llama_model_load_from_file_impl: failed to load model
common_init_from_params: failed to load model 'FinR1-llama-8b-multi-language-thinking.gguf~', try reducing --n-gpu-layers if you're running out of VRAM
main: error: unable to load model
```
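For anyone who wants to verify this on their side: the `gguf` pip package (maintained in the llama.cpp repo) can list the tensors a converted file actually contains. A minimal sketch, with the file path illustrative:

```python
# Minimal sketch: list the tensor names inside a GGUF to confirm whether
# token_embd.weight made it into the file. Requires `pip install gguf`.
from gguf import GGUFReader

reader = GGUFReader("FinR1-llama-8b-multi-language-thinking.gguf")  # illustrative path
names = [t.name for t in reader.tensors]
print("token_embd.weight" in names)
```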
Thank you for the details. I will look into this and see if I can convert it to a format usable by llama.cpp, or whether I can patch a local llama.cpp version to fix this.
Again, thank you for trying and thank you for the details.
I fixed the issue with the model. It had mixed bf16 and fp16 weights; I converted everything to fp16.
Any chance I can get a retry? Will it work with the same name, or do I need to rename or re-upload it first?
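For reference, a minimal sketch of that kind of dtype normalization, assuming a standard transformers checkpoint (the paths are illustrative, not the exact steps used here):

```python
# Minimal sketch: load the checkpoint with every weight cast to fp16,
# then save it back out. torch_dtype=torch.float16 casts bf16 (and fp32)
# tensors alike at load time.
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "FinR1-llama-8b-multi-language-thinking",  # illustrative local path
    torch_dtype=torch.float16,
)
model.save_pretrained("FinR1-llama-8b-multi-language-thinking-fp16")
```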
At least I didn't have to write a custom convert script (flashbacks from phi4).
Never mind, that made it a little wonky. I think it needs some healing.
hmm, interesting. mixed bf16/f16 shouldn't cause an issue, but a mismatched model architecture does - llama.cpp couldn't identify which tensor to use for token_embd.weight
anyway, sure, we can retry each time you fix something. as long as the model doesn't work, i don't think a new repo is useful. a new repo is mostly useful when a model is already widely used and working, and a new version is then published.
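If it helps with debugging: for a standard Llama checkpoint, convert_hf_to_gguf.py derives token_embd.weight from the model.embed_tokens.weight tensor, so one quick sanity check is whether the source repo exposes that name at all. A minimal sketch, assuming a sharded safetensors checkpoint with an index file:

```python
# Minimal sketch: check the safetensors index for the HF-side embedding
# tensor that the llama.cpp converter maps to token_embd.weight.
import json

with open("model.safetensors.index.json") as f:  # lives in the checkpoint dir
    index = json.load(f)

print("model.embed_tokens.weight" in index["weight_map"])
```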
I re-uploaded the model after the conversion and a little more training. It is working and stable now. It should have no issues with llama.cpp now.
Can you please retry?
Sure. It's queued! :D
You can check for progress at http://hf.tst.eu/status.html or regularly check the model summary page at https://hf.tst.eu/model#FinR1-llama-8b-multi-language-thinking-GGUF for quants to appear.
> It is working and stable now. It should have no issues with llama.cpp now.
Let's hope for the best. I will let you know how it goes.
Thank you!