What's the suggested model settings?

#3
by asher9972 - opened

Please suggest the preferred model settings (temperature, top_p, top_k ,.... ) for coding, non-coding-agentic workflows, creative writing
Or at least coding and no-coding agentic workflows.

I'm hosting this model on vllm dual rtx 6000 and get ~ 10% tool call failures with temp=1.0 top_p 0.95
read somewhere that i should try it with 1.0 and 0.95...

sometimes it also repeats the same sentence endless loop

also there seems to be a problem that sometimes the final answer is rendered in thinking block not as answer

It seems using this model with opencode works really really well… using it in openwebui it often gets stuck in thinking

are you using marlin to get this running ? i wasnt able to get it work on dual rtx 6000.

StepFun org

For our NVFP4 validation runs, we used temperature=1.0 and top_p=1.0. We did not explicitly set top_k, so it was left to the serving backend default / no additional top-k filtering in our evaluation requests.

The MTP setting was separate from sampling:
--speculative-config '{"method": "mtp", "num_speculative_tokens": 3}'

I haven't tested on rtx pro 6k yet, will look into it.

Sign up or log in to comment