Question: best quantization level for quality retention?

#51
by 3morixd - opened

We've been testing different quantization levels on our phone farm. Q4_K_M seems like the sweet spot, but some models degrade more than others.

Question: what quantization level do you recommend for this model? Has anyone compared Q4 vs Q5 vs Q6?

In our tests, most models retain roughly 85% of their quality at Q4_K_M, but code and math tasks sometimes need Q5 or higher.
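
To make "quality retention" concrete, here is a minimal sketch that assumes it means the quantized model's benchmark score divided by the full-precision score on the same task. The function names and all scores below are hypothetical placeholders for illustration, not measurements from our tests or from this model.

```python
def quality_retention(baseline_score, quantized_score):
    """Fraction of the full-precision score kept after quantization."""
    if baseline_score <= 0:
        raise ValueError("baseline_score must be positive")
    return quantized_score / baseline_score


def smallest_quant_meeting(scores, baseline_score, threshold):
    """Return the first quant level whose retention meets the threshold.

    `scores` maps quant level -> score and must be ordered from the
    smallest (most compressed) level to the largest.
    """
    for level, score in scores.items():
        if quality_retention(baseline_score, score) >= threshold:
            return level
    return None  # no level in `scores` is good enough


# Placeholder numbers only (hypothetical, not real benchmark results).
baseline = 80.0
scores = {"Q4_K_M": 68.0, "Q5_K_M": 74.0, "Q6_K": 78.0}

for level, score in scores.items():
    print(f"{level}: {quality_retention(baseline, score):.1%} retained")

print("Smallest level with >= 90% retention:",
      smallest_quant_meeting(scores, baseline, 0.90))
```

Running this per task category (for example general chat vs. code/math) rather than on one aggregate score would show where a model needs a higher level than Q4_K_M.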

  • Dispatch AI (FZE), Sharjah UAE
