Question: best quantization level for quality retention?

#51

by 3morixd - opened 3 days ago

We've been testing different quantization levels on our phone farm. Q4_K_M seems like the sweet spot, but some models degrade more than others.

Question: what quantization level do you recommend for this model? Has anyone compared Q4 vs Q5 vs Q6?

We find ~85% quality retention at Q4_K_M for most models, but code/math tasks sometimes need Q5+.
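The post does not say how "quality retention" is measured. One common reading is the ratio of a quantized model's benchmark score to the unquantized (e.g. FP16) baseline on the same task set. The sketch below assumes that definition. It picks the smallest quant level that clears a per-task threshold, so code/math tasks can be given a stricter bar than general chat. The level names `Q5_K_M` and `Q6_K`, their ordering, and the helper names `retention` and `pick_quant` are assumptions for illustration, not anything from the post or a specific tool.

```python
from typing import Dict, Optional, Sequence

# Assumed ordering: smallest / lowest-fidelity quant first.
# Q5_K_M and Q6_K are illustrative names for "Q5+" and "Q6".
QUANT_ORDER: Sequence[str] = ("Q4_K_M", "Q5_K_M", "Q6_K")


def retention(quant_score: float, baseline_score: float) -> float:
    """Fraction of the unquantized baseline score kept by a quantized model
    (assumed definition of 'quality retention')."""
    if baseline_score <= 0:
        raise ValueError("baseline_score must be positive")
    return quant_score / baseline_score


def pick_quant(
    scores: Dict[str, float],
    baseline: float,
    min_retention: float,
    order: Sequence[str] = QUANT_ORDER,
) -> Optional[str]:
    """Return the smallest quant level (per `order`) whose retention
    meets `min_retention`, or None if no tested level qualifies."""
    for level in order:
        if level in scores and retention(scores[level], baseline) >= min_retention:
            return level
    return None
```

In use, you would call `pick_quant` once per task category with that category's own threshold. A higher `min_retention` for code/math pushes the choice toward Q5 or above, which matches the pattern described in the post.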
