How does IQ4_XS compare to UD-Q3_K_XL?
Does it have a noticeable intelligence increase?
Maybe it's just me, but I often have bad luck with Unsloth's quants. Anyway, I found this benchmark (no KLD or PPL, just "accuracy"): https://unsloth.ai/docs/models/minimax-m27#gguf-benchmarks
I haven't had time to play with this model beyond quantizing it and running PPL / KLD tests to make sure it's sane. That said, the quant recipes I use keep the majority of the model in Q8_0 and only quantize down the ffn_gate / ffn_up / ffn_down tensors, so I'd guess mine are a little better on KLD and probably hold up better at long context, since the attention and other non-expert tensors stay in high quality.
You'd have to run a downstream benchmark to verify, though. I might add some of that to my testing suite in the future in addition to just KLD, but for now you'll have to test them side-by-side. I'm sure other people will also be interested to know!
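For anyone unsure what the KLD number actually measures, here's a rough sketch: per-token KL divergence between the full-precision model's next-token distribution and the quant's, averaged over all positions. It is only an illustration, not my actual test harness. The function names are made up for this example and the logits are toy values; in practice you'd dump real logits from both models on the same text.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over one row of logits."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]

def token_kld(ref_logits, quant_logits):
    """KL(P_ref || P_quant) for a single token position, in nats."""
    lp = log_softmax(ref_logits)
    lq = log_softmax(quant_logits)
    return sum(math.exp(a) * (a - b) for a, b in zip(lp, lq))

def mean_kld(ref_rows, quant_rows):
    """Average KLD over all token positions (one row of logits per position)."""
    klds = [token_kld(r, q) for r, q in zip(ref_rows, quant_rows)]
    return sum(klds) / len(klds)

# Toy logits (hypothetical values): two positions, vocabulary of three tokens.
ref     = [[2.0, 1.0, 0.1], [0.5, 2.5, 0.3]]   # full-precision reference
quant_a = [[1.9, 1.1, 0.1], [0.6, 2.4, 0.3]]   # stays close to the reference
quant_b = [[1.0, 1.5, 0.5], [1.5, 1.0, 0.8]]   # drifts, flips the top token

print("quant_a mean KLD:", mean_kld(ref, quant_a))
print("quant_b mean KLD:", mean_kld(ref, quant_b))
```

Lower is better, and 0 means the quant reproduces the reference distribution exactly. Keep in mind this only tells you how closely a quant tracks the original model's outputs, which is why a downstream benchmark is still needed to answer the "intelligence" question.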
