This quantized version of ConicCat/GLM-4.7-Architect-355B-A32B was made for ik_llama.cpp. It is sized for setups with 128GB of RAM and a 24GB GPU at a 40k context size, though it could also be used with a 16GB card at a much smaller context size.
The model was quantized the same way as the ubergarm GLM-4.7 quant of the same size.
ik_llama/llama-server.exe --model GLM-4.7-Architect-355B-A32B-smol-IQ2_KS-00001-of-00003.gguf --alias GLM-4.7-Architect-GGUF -ctk q8_0 -ctv q8_0 -khad -c 40960 --parallel 1 -ngl 99 --cpu-moe --no-mmap --merge-qkv --threads 10 -ub 4096 -b 4096
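The command above packs many flags into one line, so here is a minimal bash sketch of the same invocation with one flag per line. The flag values are copied from the command. The per-flag comments are a best-effort reading of the llama.cpp and ik_llama.cpp options, not statements from the quant's author, and the notes on `-khad` and `--merge-qkv` in particular are assumptions; check `llama-server --help` in your build. The `SERVER` variable and the dry-run fallback are only there for illustration.

```shell
#!/usr/bin/env bash
# Path to the server binary, as used in the command above.
SERVER="ik_llama/llama-server.exe"

args=(
  --model GLM-4.7-Architect-355B-A32B-smol-IQ2_KS-00001-of-00003.gguf  # first of the three split files
  --alias GLM-4.7-Architect-GGUF  # model name reported to API clients
  -ctk q8_0 -ctv q8_0             # keep the K and V caches in q8_0
  -khad                           # ik_llama.cpp K-cache option (assumed: Hadamard transform)
  -c 40960                        # context size in tokens (40 * 1024)
  --parallel 1                    # a single request slot
  -ngl 99                         # request up to 99 layers on the GPU ...
  --cpu-moe                       # ... while MoE expert weights stay in system RAM
  --no-mmap                       # load the model into memory instead of memory-mapping it
  --merge-qkv                     # ik_llama.cpp option (assumed: merges the Q/K/V tensors)
  --threads 10                    # CPU threads
  -ub 4096 -b 4096                # physical and logical batch sizes
)

if [ -x "$SERVER" ]; then
  "$SERVER" "${args[@]}"
else
  # Dry run: print the command when the binary is not present.
  printf 'Would run: %s' "$SERVER"
  printf ' %s' "${args[@]}"
  printf '\n'
fi
```

On a 16GB card, lowering the value after `-c` is the change the description above points to; how far it needs to drop is not stated here and would have to be found by trial.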