This quantized version of ConicCat/GLM-4.7-Architect-355B-A32B was made for ik_llama.cpp. It is sized for systems with 128 GB of RAM and 24 GB of VRAM at a 40k context size, though it could also be used with a 16 GB card at a much smaller context size.

The model was quantized in the same way as ubergarm's GLM-4.7 quant of the same size.

ik_llama/llama-server.exe --model GLM-4.7-Architect-355B-A32B-smol-IQ2_KS-00001-of-00003.gguf --alias GLM-4.7-Architect-GGUF -ctk q8_0 -ctv q8_0 -khad -c 40960 --parallel 1 -ngl 99 --cpu-moe --no-mmap --merge-qkv --threads 10 -ub 4096 -b 4096
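
Since the context size is the main setting to change on a smaller card, the command above can be assembled from a small script. The following is a minimal sketch, not part of the original release: `build_command` and its parameters are hypothetical helper names, every flag is copied verbatim from the command above, and the 8192 context value is only an illustrative placeholder, not a tested setting.

```python
import shlex

# Paths and names copied from the command above; adjust to your install.
SERVER = "ik_llama/llama-server.exe"
MODEL = "GLM-4.7-Architect-355B-A32B-smol-IQ2_KS-00001-of-00003.gguf"


def build_command(context_size=40960, threads=10, batch=4096):
    """Return the llama-server argument list with a configurable context size."""
    return [
        SERVER,
        "--model", MODEL,
        "--alias", "GLM-4.7-Architect-GGUF",
        "-ctk", "q8_0", "-ctv", "q8_0", "-khad",
        "-c", str(context_size),
        "--parallel", "1",
        "-ngl", "99",
        "--cpu-moe", "--no-mmap", "--merge-qkv",
        "--threads", str(threads),
        "-ub", str(batch), "-b", str(batch),
    ]


if __name__ == "__main__":
    # 8192 is an arbitrary illustrative value (assumption), not a tested setting.
    print(shlex.join(build_command(context_size=8192)))
```

The returned list can be passed directly to `subprocess.run(build_command())` to start the server; printing it first makes it easy to compare against the original command.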

Model tree for Uninformed/GLM-4.7-Architect-355B-A32B-GGUF: base model zai-org/GLM-4.7; this model is a quantized derivative.