This quantized version of ConicCat/GLM-4.7-Architect-355B-A32B was made for ik_llama.cpp and is sized for 128GB/24GB setups and a 40k context size, though it could be used with a 16GB card with a much smaller context size.

The model was quantized like the ubergarm GLM-4.7 one of the same size

ik_llama/llama-server.exe --model GLM-4.7-Architect-355B-A32B-smol-IQ2_KS-00001-of-00003.gguf --alias GLM-4.7-Architect-GGUF -ctk q8_0 -ctv q8_0 -khad -c 40960 --parallel 1 -ngl 99 --cpu-moe --no-mmap --merge-qkv --threads 10 -ub 4096 -b 4096

Downloads last month: 32

GGUF

Hardware compatibility

2-bit

Model tree for Uninformed/GLM-4.7-Architect-355B-A32B-GGUF

Base model

zai-org/GLM-4.7

Finetuned

ConicCat/GLM-4.7-Architect-355B-A32B

Quantized

(1)

this model