GGUF Version

#3
by vistalba - opened

Is someone already working on a GGUF Q4 version?

Swiss AI Initiative org

Yes, we could use a bit of help getting a llama.cpp version, which should not be too hard to do.

If more people want to try, please keep us posted here.

I would be interested in helping to test.

I am not familiar with it. I just tried the basic approach I found online, but ran into the following error:

docker run --rm -v "./":/repo ghcr.io/ggml-org/llama.cpp:full --convert "/repo" --outtype f16
INFO:hf-to-gguf:Loading model: repo
WARNING:hf-to-gguf:Failed to load model config from /repo: The checkpoint you are trying to load has model type `apertus` but Transformers does not recognize this architecture. This could be because of an issue with the checkpoint, or because your version of Transformers is out of date.
WARNING:hf-to-gguf:Trying to load config.json instead
INFO:hf-to-gguf:Model architecture: ApertusForCausalLM
ERROR:hf-to-gguf:Model ApertusForCausalLM is not supported

Hello @vistalba, to use the model with transformers you need at least version 4.56.0. You can install it with pip install transformers==4.56.0, or simply install the latest version.
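
For example (the second command just prints the installed version to verify it took effect):

pip install --upgrade "transformers>=4.56.0"
python -c "import transformers; print(transformers.__version__)"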

@nathanrchn considering the model is non-standard, working on this would really make it more accessible to developers and builders. There is no point in a more ethical model if it is not accessible. Also, yet another different chat template will make this a nightmare for inference providers to add.
https://github.com/ggml-org/llama.cpp/issues/15748

@danielhanchen from Unsloth, we would love your help here too.

Hello,

Usually I generate .gguf files and quantisations with llama.cpp.
I upgraded all requirements, including transformers to 4.56.0, and did a complete rebuild of llama.cpp, but it does not work.
So I have opened issue https://github.com/ggml-org/llama.cpp/issues/15751, which is probably also covered by https://github.com/ggml-org/llama.cpp/issues/15748.
Once this is fixed we should be able to generate .gguf files and quantisations.
Usually I use convert_hf_to_gguf.py from the llama.cpp project to produce the .gguf format.
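
For reference, my usual flow looks roughly like this (a sketch; the model path, output names, and the Q4_K_M target are placeholders):

# convert the HF checkpoint to a full-precision GGUF
python convert_hf_to_gguf.py /path/to/Apertus --outtype f16 --outfile apertus-f16.gguf
# quantise the f16 GGUF down to Q4_K_M
./llama-quantize apertus-f16.gguf apertus-Q4_K_M.gguf Q4_K_M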

+1, watching and tracking info on this. Thanks for all the contributions and effort on this front.

Hello everyone,

While you can get GGUF files for the weights, the activation function is xIELU, which was not implemented in Ollama or llama.cpp at release. I believe there is now a community implementation (https://github.com/foldl/chatllm.cpp/blob/master/models/apertus.cpp), but it has not yet been merged into the main serving providers.
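
For context on why this needs runtime changes and not just a weight conversion: xIELU is a trainable piecewise activation, so the runtime needs a dedicated op plus the learned per-layer parameters. Roughly, and hedging since this is my paraphrase of the published form rather than the reference implementation (α_p, α_n are trainable, β is a scaling term):

$$
\mathrm{xIELU}(x) =
\begin{cases}
\alpha_p x^2 + \beta x, & x > 0 \\
\alpha_n \left(e^{x} - 1 - x\right) + \beta x, & x \le 0
\end{cases}
$$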

Swiss AI Initiative org

The GGUF adjustments should be ready now and merged into official llama.cpp very soon. You can use them from this pull request already, and check there to see when they land in the next release:
https://github.com/ggml-org/llama.cpp/pull/15852
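
If you want to try it before the merge, something like this should work (a sketch assuming a CMake build; the apertus-pr branch name is just a placeholder):

# fetch the pull request into a local branch and build
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
git fetch origin pull/15852/head:apertus-pr
git checkout apertus-pr
cmake -B build && cmake --build build --config Release -j
# then convert as usual
pip install -r requirements.txt
python convert_hf_to_gguf.py /path/to/Apertus --outtype f16 --outfile apertus-f16.gguf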

mjaggi changed discussion status to closed
