GGUF Version

#3
by vistalba - opened

Is someone already working on a GGUF Q4 version?

Swiss AI Initiative org

Yes, we could use a bit of help getting a llama.cpp version, which should not be too hard to do.

If more people want to try, please keep us posted here.

I would be interested in helping to test.

I am not familiar with it. I just tried the basic approach I found online, but ran into the following error:

docker run --rm -v "./":/repo ghcr.io/ggml-org/llama.cpp:full --convert "/repo" --outtype f16
INFO:hf-to-gguf:Loading model: repo
WARNING:hf-to-gguf:Failed to load model config from /repo: The checkpoint you are trying to load has model type `apertus` but Transformers does not recognize this architecture. This could be because of an issue with the checkpoint, or because your version of Transformers is out of date.
WARNING:hf-to-gguf:Trying to load config.json instead
INFO:hf-to-gguf:Model architecture: ApertusForCausalLM
ERROR:hf-to-gguf:Model ApertusForCausalLM is not supported

Hello @vistalba, to use the model with transformers you need at least version 4.56.0. You can install it with pip install transformers==4.56.0, or simply install the latest version.
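
For example (the second command just prints the installed version to verify it took effect):

pip install --upgrade "transformers>=4.56.0"
python -c "import transformers; print(transformers.__version__)"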

@nathanrchn considering the model is non-standard, working on this would really make it more accessible to developers and builders. There is no point in a more ethical model if it is not accessible. Also, yet another different chat template will make this a nightmare for inference providers to add.
https://github.com/ggml-org/llama.cpp/issues/15748

@danielhanchen from Unsloth, we would love your help here too.

Hello,

Usually I generate .gguf files and quantisations with llama.cpp.
I upgraded all requirements, including transformers to 4.56.0, and did a complete rebuild of llama.cpp, but it does not work.
So I have opened issue https://github.com/ggml-org/llama.cpp/issues/15751, which is probably also covered by https://github.com/ggml-org/llama.cpp/issues/15748.
Once this is fixed we should be able to generate .gguf files and quantisations.
Usually I use convert_hf_to_gguf.py from the llama.cpp project to produce the .gguf format.
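
For reference, my usual flow looks roughly like this (a sketch; the model path, output names, and the Q4_K_M target are placeholders):

# convert the HF checkpoint to a full-precision GGUF
python convert_hf_to_gguf.py /path/to/Apertus --outtype f16 --outfile apertus-f16.gguf
# quantise the f16 GGUF down to Q4_K_M
./llama-quantize apertus-f16.gguf apertus-Q4_K_M.gguf Q4_K_M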

+1, watching and tracking info on this. Thanks for all the contributions and effort on this front.

Hello everyone,

While you can get GGUF files for the weights, the activation function is xIELU, which was not implemented in Ollama or llama.cpp at release. I believe there is now a community implementation (https://github.com/foldl/chatllm.cpp/blob/master/models/apertus.cpp), but it has not yet been merged into the main serving providers.
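
For context on why this needs runtime changes and not just a weight conversion: xIELU is a trainable piecewise activation, so the runtime needs a dedicated op plus the learned per-layer parameters. Roughly, and hedging since this is my paraphrase of the published form rather than the reference implementation (α_p, α_n are trainable, β is a scaling term):

$$
\mathrm{xIELU}(x) =
\begin{cases}
\alpha_p x^2 + \beta x, & x > 0 \\
\alpha_n \left(e^{x} - 1 - x\right) + \beta x, & x \le 0
\end{cases}
$$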

Swiss AI Initiative org

The GGUF adjustments should be ready now and merged into official llama.cpp very soon. You can use them from this pull request already, and check there to see when they land in the next release:
https://github.com/ggml-org/llama.cpp/pull/15852
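
If you want to try it before the merge, something like this should work (a sketch assuming a CMake build; the apertus-pr branch name is just a placeholder):

# fetch the pull request into a local branch and build
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
git fetch origin pull/15852/head:apertus-pr
git checkout apertus-pr
cmake -B build && cmake --build build --config Release -j
# then convert as usual
pip install -r requirements.txt
python convert_hf_to_gguf.py /path/to/Apertus --outtype f16 --outfile apertus-f16.gguf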

mjaggi changed discussion status to closed
