No time for Quantization uploads
Unfortunately I have too many models to upload, my already slow internet keeps cutting out, and the HF command uploader keeps freezing. No space to even test new merges until all these uploads finish.
So I won't be able to upload GGUFs for now. Morbid Aether X, Mergedonia Prometheus, Oarfish v1.2, etc. are all functional, so hopefully others like @mradermacher are able to help quantize them.
Update
I'll try to at least upload IQ4_XS when possible.
I'm not familiar enough with the tools or the whole process to convert safetensors to GGUF (or train, or make LoRAs, etc.), but if I have a GGUF I can quantize it down with the llama.cpp tools and upload those models.
Probably 32B and smaller, and I'd probably write a batch file for it. It would still take a while to upload, assuming my internet doesn't crash like it likes to do sometimes.
I don't know what OS you have, but on Windows you can just extract llama.cpp to a folder and then run this command to convert safetensors directly to Q8_0:
python C:\Quanter\llama.cpp\G4\llama.cpp-master\convert_hf_to_gguf.py B:\26B\moe_karcher1.2 --outfile B:\26B\moe_karcher1.2\Runic-Oarfish-26B-A4B-v1.2-Q8_0.gguf --outtype q8_0
For standard quants you first convert to BF16 like this, and then quantize that file down to the lower sizes:
python C:\Quanter\llama.cpp\convert_hf_to_gguf.py B:\26B\moe_karcher1.2 --outfile B:\26B\moe_karcher1.2\input.gguf --outtype bf16
C:\Quanter\llama.cpp\llama-quantize B:\26B\moe_karcher1.2\input.gguf B:\26B\moe_karcher1.2\Runic-Oarfish-26B-A4B-v1.2-Q4_0.gguf Q4_0
C:\Quanter\llama.cpp\llama-quantize B:\26B\moe_karcher1.2\input.gguf B:\26B\moe_karcher1.2\Runic-Oarfish-26B-A4B-v1.2-Q4_K_M.gguf Q4_K_M
C:\Quanter\llama.cpp\llama-quantize B:\26B\moe_karcher1.2\input.gguf B:\26B\moe_karcher1.2\Runic-Oarfish-26B-A4B-v1.2-Q5_K_M.gguf Q5_K_M
C:\Quanter\llama.cpp\llama-quantize B:\26B\moe_karcher1.2\input.gguf B:\26B\moe_karcher1.2\Runic-Oarfish-26B-A4B-v1.2-Q6_K.gguf Q6_K
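If typing each of those lines by hand gets tedious, the same commands can be generated in a loop. This is only a rough sketch: the function name `build_quantize_commands` and the `STANDARD_QUANTS` list are made up for illustration, and the paths are just the ones from the commands above. It prints the commands as a dry run; nothing is executed until the `subprocess.run` line is uncommented.

```python
import subprocess  # only needed once the run line below is uncommented
from pathlib import PureWindowsPath

# Quant types taken from the commands above.
STANDARD_QUANTS = ["Q4_0", "Q4_K_M", "Q5_K_M", "Q6_K"]


def build_quantize_commands(quantize_exe, input_gguf, out_dir, model_name, quant_types):
    """Return one llama-quantize argument list per quant type."""
    commands = []
    for quant in quant_types:
        # Output name follows the pattern used above: <model>-<quant>.gguf
        out_file = PureWindowsPath(out_dir) / f"{model_name}-{quant}.gguf"
        commands.append([quantize_exe, input_gguf, str(out_file), quant])
    return commands


if __name__ == "__main__":
    commands = build_quantize_commands(
        r"C:\Quanter\llama.cpp\llama-quantize",
        r"B:\26B\moe_karcher1.2\input.gguf",
        r"B:\26B\moe_karcher1.2",
        "Runic-Oarfish-26B-A4B-v1.2",
        STANDARD_QUANTS,
    )
    for cmd in commands:
        print(" ".join(cmd))  # dry run: show the command only
        # subprocess.run(cmd, check=True)  # uncomment to actually quantize
```

Building the commands as lists instead of one long string avoids quoting problems if a folder name ever contains spaces.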
For imatrix quants, you would need to run llama-imatrix to generate the .dat file, which takes a long time on a slower PC.
Note that IQ4_XS is special because you can make it with or without imatrix, but all other IQ quants besides that require imatrix.dat. Generally once you go past Q4 the imatrix makes no difference.
llama-imatrix -m input.gguf -f illuminati_imatrix_v1.txt -o asmodeus-24b-v2_illuminati_imatrix_v1.dat
llama-quantize --imatrix asmodeus-24b-v2_illuminati_imatrix_v1.dat input.gguf B:\24B\DarkArtsForge__Asmodeus-24B-v2\Asmodeus-24B-v2-IQ4_NL.gguf IQ4_NL
llama-quantize --imatrix asmodeus-24b-v2_illuminati_imatrix_v1.dat input.gguf B:\24B\DarkArtsForge__Asmodeus-24B-v2\Asmodeus-24B-v2-IQ3_M.gguf IQ3_M
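For scripting the imatrix quants, something like the sketch below could decide when to add the `--imatrix` argument. Note that `needs_imatrix` only encodes the rule of thumb from this post (IQ quants need an imatrix, except IQ4_XS); it is an assumption for illustration, not a copy of the checks llama-quantize itself performs, so the tool may still accept or reject a given combination. Both function names are made up.

```python
def needs_imatrix(quant):
    """Rule of thumb from this thread: IQ quants need an imatrix, except IQ4_XS."""
    name = quant.upper()
    return name.startswith("IQ") and name != "IQ4_XS"


def imatrix_quantize_command(input_gguf, output_gguf, quant, imatrix_file=None):
    """Build a llama-quantize argument list, adding --imatrix when a file is given."""
    if imatrix_file is None and needs_imatrix(quant):
        raise ValueError(f"{quant} needs an imatrix file by the rule of thumb above")
    cmd = ["llama-quantize"]
    if imatrix_file is not None:
        cmd += ["--imatrix", imatrix_file]
    cmd += [input_gguf, output_gguf, quant]
    return cmd


if __name__ == "__main__":
    # Same shape as the IQ3_M command above, with shortened file names.
    print(" ".join(imatrix_quantize_command(
        "input.gguf", "Asmodeus-24B-v2-IQ3_M.gguf", "IQ3_M",
        imatrix_file="asmodeus-24b-v2_illuminati_imatrix_v1.dat",
    )))
```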
I don't know what OS you have but on Windows you can just extract llama.cpp to a folder and then run this command to convert safetensors directly to Q8_0
Windows mixed with Cygwin (and I do almost everything from Cygwin). Python, like C++, is a language that irks me in its setup and use, but as long as I can call it I can get it to work. More often than not I've seen Python balk when it can't import modules. "No module named 'transformers'" comes to mind; it has prevented me from converting safetensors in the past.
Looks like a nice quick and dirty tutorial. I've been using llama-quantize, but that's about the only tool I've really touched so far.
llama-quantize moe_karcher1.2\input.gguf moe_karcher1.2\Runic-Oarfish-26B-A4B-v1.2-Q4_0.gguf Q4_0
Heh, I'd been looking up the codes and using them manually. I'll keep that in mind when I try to write the script to do the work.
llama-imatrix -m input.gguf -f illuminati_imatrix_v1.txt -o asmodeus-24b-v2_illuminati_imatrix_v1.dat
Gotcha... Assuming it doesn't get sniped by mradermacher, I can try and offer some quantized versions.
edit: Nope, can't seem to get it to load the script. And I have no clue how to fix that. Why can't there just be an exe file to use?
Windows mixed with Cygwin (and I do almost everything from Cygwin). Python, like C++, is a language that irks me in its setup and use, but as long as I can call it I can get it to work. More often than not I've seen Python balk when it can't import modules. "No module named 'transformers'" comes to mind; it has prevented me from converting safetensors in the past.
I'm not familiar with Cygwin, but I agree that Python is quite finicky and hard to deal with. I have 3 separate modules installed locally for it, and the main one I use, Python311, has 2 "archive copies" in a zip to restore from whenever there is widespread corruption or errors with libs. Otherwise, mergekit stops working or produces corrupted safetensors.
Now, in order to support Gemma 4 merges, I've had to make a newer archive with several updated libs. This one includes a "hotswap" for the huggingface_hub library: one version for merging, and one for downloading/uploading. So now I can't even run an upload script while merging; I have to upload through the browser directly if a merge is running locally. I wonder what will break next. 🤷
Gotcha... Assuming it doesn't get sniped by mradermacher, I can try and offer some quantized versions.
edit: Nope, can't seem to get it to load the script. And I have no clue how to fix that. Why can't there just be an exe file to use?
Did it report any errors when trying to load the script? Launch it from a terminal or command prompt window so you can see what it reports. From what I can tell it calls on the Python script to quantize, but for imatrix it should launch llama-imatrix.exe directly. If it is silently failing without any errors, you might have downloaded the wrong version from GitHub. Try the x64 build if all else fails. But yeah, a GUI would be nice for merging/quantizing.
Also I might have a way to quickly quantize and upload from runpod, I'll try to gather the notes and process them into a quick tutorial. This might be the fallback solution, for under 50 cents it can quickly generate and upload several quants instantly if you set it up right. I don't know what setup mradermacher uses but probably something similar that has great bandwidth.
I've been testing some merges on runpod as well over at @OrobasVault . The thing is I couldn't get Gemma 4 merges to work on runpod yet, only locally. But runpod is the better choice for really long iterative merge methods like karcher/flux/aether_x.
Did it report any errors when trying to load the script?
Yes, it did, and I already mentioned it. When trying to run convert_hf_to_gguf.py I get "No module named 'transformers'". While this is obviously an import problem, I haven't the foggiest where to download the missing files, or where to put them to make it work. And when I run pip as instructed in the installation guide, it tries to get numpy, then complains about a version number and barfs. I'd rather just have a pre-compiled exe and use that.
For quite a while now I've come to prefer packages that include everything: extract and run. Having to find or install missing components is a pain, and in some cases (like some Linux games from Humble Bundle) outright impossible, since the old packages they rely on eventually get deleted.
I can already run llama-quantize.exe just fine.
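For the "No module named 'transformers'" error, a small check like the one below can show which Python interpreter is actually being run and which modules it cannot see. With Windows and Cygwin mixed together it is easy for pip to install into a different Python than the one running the script. This is just a diagnostic sketch; the module names are the ones mentioned in this thread, and `missing_modules` is a made-up helper name.

```python
import importlib.util
import sys


def missing_modules(names):
    """Return the top-level module names that this interpreter cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Shows which python.exe is really in use (Windows install vs. Cygwin).
    print("Python in use:", sys.executable)
    missing = missing_modules(["transformers", "numpy", "huggingface_hub", "pydantic"])
    if missing:
        print("Missing:", ", ".join(missing))
        # Running pip through the same interpreter avoids installing into another Python.
        print("Try:", sys.executable, "-m pip install", " ".join(missing))
    else:
        print("All listed modules are importable.")
```

Running pip as `python -m pip` with the same executable that runs convert_hf_to_gguf.py makes sure the packages land where that interpreter looks for them.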
Since the IQ4_XS of Morbid-Aether-X-12B is only 6GB, it's uploading now.
But Oarfish v1.2 Q8_0 was 90% uploaded and failed, and I don't have time to try again.
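For flaky connections, one option is to wrap whatever upload call is used in a retry loop so a dropped connection does not need babysitting. The sketch below is generic: `retry` is a made-up helper, and the upload function passed to it (`upload_one_file` in the comment) is hypothetical and stands in for whatever command or library call does the actual upload. It only re-runs the action from the start; whether a partly uploaded file resumes depends on the upload tool itself.

```python
import time


def retry(action, attempts=5, delay_seconds=30.0, backoff=2.0):
    """Call action() until it succeeds, waiting longer after each failure."""
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except Exception as error:  # any failure counts, e.g. a dropped connection
            last_error = error
            print(f"attempt {attempt} failed: {error}")
            if attempt < attempts:
                time.sleep(delay_seconds)
                delay_seconds *= backoff  # wait longer before the next try
    raise RuntimeError(f"gave up after {attempts} attempts") from last_error


# Usage, with a hypothetical upload function:
# retry(lambda: upload_one_file("Runic-Oarfish-26B-A4B-v1.2-Q8_0.gguf"))
```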
I would recommend backing up your current Python library just to be safe, then deleting or renaming the folder that stores everything, and installing all of these:
https://huggingface.co/datasets/Naphula-Archives/master_python_list_mergekit_windows
There should be a way to turn that list into an auto-installer. It might take a while, but from there you should be able to extract or install the latest versions of mergekit/llama.cpp.
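On the auto-installer idea, here is a rough sketch of how it could work. It assumes the list is, or can be saved as, plain text with one `name==version` pin per line (the `pip freeze` style); the actual format of the master list may differ. The function names and the package names in the example are made up, not real entries from the list.

```python
import sys


def parse_pins(text):
    """Collect 'name==version' pins, skipping blank lines and # comments."""
    pins = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if "==" in line:
            pins.append(line)
    return pins


def pip_install_command(pins, python=sys.executable):
    """Build one pip command that installs every pin with the given interpreter."""
    return [python, "-m", "pip", "install", *pins]


if __name__ == "__main__":
    # Hypothetical package names and versions, for illustration only.
    example = """
    # pinned versions
    examplepkg==1.2.3
    otherpkg==0.4.0  # known good
    """
    print(" ".join(pip_install_command(parse_pins(example))))
```

Installing everything in a single pip call lets pip resolve the pins together, which is where version conflicts would show up.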
Or if you want to try the quick route, just install the versions of transformers/numpy/etc. from that list, only the ones it complains about. For me the biggest issues were with transformers, huggingface_hub, and pydantic.
Note that this won't work for Gemma 4 models; I have to update the master list for it.
Maybe it's possible to upload the entire Python archive, if that is safe? It would be nice if everything mergekit and llama.cpp needed were just built into it, such as the exact Python dependencies.
Not sure. I just know different versions of Python are incompatible with each other, and that breaks a couple of self-contained scripts I have specifically for dealing with... a couple of odd archive types. Think I'm using Python 2.7.