GGUF quantizations of the following model: https://huggingface.co/mridul3301/BioMistral-7B-finetuned
Three quantization formats:
- fp8
- fp16
- fp32
The safetensors weights were converted to GGUF for CPU inference using llama_cpp.
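As a rough illustration of the conversion step, the sketch below drives the `convert_hf_to_gguf.py` script from the llama.cpp repository via Python. The checkpoint directory, output file names, and the choice of output types are assumptions, not the exact commands used for these files.

```python
# A minimal sketch of the conversion step, assuming a local clone of the
# llama.cpp repository (for convert_hf_to_gguf.py) and a local download of
# the safetensors checkpoint; all paths and output names are placeholders.
import subprocess

CHECKPOINT_DIR = "BioMistral-7B-finetuned"  # assumed local safetensors checkout

for outtype in ("f16", "f32"):
    subprocess.run(
        [
            "python", "llama.cpp/convert_hf_to_gguf.py",
            CHECKPOINT_DIR,
            "--outtype", outtype,
            "--outfile", f"biomistral-7b-finetuned-{outtype}.gguf",
        ],
        check=True,
    )
# The 8-bit file (listed above as fp8) would typically be produced with the
# converter's q8_0 output type instead.
```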
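For CPU inference, a minimal sketch with the llama_cpp Python bindings (llama-cpp-python) might look like the following; the GGUF file name, context size, thread count, and prompt are assumptions.

```python
# Minimal CPU-inference sketch using llama-cpp-python; the model file name
# and generation settings below are placeholders, not pinned values.
from llama_cpp import Llama

llm = Llama(
    model_path="biomistral-7b-finetuned-f16.gguf",  # assumed local GGUF file
    n_ctx=2048,    # context window
    n_threads=8,   # CPU threads to use
)

output = llm(
    "Question: What is the function of hemoglobin?\nAnswer:",
    max_tokens=128,
    temperature=0.2,
)
print(output["choices"][0]["text"])
```

The smaller files trade some numerical precision for a smaller memory footprint, so they will generally be the more practical choice for CPU-only machines.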