Instructions for using TendieLabs/Capybara-31B-GGUFS with libraries, inference providers, notebooks, and local apps.

Libraries

How to use TendieLabs/Capybara-31B-GGUFS with llama-cpp-python:

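```python
# !pip install llama-cpp-python

from llama_cpp import Llama

# Download the GGUF file from the Hugging Face Hub and load it.
llm = Llama.from_pretrained(
    repo_id="TendieLabs/Capybara-31B-GGUFS",
    filename="Capybara-31B-F16.gguf",
)

# Run a single-turn chat completion and print the assistant's reply.
response = llm.create_chat_completion(
    messages=[
        {
            "role": "user",
            "content": "What is the capital of France?",
        }
    ]
)
print(response["choices"][0]["message"]["content"])
```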

Notebooks
Google Colab
Kaggle
Local Apps

llama.cpp

How to use TendieLabs/Capybara-31B-GGUFS with llama.cpp:

Install from brew

```
brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf TendieLabs/Capybara-31B-GGUFS:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf TendieLabs/Capybara-31B-GGUFS:Q4_K_M
```

Install from WinGet (Windows)

```
winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf TendieLabs/Capybara-31B-GGUFS:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf TendieLabs/Capybara-31B-GGUFS:Q4_K_M
```

Use pre-built binary

```
# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf TendieLabs/Capybara-31B-GGUFS:Q4_K_M
# Run inference directly in the terminal:
./llama-cli -hf TendieLabs/Capybara-31B-GGUFS:Q4_K_M
```

Build from source code

```
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf TendieLabs/Capybara-31B-GGUFS:Q4_K_M
# Run inference directly in the terminal:
./build/bin/llama-cli -hf TendieLabs/Capybara-31B-GGUFS:Q4_K_M
```

Use Docker

```
docker model run hf.co/TendieLabs/Capybara-31B-GGUFS:Q4_K_M
```

LM Studio
Jan

vLLM

How to use TendieLabs/Capybara-31B-GGUFS with vLLM:

Install from pip and serve model

```
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "TendieLabs/Capybara-31B-GGUFS"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "TendieLabs/Capybara-31B-GGUFS",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
```
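
The same request can also be sent from Python. The sketch below uses only the standard library and assumes the vLLM server started above is listening on `localhost:8000`. The helper names `build_chat_request` and `ask` are illustrative and are not part of vLLM.

```python
import json
import urllib.request

# Assumption: the server from `vllm serve` is reachable at the same address as the curl call.
BASE_URL = "http://localhost:8000/v1"
MODEL = "TendieLabs/Capybara-31B-GGUFS"


def build_chat_request(prompt, base_url=BASE_URL, model=MODEL):
    """Build a POST request for the OpenAI-compatible chat completions endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt):
    """Send the prompt to the running server and return the assistant's reply text."""
    with urllib.request.urlopen(build_chat_request(prompt)) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


# Example (requires the server to be running):
# print(ask("What is the capital of France?"))
```

Splitting request construction from sending keeps the payload easy to inspect without a live server. Pointing `base_url` at `http://localhost:8080/v1` should work the same way against the `llama-server` commands shown earlier, since both expose an OpenAI-compatible API.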

Use Docker

```
docker model run hf.co/TendieLabs/Capybara-31B-GGUFS:Q4_K_M
```

Ollama
How to use TendieLabs/Capybara-31B-GGUFS with Ollama:
```
ollama run hf.co/TendieLabs/Capybara-31B-GGUFS:Q4_K_M
```

Unsloth Studio

How to use TendieLabs/Capybara-31B-GGUFS with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

```
curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for TendieLabs/Capybara-31B-GGUFS to start chatting
```

Install Unsloth Studio (Windows)

```
irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for TendieLabs/Capybara-31B-GGUFS to start chatting
```

Using HuggingFace Spaces for Unsloth

No setup required. Open https://huggingface.co/spaces/unsloth/studio in your browser and search for TendieLabs/Capybara-31B-GGUFS to start chatting.

Pi

How to use TendieLabs/Capybara-31B-GGUFS with Pi:

Start the llama.cpp server

```
# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf TendieLabs/Capybara-31B-GGUFS:Q4_K_M
```

Configure the model in Pi

```
# Install Pi:
npm install -g @mariozechner/pi-coding-agent
```

Add to ~/.pi/agent/models.json:

```json
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "TendieLabs/Capybara-31B-GGUFS:Q4_K_M"
        }
      ]
    }
  }
}
```

Run Pi

```
# Start Pi in your project directory:
pi
```

Hermes Agent

How to use TendieLabs/Capybara-31B-GGUFS with Hermes Agent:

Start the llama.cpp server

```
# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf TendieLabs/Capybara-31B-GGUFS:Q4_K_M
```

Configure Hermes

```
# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default TendieLabs/Capybara-31B-GGUFS:Q4_K_M
```

Run Hermes

```
hermes
```

Docker Model Runner
How to use TendieLabs/Capybara-31B-GGUFS with Docker Model Runner:
```
docker model run hf.co/TendieLabs/Capybara-31B-GGUFS:Q4_K_M
```

Lemonade

How to use TendieLabs/Capybara-31B-GGUFS with Lemonade:

Pull the model

```
# Download Lemonade from https://lemonade-server.ai/
lemonade pull TendieLabs/Capybara-31B-GGUFS:Q4_K_M
```

Run and chat with the model

```
lemonade run user.Capybara-31B-GGUFS-Q4_K_M
```

List all available models

```
lemonade list
```

Capybara-31B

Beta / WIP. This is an experimental release made to validate the fine-tune process and test behavior on real hardware. It is not a production-ready model. Expect rough edges, and treat evaluation results as preliminary.

TendieLabs/Capybara-31B is a fine-tuned version of google/gemma-4-31B-it, trained to be a better local orchestrator and assistant. The primary goal was not to maximize raw code generation but to produce a model that reasons well, communicates clearly, knows when to delegate, and stays honest under pressure.

GGUF variants: TendieLabs/Capybara-31B-GGUFS

Model Description

Capybara-31B is built for the front-desk orchestrator role in a multi-agent setup. It handles ordinary requests, summarizes messy context, routes and decomposes tasks, and delegates complex implementation to specialist agents. The personality target was Claude Sonnet, prioritizing directness, structure, and honesty over verbose performance.

This is not a coding model. It is an assistant model with sharpened coding judgment. The distinction matters: it should analyze and review code well, but it should route heavy implementation work instead of attempting it alone.

| Property | Value |
| --- | --- |
| Base model | `google/gemma-4-31B-it` |
| Model family | Gemma 4 (dense) |
| Fine-tune method | QLoRA (LoRA over 4-bit base) |
| Context window | 2048 tokens (first run, conservative) |
| Primary role | Local orchestrator / front-desk assistant |

Intended Use

Good fits:

- Answering general assistant requests clearly and concisely
- Summarizing messy notes, project context, or requirements
- Decomposing tasks and routing work to appropriate specialists
- Code review, debugging analysis, and implementation advice
- Handling ambiguity by asking one focused clarifying question instead of guessing

Not intended for:

- Autonomous multi-file repo editing
- Large-scale code generation without a specialist downstream
- Replacing a dedicated coding model for implementation-heavy tasks

Training Details

Dataset Mix

The training mix was weighted toward assistant behavior, routing, and summarization rather than code generation.

| Source | Role | Weight |
| --- | --- | --- |
| Crownelius/Opus-4.6-Reasoning-3300x | Reasoning quality, structure, helpfulness | 18% |
| Crownelius/High-Coder-Reasoning-Multi-Turn | Debugging judgment, code analysis, multi-turn | 18% |
| microsoft/rStar-Coder | Harder reasoning and coding tasks | 15% |
| Custom routing / delegation set | Front-desk routing behavior | 15% |
| NickyNicky/Code-290k (filtered) | Code competence floor | 10% |
| Crownelius/Opus-4.5-Writing-Style-formatted | Tone and personality shaping | 10% |
| Custom summarization / context digestion set | Project-note compression, task extraction | 10% |
| Crownelius/GLM-5.0-25000x (filtered) | General reasoning filler | 4% |

Total training rows: 10K to 20K high-signal examples.
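
To make the mix concrete, the sketch below converts the listed weights into approximate row counts for the two ends of the stated 10K to 20K range. The exact total used in training is not stated, so the totals here are assumptions, and `rows_per_source` is an illustrative helper name.

```python
# Weights (percent of training rows) copied from the dataset mix above.
DATASET_WEIGHTS = {
    "Crownelius/Opus-4.6-Reasoning-3300x": 18,
    "Crownelius/High-Coder-Reasoning-Multi-Turn": 18,
    "microsoft/rStar-Coder": 15,
    "Custom routing / delegation set": 15,
    "NickyNicky/Code-290k (filtered)": 10,
    "Crownelius/Opus-4.5-Writing-Style-formatted": 10,
    "Custom summarization / context digestion set": 10,
    "Crownelius/GLM-5.0-25000x (filtered)": 4,
}


def rows_per_source(total_rows, weights=DATASET_WEIGHTS):
    """Split total_rows across sources in proportion to their weights.

    Integer division rounds down, so totals that are not a multiple of the
    weight sum can come out a few rows short.
    """
    total_weight = sum(weights.values())
    return {name: total_rows * weight // total_weight for name, weight in weights.items()}


# Assumed totals: the lower and upper bounds of the stated range.
for total in (10_000, 20_000):
    print(f"--- {total} rows ---")
    for name, rows in rows_per_source(total).items():
        print(f"{rows:>6}  {name}")
```

The weights sum to 100%, so at 10,000 rows the two custom sets together account for 2,500 examples (1,500 routing plus 1,000 summarization).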

Training Configuration

| Hyperparameter | Value |
| --- | --- |
| Method | QLoRA |
| LoRA rank | 16-32 |
| LoRA alpha | 32-64 |
| Dropout | 0.0-0.05 |
| Learning rate | 1e-5 to 2e-5 |
| LR scheduler | Cosine |
| Warmup ratio | 2-3% |
| Epochs | 1 |
| Sequence length | 2048 |
| Batch size | 1 (gradient accumulation 4) |
| Gradient checkpointing | Unsloth |
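
As a rough illustration of what these settings imply, the sketch below derives the effective batch size, optimizer step count, and warmup steps. It assumes a single device, one training row per sequence (no packing), and row totals taken from the stated 10K to 20K range. `training_schedule` is an illustrative helper, not part of any training library.

```python
BATCH_SIZE = 1   # per-step batch size from the configuration above
GRAD_ACCUM = 4   # gradient accumulation steps from the configuration above
EPOCHS = 1


def training_schedule(total_rows, warmup_percent):
    """Return effective batch size, optimizer steps, and warmup steps.

    Assumes one optimizer step per BATCH_SIZE * GRAD_ACCUM rows.
    """
    effective_batch = BATCH_SIZE * GRAD_ACCUM
    steps_per_epoch = -(-total_rows // effective_batch)      # ceiling division
    total_steps = steps_per_epoch * EPOCHS
    warmup_steps = -(-total_steps * warmup_percent // 100)   # ceiling division
    return {
        "effective_batch": effective_batch,
        "total_steps": total_steps,
        "warmup_steps": warmup_steps,
    }


# Assumed row totals and the two ends of the listed 2-3% warmup ratio.
for rows in (10_000, 20_000):
    for warmup_percent in (2, 3):
        print(rows, warmup_percent, training_schedule(rows, warmup_percent))
```

Under these assumptions, 10,000 rows correspond to 2,500 optimizer steps with 50 to 75 warmup steps, and 20,000 rows to 5,000 steps with 100 to 150 warmup steps.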

Hardware Requirements

Capybara-31B was developed and validated on an RTX 3090 (24 GB VRAM). At IQ4_XS quantization the model leaves approximately 3 GB of VRAM free on that card, making it a practical local-first deployment for a single consumer GPU.

| Quant | Approx VRAM | Recommended for |
| --- | --- | --- |
| IQ4_XS | ~21 GB | RTX 3090, 4090, single-GPU setups |
| Q4_K_M | ~22 GB | RTX 3090, 4090 |
| Q5_K_M | ~24 GB | 24 GB cards (tight) |
| Q8_0 | ~34 GB | Dual-GPU or large VRAM server |
| F16 | ~62 GB | Server-grade hardware |
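
A small helper can turn the figures above into a quant choice for a given card. This is a sketch only: the numbers are the approximate values listed here, and `headroom_gb` is a hypothetical parameter for memory you want to keep free (for example for longer contexts or other processes).

```python
# Approximate VRAM figures (GB) copied from the table above.
APPROX_VRAM_GB = {
    "IQ4_XS": 21,
    "Q4_K_M": 22,
    "Q5_K_M": 24,
    "Q8_0": 34,
    "F16": 62,
}


def largest_quant_that_fits(vram_gb, headroom_gb=0.0):
    """Return the largest listed quant whose approximate footprint fits, or None."""
    budget = vram_gb - headroom_gb
    fitting = [(size, name) for name, size in APPROX_VRAM_GB.items() if size <= budget]
    return max(fitting)[1] if fitting else None


# A 24 GB card with no reserve, then with 3 GB kept free.
print(largest_quant_that_fits(24))
print(largest_quant_that_fits(24, headroom_gb=3))
```

With 3 GB of headroom on a 24 GB card the helper selects IQ4_XS, which matches the recommended starting point for an RTX 3090.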

GGUF Variants

Available at TendieLabs/Capybara-31B-GGUFS:

- IQ4_XS (recommended starting point)
- IQ4_NL
- Q4_0, Q4_1, Q4_K_S, Q4_K_M
- Q5_0, Q5_1, Q5_K_S, Q5_K_M
- Q6_K
- Q8_0, Q8_1
- F16

Evaluation

The model was evaluated across the following dimensions before release:

- Delegation accuracy: does it route implementation-heavy work correctly instead of attempting it?
- Honesty under uncertainty: does it admit when context is missing rather than hallucinating answers?
- Long-context summarization: does it compress messy project notes into useful summaries?
- Code review quality: does it identify real issues, risks, and next steps?
- Tone: does the output feel like a capable, direct assistant rather than a verbose language model?

The key failure mode screened for: tone that improves while judgment and delegation behavior degrade.

Limitations

- First-run adapter. Behavior targets are correct, but some edge cases may need refinement in future versions.
- Sequence length was kept conservative (2048). Long-document tasks may need to be chunked.
- Gemma 4 tooling was relatively new at training time. Some export or serving quirks may apply depending on your inference stack.
- Text-only fine-tune. Not designed for multimodal tasks, despite the Gemma 4 family's vision capabilities.
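
For the chunking mentioned above, one simple approach is to split long input into overlapping windows and summarize each window separately. The sketch below counts words as a rough stand-in for tokens. The defaults `max_words=1000` and `overlap=100` are assumptions chosen to leave room for the prompt and reply within 2048 tokens, not values from the training setup; a tokenizer-based count would be more precise.

```python
def chunk_words(text, max_words=1000, overlap=100):
    """Split text into overlapping windows of at most max_words words.

    Consecutive chunks share `overlap` words so that context spanning a
    boundary is not lost entirely.
    """
    if not 0 <= overlap < max_words:
        raise ValueError("overlap must be non-negative and smaller than max_words")
    words = text.split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # the current window already reaches the end of the text
    return chunks


# Tiny demonstration with small window sizes so the overlap is visible.
sample = " ".join(str(i) for i in range(25))
for chunk in chunk_words(sample, max_words=10, overlap=2):
    print(chunk)
```

Each chunk can then be sent as its own request, with the per-chunk summaries combined in a final pass.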

License

This model is derived from google/gemma-4-31B-it and is released under the Gemma Terms of Use. Usage is subject to those terms.

GGUF

Model size: 31B params
Architecture: gemma4
Hardware compatibility: 4-bit, 5-bit, 6-bit, 8-bit, 16-bit

Model tree for TendieLabs/Capybara-31B-GGUFS

Base model: google/gemma-4-31B
Finetuned: google/gemma-4-31B-it
Adapter (98): this model (TendieLabs/Capybara-31B-GGUFS)