Instructions for using JANGQ-AI/MiniMax-M2.7-JANGTQ with libraries and local apps.

Libraries

How to use JANGQ-AI/MiniMax-M2.7-JANGTQ with MLX:

# Make sure mlx-lm is installed
# pip install --upgrade mlx-lm

# Generate text with mlx-lm
from mlx_lm import load, generate

model, tokenizer = load("JANGQ-AI/MiniMax-M2.7-JANGTQ")

prompt = "Write a story about Einstein"
messages = [{"role": "user", "content": prompt}]
prompt = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True
)

text = generate(model, tokenizer, prompt=prompt, verbose=True)

Local Apps

Pi

How to use JANGQ-AI/MiniMax-M2.7-JANGTQ with Pi:

Start the MLX server

# Install MLX LM:
uv tool install mlx-lm
# Start a local OpenAI-compatible server:
mlx_lm.server --model "JANGQ-AI/MiniMax-M2.7-JANGTQ"

Configure the model in Pi

# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "mlx-lm": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "JANGQ-AI/MiniMax-M2.7-JANGTQ"
        }
      ]
    }
  }
}

Run Pi

# Start Pi in your project directory:
pi

Hermes Agent

How to use JANGQ-AI/MiniMax-M2.7-JANGTQ with Hermes Agent:

Start the MLX server

# Install MLX LM:
uv tool install mlx-lm
# Start a local OpenAI-compatible server:
mlx_lm.server --model "JANGQ-AI/MiniMax-M2.7-JANGTQ"

Configure Hermes

# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default JANGQ-AI/MiniMax-M2.7-JANGTQ

Run Hermes

hermes

MLX LM

How to use JANGQ-AI/MiniMax-M2.7-JANGTQ with MLX LM:

Generate or start a chat session

# Install MLX LM
uv tool install mlx-lm
# Interactive chat REPL
mlx_lm.chat --model "JANGQ-AI/MiniMax-M2.7-JANGTQ"

Run an OpenAI-compatible server

# Install MLX LM
uv tool install mlx-lm
# Start the server
mlx_lm.server --model "JANGQ-AI/MiniMax-M2.7-JANGTQ"
# In a second terminal, call the OpenAI-compatible server with curl
# (mlx_lm.server listens on port 8080 by default)
curl -X POST "http://localhost:8080/v1/chat/completions" \
   -H "Content-Type: application/json" \
   --data '{
     "model": "JANGQ-AI/MiniMax-M2.7-JANGTQ",
     "messages": [
       {"role": "user", "content": "Hello"}
     ]
   }'
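
The same request can be issued from Python. The following is a minimal sketch using only the standard library; it assumes the server was started with its default host and port (localhost:8080, change BASE_URL if you passed --port) and that the response follows the usual OpenAI chat-completions shape. The helper names build_chat_request and chat are illustrative and not part of mlx-lm.

```python
import json
import urllib.request

# Assumption: mlx_lm.server is running locally on its default port.
BASE_URL = "http://localhost:8080/v1"
MODEL_ID = "JANGQ-AI/MiniMax-M2.7-JANGTQ"


def build_chat_request(user_text, base_url=BASE_URL, model=MODEL_ID):
    """Build (but do not send) a POST request for /chat/completions."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_text}],
    }
    return urllib.request.Request(
        url=f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(user_text):
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(build_chat_request(user_text)) as resp:
        body = json.load(resp)
    # OpenAI-style responses carry the reply under choices[0].message.content.
    return body["choices"][0]["message"]["content"]
```

With the server running, print(chat("Hello")) should print the model's reply. Building the request separately from sending it keeps the payload easy to inspect without a live server.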

MiniMax-M2.7-JANGTQ

MiniMax M2.7 at 47 GB on disk (down from the ~230 GB FP8 source), using 2-bit JANGTQ2 quantization in the JANGTQ-PRESTACK layout: routed experts are pre-stacked on disk, so a cold load needs no restacking pass and no runtime cache sidecar.

Source: MiniMaxAI/MiniMax-M2.7 (MiniMax M2 architecture, FP8 E4M3 block-128 native, 196K context, 62 layers, 256 routed experts top-8)
Quantization: JANGTQ2 — 2-bit MXTQ codebook (Hadamard-rotated, Lloyd-Max optimized) on routed-expert weights + 8-bit affine on attention / shared expert / embed / lm_head + fp16 passthrough on RMSNorms / router gate / expert_bias
Routed-expert layout: pre-stacked along axis 0 (block_sparse_moe.switch_mlp.<proj>.tq_packed shape [256, out, packed_in]) per the JANGTQ-PRESTACK STANDARD — no runtime restacking, no jangtq_stacked.safetensors sidecar
Bundle size: 47 GB on-disk across 51 shards
Runs on: M3 Max 96 GB+ / M4 Max 128 GB / M5 Max 128 GB / Mac Studio

What's new in this build (2026-05-04)

This bundle ships in the JANGTQ-PRESTACK layout: the routed-expert TurboQuant tensors are stacked along axis 0 directly in the main shards. Compared with the previous per-expert layout:

Metric	Old (per-expert)	This (pre-stacked)
First-load time	adds a ~5-10 s restacking pass	`mx.load()` direct, no restacking (~14 s incl. warmup)
Decode tok/s	reference	identical (same MXTQ codec, same fused decode kernels)
Bundle size	~57 GB	~47 GB (smaller because per-expert metadata is no longer duplicated)
Loader path	streaming hydrate + per-expert restack	generic loader's prestack branch
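
To make the pre-stacked shape [256, out, packed_in] concrete, here is a toy NumPy sketch of packing 2-bit codes into 32-bit words and stacking the per-expert results along axis 0. It is an illustration only: the bit order within each word, the uint32 container, and the helper names pack_2bit and unpack_2bit are assumptions, not the bundle's documented on-disk format.

```python
import numpy as np

BITS = 2
VALS_PER_WORD = 32 // BITS  # 16 two-bit codes per uint32 (assumed packing)


def pack_2bit(codes):
    """Pack a [out, in] array of codes in 0..3 into [out, in // 16] uint32."""
    out_f, in_f = codes.shape
    assert in_f % VALS_PER_WORD == 0
    grouped = codes.astype(np.uint32).reshape(
        out_f, in_f // VALS_PER_WORD, VALS_PER_WORD
    )
    shifts = np.arange(VALS_PER_WORD, dtype=np.uint32) * BITS
    # Shift each code to its slot, then OR the 16 slots into one word.
    return np.bitwise_or.reduce(grouped << shifts, axis=-1)


def unpack_2bit(packed):
    """Inverse of pack_2bit for arrays shaped [..., packed_in]."""
    shifts = np.arange(VALS_PER_WORD, dtype=np.uint32) * BITS
    codes = (packed[..., None] >> shifts) & np.uint32(3)
    return codes.reshape(*packed.shape[:-1], -1).astype(np.uint8)


# Toy sizes; the bundle itself has 256 routed experts per layer.
rng = np.random.default_rng(0)
num_experts, out_f, in_f = 4, 8, 32
per_expert = [
    rng.integers(0, 4, size=(out_f, in_f), dtype=np.uint8)
    for _ in range(num_experts)
]

# Stacking once, ahead of time, yields [experts, out, packed_in].
stacked = np.stack([pack_2bit(c) for c in per_expert], axis=0)
```

Storing the stacked array directly is what removes the load-time restacking step: a reader can take the single tensor as-is instead of gathering one tensor per expert.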

What's in the bundle

Module	Source dtype	Bundle dtype
Routed experts (256 × 3 mats × 62 layers, pre-stacked along axis 0)	FP8 E4M3 + F32 weight_scale_inv	2-bit MXTQ + sidecar codebook
Attention (q/k/v/o, q/k norms)	FP8 E4M3 / BF16	8-bit affine g=64
`embed_tokens` / `lm_head`	BF16	8-bit affine g=64
RMSNorm / router gate / `e_score_correction_bias`	BF16 / F32	fp16 / fp32 passthrough
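
For readers unfamiliar with "8-bit affine g=64", the sketch below shows group-wise affine quantization in NumPy: each run of 64 consecutive input weights gets its own scale and bias, and values are stored as 8-bit integers. This is a generic illustration under the assumption that the scheme follows the usual scale-and-bias form; it is not the code used to build this bundle, and the function names are illustrative.

```python
import numpy as np


def affine_quantize(w, group_size=64, bits=8):
    """Quantize a [out, in] matrix per group of `group_size` input weights."""
    out_f, in_f = w.shape
    assert in_f % group_size == 0
    g = w.reshape(out_f, in_f // group_size, group_size).astype(np.float32)
    w_min = g.min(axis=-1, keepdims=True)
    w_max = g.max(axis=-1, keepdims=True)
    levels = 2**bits - 1
    # One scale and one bias (the group minimum) per group.
    scale = np.maximum((w_max - w_min) / levels, 1e-8)
    q = np.clip(np.round((g - w_min) / scale), 0, levels).astype(np.uint8)
    return q, scale, w_min


def affine_dequantize(q, scale, bias):
    """Reconstruct the [out, in] float matrix: w ~ q * scale + bias."""
    g = q.astype(np.float32) * scale + bias
    return g.reshape(g.shape[0], -1)
```

Because rounding moves each value by at most half a quantization step, the reconstruction error within a group is bounded by half that group's scale.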

jangtq_runtime.safetensors sidecar (~25 KB) for Swift runtimes — covers (in_features={1536, 3072}, seed=42, bits=2) codebooks + sign-flip vectors.
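
The general idea behind a Hadamard-rotated, Lloyd-Max-optimized 2-bit codebook with sign-flip vectors can be sketched in NumPy as follows. This is a textbook-style illustration and not the MXTQ codec itself: the per-row RMS scaling, the single shared codebook, and all function names are assumptions. It also uses a power-of-two width (64) so that a plain Sylvester Hadamard matrix exists; the in_features listed above (1536, 3072) are not powers of two, and how the actual codec handles them is not described here.

```python
import numpy as np


def hadamard(n):
    """Orthonormal n x n Hadamard matrix (Sylvester construction, n = 2^k)."""
    assert n > 0 and n & (n - 1) == 0
    h = np.ones((1, 1), dtype=np.float32)
    while h.shape[0] < n:
        h = np.block([[h, h], [h, -h]])
    return h / np.sqrt(n)


def lloyd_max(x, levels=4, iters=50):
    """1-D Lloyd-Max: alternate nearest-centroid assignment and mean update."""
    centroids = np.quantile(x, (np.arange(levels) + 0.5) / levels)
    for _ in range(iters):
        idx = np.abs(x[:, None] - centroids[None, :]).argmin(axis=1)
        for k in range(levels):
            if np.any(idx == k):
                centroids[k] = x[idx == k].mean()
    return np.sort(centroids)


def quantize(w, seed=42):
    """Sign-flip, rotate, normalize per row, then map to a 4-entry codebook."""
    in_f = w.shape[1]
    rng = np.random.default_rng(seed)
    signs = rng.choice([-1.0, 1.0], size=in_f)  # random sign-flip vector
    rotated = (w * signs) @ hadamard(in_f)
    # Per-row RMS scale (an assumption made for this sketch).
    scale = np.linalg.norm(rotated, axis=1, keepdims=True) / np.sqrt(in_f)
    normalized = rotated / scale
    codebook = lloyd_max(normalized.ravel())
    codes = np.abs(normalized[..., None] - codebook).argmin(axis=-1)
    return codes.astype(np.uint8), codebook, scale, signs


def dequantize(codes, codebook, scale, signs):
    """Look up codebook values, then undo the rotation and the sign flips."""
    rotated = codebook[codes] * scale
    return (rotated @ hadamard(codes.shape[1]).T) * signs
```

The rotation spreads outliers across all coordinates so the rotated values look closer to Gaussian, which is the distribution a small Lloyd-Max codebook fits well. Because the sign vector and rotation are fixed by the width and seed, a runtime only needs the codebook and signs to decode.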

Loading (Python)

# Install the JANGTQ loader and MLX LM
pip install jang-tools mlx-lm

# Load the quantized model and its tokenizer
from jang_tools.load_jangtq import load_jangtq_model
model, tokenizer = load_jangtq_model("JANGQ-AI/MiniMax-M2.7-JANGTQ")

The loader detects the pre-stacked layout via jang_config.routed_expert_layout == "prestacked" and routes through the generic JANGTQ loader's prestack branch. Decode automatically applies the standard patches: SwitchGLU fused gate+up, P15 router compile, and P18 QKV fusion.
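
The layout check described above amounts to a simple dispatch on one config field. The sketch below shows that logic on a plain dictionary. Only the routed_expert_layout key and the "prestacked" value come from this card; the function name and the returned branch labels are hypothetical and do not mirror jang-tools internals.

```python
def select_loader_branch(jang_config):
    """Pick a loader branch from a parsed jang_config mapping (hypothetical)."""
    layout = jang_config.get("routed_expert_layout")
    if layout == "prestacked":
        # Stacked expert tensors can be loaded as-is.
        return "prestack"
    # Hypothetical label for the older per-expert path.
    return "per_expert_restack"
```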

Reasoning + tools

Reasoning parser: qwen3 (extracts <think>...</think> blocks)
Tool parser: minimax
Default mode: thinking ON (chat template opens <think> for the assistant); pass enable_thinking=False to skip reasoning
Cache: kv (standard MLA-free MoE attention cache)
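
If you consume raw completions without a server-side reasoning parser, the <think>...</think> block can be separated from the final answer with a few lines of Python. This is a minimal sketch, not the qwen3 parser itself, and split_reasoning is an illustrative name. It accounts for the case where the chat template has already opened <think> in the prompt, so the completion contains only the closing tag.

```python
import re

THINK_BLOCK = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(text):
    """Return (reasoning, answer) from a completion that may contain <think>."""
    # If the template opened <think> in the prompt, the completion starts
    # mid-reasoning and only carries the closing tag; restore the opener.
    if "</think>" in text and "<think>" not in text:
        text = "<think>" + text
    match = THINK_BLOCK.search(text)
    if match is None:
        return "", text.strip()
    reasoning = match.group(1).strip()
    answer = (text[: match.start()] + text[match.end():]).strip()
    return reasoning, answer
```

With thinking disabled, the completion has no think block and the function returns an empty reasoning string together with the unchanged answer.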

Credits

Quantization + MLX runtime: Jinho Jang (eric@jangq.ai)
Base model: MiniMaxAI — M2.7 architecture
