Models: 4,402

Active filters: compressed-tensors

moonshotai/Kimi-K2.5

Image-Text-to-Text • 171B • Updated 14 days ago • 895k • 2.24k

inclusionAI/Ring-2.5-1T

Text Generation • 1T • Updated 4 days ago • 3.59k • 201

mratsim/MiniMax-M2.5-BF16-INT4-AWQ

Text Generation • 39B • Updated 1 day ago • 11.5k • 19

moonshotai/Kimi-K2-Thinking

Text Generation • Updated 20 days ago • 315k • 1.67k

mratsim/MiniMax-M2.5-FP8-INT4-AWQ

Text Generation • 39B • Updated 1 day ago • 45 • 5

unsloth/Qwen3-Coder-Next-FP8-Dynamic

Text Generation • 80B • Updated 15 days ago • 45.1k • 33

embedl/Cosmos-Reason2-2B-W4A16

Image-Text-to-Text • 1B • Updated 1 day ago • 298 • 4

tacos4me/Step-3.5-Flash-NVFP4

Text Generation • 111B • Updated 6 days ago • 936 • 4

cyankiwi/MiniMax-M2.5-AWQ-4bit

Text Generation • 37B • Updated 3 days ago • 1.54k • 4

inference-net/ClipTagger-12b

Image-Text-to-Text • Updated Aug 14, 2025 • 25 • 58

mratsim/MiniMax-M2.1-FP8-INT4-AWQ

Text Generation • Updated Jan 14 • 5.2k • 39

GadflyII/GLM-4.6V-NVFP4

Image-Text-to-Text • 62B • Updated Jan 12 • 11.9k • 9

cyankiwi/GLM-4.7-Flash-AWQ-4bit

Text Generation • Updated 27 days ago • 259k • 43

unsloth/GLM-4.7-Flash-FP8-Dynamic

Text Generation • 30B • Updated 24 days ago • 110k • 24

bullpoint/Qwen3-Coder-Next-AWQ-4bit

Text Generation • 14B • Updated 15 days ago • 216k • 6

GadflyII/Qwen3-Coder-Next-NVFP4

Text Generation • Updated 15 days ago • 115k • 16

meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8

Image-Text-to-Text • 402B • Updated May 22, 2025 • 61.7k • 153

allenai/olmOCR-2-7B-1025-FP8

Image-Text-to-Text • 8B • Updated Dec 9, 2025 • 212k • 197

RedHatAI/Qwen3-VL-235B-A22B-Instruct-NVFP4

Text Generation • 133B • Updated Dec 4, 2025 • 10.8k • 12

mistralai/Mistral-Large-3-675B-Instruct-2512-NVFP4

Updated 21 days ago • 3.38k • 49

mratsim/MiniMax-M2.1-BF16-INT4-AWQ

Text Generation • 39B • Updated Jan 14 • 4.49k • 7

GadflyII/GLM-4.7-Flash-NVFP4

Text Generation • Updated 29 days ago • 400k • 62

nvidia/Qwen3-VL-235B-A22B-Instruct-NVFP4-MLPerf-Inference-Closed-V6.0

133B • Updated 14 days ago • 25.9k • 3

EliasOenal/MiniMax-M2.5-Hybrid-AWQ-W4A16G128-Attn-fp8_e4m3-KV-fp8_e4m3

Text Generation • 34B • Updated 1 day ago • 9 • 2

ISTA-DASLab/gemma-3-27b-it-GPTQ-4b-128g

Image-Text-to-Text • 5B • Updated Mar 20, 2025 • 12.5k • 44

RedHatAI/gemma-3-27b-it-quantized.w4a16

Image-Text-to-Text • 7B • Updated Jun 9, 2025 • 14.7k • 12

jeffcookio/Mistral-Small-3.2-24B-Instruct-2506-awq-sym

5B • Updated Jul 4, 2025 • 70.6k • 10

zai-org/GLM-4.5-FP8

Text Generation • Updated Aug 12, 2025 • 2.99k • 78

zai-org/GLM-4.5-Air-FP8

Text Generation • Updated Aug 12, 2025 • 36.3k • 80

dolfsai/Qwen3-Reranker-4B-seq-cls-vllm-W4A16_ASYM

Text Ranking • 0.9B • Updated Aug 23, 2025 • 381 • 1