OmniASR-CTC-1B (GGUF)

GGUF conversion of Facebook's omniASR-CTC-1B for use with CrispASR.

Model Details

  • Architecture: wav2vec2 encoder (48L, d=1280) + CTC head
  • Parameters: ~1B
  • Encoder: 7-layer CNN frontend (320x downsampling) + 48L transformer with grouped positional convolution
  • Output: CTC greedy decoding (character-level SentencePiece vocabulary, 9812 tokens); see the decoding sketch after this list
  • Languages: 1100+ (multilingual, trained on 3.6M hours)
  • License: Apache 2.0
  • Input: Raw 16 kHz mono PCM (no mel features)
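
With 320x downsampling of 16 kHz input, the encoder emits one logit frame every 20 ms (50 frames/s). Greedy CTC decoding then collapses repeated argmax tokens and drops blanks. A minimal sketch in Python, assuming logits of shape (frames, vocab) and blank id 0 (the real blank id comes from the model's vocabulary):

import numpy as np

def ctc_greedy_decode(logits: np.ndarray, blank_id: int = 0) -> list[int]:
    # Best token per frame, then standard CTC best-path postprocessing:
    # collapse consecutive repeats, then remove blanks.
    ids = logits.argmax(axis=-1)
    keep = np.insert(ids[1:] != ids[:-1], 0, True)  # mask: first token of each run
    return [int(i) for i in ids[keep] if i != blank_id]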

Usage with CrispASR

# Auto-detected from GGUF metadata (omniasr-ctc arch)
crispasr --backend omniasr -m omniasr-ctc-1b-q4_k.gguf -f audio.wav

# With language specification
crispasr --backend omniasr -m omniasr-ctc-1b-q4_k.gguf -l de -f audio.wav
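
The model consumes raw 16 kHz mono PCM, so audio in other formats or sample rates should be converted first. A minimal preprocessing sketch, assuming the librosa and soundfile packages (generic preparation, not something CrispASR itself provides):

import librosa
import soundfile as sf

# Decode any input format, resample to 16 kHz, and downmix to mono.
y, sr = librosa.load("input.mp3", sr=16000, mono=True)
# Write 16-bit PCM WAV for the commands above.
sf.write("audio.wav", y, sr, subtype="PCM_16")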

Available Files

File                      Quant  Size    Description
omniasr-ctc-1b.gguf       F16    1.9 GB  Full precision
omniasr-ctc-1b-q4_k.gguf  Q4_K   551 MB  Recommended: good balance of quality and size
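
These sizes match a back-of-the-envelope estimate: F16 stores 2 bytes per weight, while Q4_K (a ggml k-quant format) averages roughly 4.5 bits per weight. A quick check in Python (the 4.5 bits/weight figure is an assumption; real files deviate slightly because some tensors stay at higher precision):

params = 1.0e9
print(f"F16 : {params * 2 / 1e9:.1f} GB")        # -> 2.0 GB (table: 1.9 GB)
print(f"Q4_K: {params * 4.5 / 8 / 1e6:.0f} MB")  # -> 562 MB (table: 551 MB)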

Conversion

Converted using:

python models/convert-omniasr-ctc-to-gguf.py \
  --input facebook/omniASR-CTC-1B \
  --output omniasr-ctc-1b.gguf
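
The script's internals aren't reproduced here, but a converter of this kind generally loads the checkpoint and streams tensors through the gguf Python package. A rough sketch under that assumption (the checkpoint filename, metadata key, and tensor names below are illustrative, not the script's actual ones):

import gguf
import torch

# Load the HF checkpoint as a flat state dict (illustrative filename).
state = torch.load("pytorch_model.bin", map_location="cpu")

writer = gguf.GGUFWriter("omniasr-ctc-1b.gguf", "omniasr-ctc")
writer.add_uint32("omniasr-ctc.block_count", 48)  # hypothetical metadata key
for name, tensor in state.items():
    writer.add_tensor(name, tensor.to(torch.float16).numpy())

writer.write_header_to_file()
writer.write_kv_data_to_file()
writer.write_tensors_to_file()
writer.close()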

Quantized with:

crispasr-quantize omniasr-ctc-1b.gguf omniasr-ctc-1b-q4_k.gguf Q4_K

Comparison with OmniASR-CTC-300M

Model             Params  Size (Q4_K)  Accuracy
OmniASR-CTC-300M  300M    157 MB       Good
OmniASR-CTC-1B    1B      551 MB       Better (deeper encoder)

Both use the same CTC architecture; the 1B variant has 48 encoder layers (vs 24) and a wider hidden dimension (1280 vs 1024).
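
Those two figures roughly reproduce the parameter counts: a standard transformer block holds about 12·d² weights per layer (4·d² for the attention projections plus 8·d² for the feed-forward network, assuming the common 4x expansion ratio):

def approx_transformer_params(layers: int, d: int) -> float:
    # 12 * d^2 per layer: 4*d^2 attention + 8*d^2 FFN,
    # assuming a 4x feed-forward expansion.
    return layers * 12 * d * d

print(f"300M variant: ~{approx_transformer_params(24, 1024) / 1e6:.0f}M")  # ~302M
print(f"1B variant:   ~{approx_transformer_params(48, 1280) / 1e6:.0f}M")  # ~944M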

Original Model

facebook/omniASR-CTC-1B