# Turn Detector Qwen3-4B

Fine-tuned Qwen3-4B for real-time turn-end detection in multilingual call center conversations.

The model predicts P(`<|im_end|>`), the probability that the speaker has finished their turn. It is designed for low-latency voice-agent pipelines (e.g. LiveKit) to decide when the agent should respond.

## How It Works

Given the conversation so far, the model outputs the probability of `<|im_end|>` as the next token:

- P(`<|im_end|>`) > 0.5 → speaker is done talking (turn complete)
- P(`<|im_end|>`) ≤ 0.5 → speaker is still talking (turn incomplete)
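The decision rule is a single comparison. A minimal sketch (the function name and signature are illustrative, not part of the released API):

```python
def is_turn_complete(p_im_end: float, threshold: float = 0.5) -> bool:
    """Decide whether the speaker has finished, given P(<|im_end|>).

    The 0.5 default matches the headline eval; a lower threshold makes
    the agent respond earlier at the cost of more interruptions in general.
    """
    return p_im_end > threshold
```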

## Usage

```python
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "Scicom-intl/turn-detector-Qwen3-4B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
).cuda().eval()

IM_END_ID = tokenizer.convert_tokens_to_ids("<|im_end|>")

def get_turn_end_prob(text: str) -> float:
    """Return P(<|im_end|>) for the next token after `text`."""
    # Strip a trailing <|im_end|> so we score the token rather than echo it.
    if text.endswith("<|im_end|>"):
        text = text[: -len("<|im_end|>")]
    inputs = tokenizer(text, return_tensors="pt").to(model.device)
    with torch.no_grad():
        logits = model(**inputs).logits
    # Probability mass assigned to <|im_end|> at the final position.
    return F.softmax(logits[0, -1], dim=-1)[IM_END_ID].item()
```
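In a live pipeline the probability is typically polled as the partial transcript grows, so a single noisy reading should not end the turn. A debounce wrapper sketch (the class, `frames_required`, and the idea of feeding it `get_turn_end_prob` outputs are illustrative assumptions, not part of the model):

```python
class TurnEndpointer:
    """Commit a turn end only after N consecutive high-probability reads,
    smoothing over single-frame spikes from partial ASR transcripts."""

    def __init__(self, threshold: float = 0.5, frames_required: int = 2):
        self.threshold = threshold
        self.frames_required = frames_required
        self._streak = 0

    def update(self, p_im_end: float) -> bool:
        # Count consecutive reads above threshold; reset on any dip.
        if p_im_end > self.threshold:
            self._streak += 1
        else:
            self._streak = 0
        return self._streak >= self.frames_required
```

In use, call `endpointer.update(get_turn_end_prob(transcript))` each time the transcript is extended, and respond once it returns `True`.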

## Eval Results

Test set: 238 synthetic samples (119 positive, 119 negative) across 12 language pairs.

### Overall (threshold = 0.5)

| Metric    | Score   |
|-----------|---------|
| Accuracy  | 88.24%  |
| Precision | 100.00% |
| Recall    | 76.47%  |
| F1        | 86.67%  |
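As a sanity check, these numbers are mutually consistent with a confusion matrix of 91 true positives, 28 false negatives, 0 false positives, and 119 true negatives (counts inferred from the reported percentages, not published):

```python
# Confusion-matrix counts inferred from the reported metrics (assumption).
tp, fn, fp, tn = 91, 28, 0, 119

precision = tp / (tp + fp)                          # 1.0000
recall = tp / (tp + fn)                             # 0.7647
accuracy = (tp + tn) / (tp + fn + fp + tn)          # 0.8824
f1 = 2 * precision * recall / (precision + recall)  # 0.8667
```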

### Per Class

| Class                      | Accuracy |
|----------------------------|----------|
| Positive (turn complete)   | 76.47%   |
| Negative (turn incomplete) | 100.00%  |

### Per Language

| Language Pair   | Overall | Positive | Negative |
|-----------------|---------|----------|----------|
| chinese-english | 90.00%  | 80.00%   | 100.00%  |
| chinese-malay   | 85.00%  | 70.00%   | 100.00%  |
| chinese-tamil   | 100.00% | 100.00%  | 100.00%  |
| english-chinese | 80.00%  | 60.00%   | 100.00%  |
| english-malay   | 90.00%  | 80.00%   | 100.00%  |
| english-tamil   | 90.00%  | 80.00%   | 100.00%  |
| malay-chinese   | 100.00% | 100.00%  | 100.00%  |
| malay-english   | 100.00% | 100.00%  | 100.00%  |
| malay-tamil     | 100.00% | 100.00%  | 100.00%  |
| tamil-chinese   | 88.89%  | 77.78%   | 100.00%  |
| tamil-english   | 65.00%  | 30.00%   | 100.00%  |
| tamil-malay     | 70.00%  | 40.00%   | 100.00%  |

### Threshold Sweep

| Threshold | Accuracy | Precision | Recall  | F1      |
|-----------|----------|-----------|---------|---------|
| 0.1       | 95.38%   | 100.00%   | 90.76%  | 95.15%  |
| 0.2       | 92.44%   | 100.00%   | 84.87%  | 91.82%  |
| 0.3       | 90.76%   | 100.00%   | 81.51%  | 89.81%  |
| 0.4       | 89.92%   | 100.00%   | 79.83%  | 88.79%  |
| 0.5       | 88.24%   | 100.00%   | 76.47%  | 86.67%  |
| 0.6       | 86.97%   | 100.00%   | 73.95%  | 85.02%  |
| 0.7       | 84.87%   | 100.00%   | 69.75%  | 82.18%  |
| 0.8       | 81.09%   | 100.00%   | 62.18%  | 76.68%  |
| 0.9       | 75.63%   | 100.00%   | 51.26%  | 67.78%  |
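Because precision stays at 100.00% across the entire sweep on this test set, lowering the threshold only raises recall, so the best F1 falls at the lowest threshold tried. A quick check over the reported numbers:

```python
# (threshold, F1) pairs copied from the sweep table above.
sweep = [
    (0.1, 0.9515), (0.2, 0.9182), (0.3, 0.8981),
    (0.4, 0.8879), (0.5, 0.8667), (0.6, 0.8502),
    (0.7, 0.8218), (0.8, 0.7668), (0.9, 0.6778),
]
best_threshold, best_f1 = max(sweep, key=lambda pair: pair[1])
```

In deployment a higher threshold may still be preferable, since the test set's perfect precision may not hold under distribution shift.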

### Probability Distribution

| Class    | Mean   | Median | Min    | Max    |
|----------|--------|--------|--------|--------|
| Positive | 0.7313 | 0.9114 | 0.0000 | 0.9998 |
| Negative | 0.0000 | 0.0000 | 0.0000 | 0.0000 |

## Dataset


## Training

- Base model: Qwen/Qwen3-4B
- Training data: positive samples only (complete conversations)
- Loss: Liger Fused Linear Cross Entropy
- Attention: FA4
- Precision: bfloat16
- Block size: 8192 (multipacked)
- Batch size: 2 × 16 gradient accumulation
- Learning rate: 2e-5 (constant)
- Epochs: 1
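Since training used positive samples only, each example presumably ends in the label token itself. A hypothetical sketch of sample construction using the Qwen chat-template delimiters (the helper name and exact formatting are assumptions, not documented by the authors):

```python
def make_positive_sample(role: str, text: str) -> str:
    # Qwen chat format: <|im_start|>role\ntext<|im_end|>  (assumed layout).
    # A "complete conversation" sample ends with <|im_end|>, which is
    # exactly the token whose probability is scored at inference time.
    return f"<|im_start|>{role}\n{text}<|im_end|>"
```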

## Training Data Sources

