# Turn Detector Qwen3-4B

Fine-tuned Qwen3-4B for real-time turn-end detection in multilingual call center conversations.

The model predicts P(`<|im_end|>`), the probability that the speaker has finished their turn. It is designed for low-latency voice agent pipelines (e.g. LiveKit) to decide when the agent should respond.
## How It Works
Given the conversation so far, the model outputs the probability of `<|im_end|>` as the next token:

- P(im_end) > 0.5 → speaker is done talking (turn complete)
- P(im_end) < 0.5 → speaker is still talking (turn incomplete)
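The decision rule above reduces to a single comparison. A minimal sketch (the helper name is illustrative; the 0.5 default follows the model card, and the threshold sweep below shows the trade-offs of other values):

```python
def is_turn_complete(p_im_end: float, threshold: float = 0.5) -> bool:
    """Map P(<|im_end|>) to a binary end-of-turn decision."""
    return p_im_end > threshold

print(is_turn_complete(0.91))  # True: speaker is done, respond
print(is_turn_complete(0.02))  # False: keep listening
```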
## Usage
```python
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "Scicom-intl/turn-detector-Qwen3-4B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
).cuda().eval()

IM_END_ID = tokenizer.convert_tokens_to_ids("<|im_end|>")

def get_turn_end_prob(text: str) -> float:
    """Return P(<|im_end|>) as the next token after `text`."""
    # Strip a trailing <|im_end|> so we score the token rather than echo it.
    if text.endswith("<|im_end|>"):
        text = text[: -len("<|im_end|>")]
    inputs = tokenizer(text, return_tensors="pt").to("cuda")
    with torch.no_grad():
        logits = model(**inputs).logits
    # Softmax over the vocabulary at the final position, then pick <|im_end|>.
    return F.softmax(logits[0, -1], dim=-1)[IM_END_ID].item()
```
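In a streaming pipeline, one hypothetical way to use this probability is to re-score the growing transcript on every ASR update and only commit end-of-turn after the probability stays above threshold for a few consecutive updates (a simple debounce). Everything below is an illustrative sketch, not LiveKit's actual API; `score` stands in for `get_turn_end_prob` above, and the `hold` parameter is an assumption, not part of the model:

```python
from typing import Callable

class TurnEndpointer:
    """Debounced end-of-turn detector over streaming transcripts."""

    def __init__(self, score: Callable[[str], float],
                 threshold: float = 0.5, hold: int = 2):
        self.score = score          # e.g. get_turn_end_prob
        self.threshold = threshold
        self.hold = hold            # consecutive hits required to fire
        self._streak = 0

    def update(self, transcript_so_far: str) -> bool:
        """Feed the latest partial transcript; True means 'respond now'."""
        if self.score(transcript_so_far) > self.threshold:
            self._streak += 1
        else:
            self._streak = 0        # speaker resumed; reset the debounce
        return self._streak >= self.hold

# Stub scorer for illustration: pretend probabilities as the utterance grows.
fake = {"book a": 0.02, "book a flight": 0.61, "book a flight please": 0.93}
ep = TurnEndpointer(score=fake.__getitem__, threshold=0.5, hold=2)
for text in fake:
    print(text, "->", ep.update(text))
# Only the third update fires, once two consecutive scores exceed 0.5.
```

The debounce trades a little latency for robustness against momentary spikes in P(im_end) mid-utterance.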
## Eval Results

Test set: 238 synthetic samples (119 positive, 119 negative) across 12 language pairs.
### Overall (threshold = 0.5)

| Metric    | Score   |
|-----------|---------|
| Accuracy  | 88.24%  |
| Precision | 100.00% |
| Recall    | 76.47%  |
| F1        | 86.67%  |
### Per Class

| Class                      | Accuracy |
|----------------------------|----------|
| Positive (turn complete)   | 76.47%   |
| Negative (turn incomplete) | 100.00%  |
### Per Language

| Language Pair   | Overall | Positive | Negative |
|-----------------|---------|----------|----------|
| chinese-english | 90.00%  | 80.00%   | 100.00%  |
| chinese-malay   | 85.00%  | 70.00%   | 100.00%  |
| chinese-tamil   | 100.00% | 100.00%  | 100.00%  |
| english-chinese | 80.00%  | 60.00%   | 100.00%  |
| english-malay   | 90.00%  | 80.00%   | 100.00%  |
| english-tamil   | 90.00%  | 80.00%   | 100.00%  |
| malay-chinese   | 100.00% | 100.00%  | 100.00%  |
| malay-english   | 100.00% | 100.00%  | 100.00%  |
| malay-tamil     | 100.00% | 100.00%  | 100.00%  |
| tamil-chinese   | 88.89%  | 77.78%   | 100.00%  |
| tamil-english   | 65.00%  | 30.00%   | 100.00%  |
| tamil-malay     | 70.00%  | 40.00%   | 100.00%  |
### Threshold Sweep

| Threshold | Accuracy | Precision | Recall | F1     |
|-----------|----------|-----------|--------|--------|
| 0.1       | 95.38%   | 100.00%   | 90.76% | 95.15% |
| 0.2       | 92.44%   | 100.00%   | 84.87% | 91.82% |
| 0.3       | 90.76%   | 100.00%   | 81.51% | 89.81% |
| 0.4       | 89.92%   | 100.00%   | 79.83% | 88.79% |
| 0.5       | 88.24%   | 100.00%   | 76.47% | 86.67% |
| 0.6       | 86.97%   | 100.00%   | 73.95% | 85.02% |
| 0.7       | 84.87%   | 100.00%   | 69.75% | 82.18% |
| 0.8       | 81.09%   | 100.00%   | 62.18% | 76.68% |
| 0.9       | 75.63%   | 100.00%   | 51.26% | 67.78% |
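The sweep rows can be reproduced from raw counts. At threshold 0.5 the model fires on 91 of the 119 positive samples and never fires on a negative (which is why precision is 100% at every threshold). A quick check:

```python
# Confusion-matrix counts at threshold 0.5, derived from the tables above.
tp, fn = 91, 28    # positives: detected / missed turn-ends
tn, fp = 119, 0    # negatives: all correctly left alone

accuracy  = (tp + tn) / (tp + tn + fp + fn)            # 210 / 238
precision = tp / (tp + fp)                             # no false fires
recall    = tp / (tp + fn)                             # 91 / 119
f1        = 2 * precision * recall / (precision + recall)

print(f"{accuracy:.2%} {precision:.2%} {recall:.2%} {f1:.2%}")
# → 88.24% 100.00% 76.47% 86.67%
```

These match the threshold-0.5 row exactly; other rows follow the same arithmetic with a different tp/fn split.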
### Probability Distribution

| Class    | Mean   | Median | Min    | Max    |
|----------|--------|--------|--------|--------|
| Positive | 0.7313 | 0.9114 | 0.0000 | 0.9998 |
| Negative | 0.0000 | 0.0000 | 0.0000 | 0.0000 |
## Dataset

## Training
- Base model: Qwen/Qwen3-4B
- Training data: Positive samples only (complete conversations)
- Loss: Liger Fused Linear Cross Entropy
- Attention: FA4
- Precision: bfloat16
- Block size: 8192 (multipacked)
- Batch size: 2 × 16 gradient accumulation steps (effective batch 32)
- Learning rate: 2e-5 (constant)
- Epochs: 1
## Training Data Sources