# Huihui-Qwen3-VL-30B-A3B-Thinking-abliterated-qx86-hi-mlx

## Model Comparison: Cognitive Abilities Analysis

### Performance Summary
| Benchmark     | Baseline | Huihui | Difference |
|---------------|----------|--------|------------|
| arc_challenge | 0.393    | 0.375  | -4.6%      |
| arc_easy      | 0.466    | 0.446  | -4.3%      |
| boolq         | 0.751    | 0.643  | -14.4%     |
| hellaswag     | 0.648    | 0.591  | -8.8%      |
| openbookqa    | 0.366    | 0.340  | -7.1%      |
| piqa          | 0.776    | 0.751  | -3.2%      |
| winogrande    | 0.667    | 0.600  | -10.0%     |
| **Average**   | **0.581**| **0.535** | **-7.9%** |
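The relative differences can be recomputed directly from the per-benchmark scores. A minimal sketch (scores copied from the table above; any small disagreement with the listed percentages reflects rounding in the published scores):

```python
# Per-benchmark scores from the table: (baseline, abliterated).
scores = {
    "arc_challenge": (0.393, 0.375),
    "arc_easy":      (0.466, 0.446),
    "boolq":         (0.751, 0.643),
    "hellaswag":     (0.648, 0.591),
    "openbookqa":    (0.366, 0.340),
    "piqa":          (0.776, 0.751),
    "winogrande":    (0.667, 0.600),
}

# Relative drop per benchmark, as a percentage of the baseline score.
for name, (base, abl) in scores.items():
    print(f"{name:14s} {(abl - base) / base * 100:+.1f}%")

# Average raw score per model, then the relative difference of the means.
base_avg = sum(b for b, _ in scores.values()) / len(scores)
abl_avg = sum(a for _, a in scores.values()) / len(scores)
print(f"average        {base_avg:.3f} -> {abl_avg:.3f} "
      f"({(abl_avg - base_avg) / base_avg * 100:+.1f}%)")
```

Averaging raw scores first and then taking the relative difference (rather than averaging the per-benchmark percentages) is one reasonable convention; both land in the same range here.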
### Key Observations
- **Consistent degradation:** the Huihui version scores below the baseline on every cognitive benchmark.
- **Most impacted areas:**
  - Boolean question answering (boolq): 14.4% drop
  - Pronoun/coreference resolution (winogrande): 10.0% drop
  - Commonsense reasoning (hellaswag): 8.8% drop
- **Relatively preserved abilities:**
  - Physical commonsense reasoning (piqa): 3.2% drop
  - Basic question answering (arc, openbookqa): 4-7% range
### Interpretation
The abliteration (uncensoring) process appears to have degraded general cognitive ability by roughly 8% on average. This suggests either that the removed refusal directions were entangled with general reasoning capabilities, or that the modification process itself harmed model quality.

The largest drops, in boolq and winogrande, point to weakened logical reasoning and contextual understanding, both fundamental cognitive skills.
*Reviewed with Qwen3-30B-A3B-Thinking-2507-Claude-4.5-Sonnet-High-Reasoning-Distill-qx86x-hi-mlx*
This model, Huihui-Qwen3-VL-30B-A3B-Thinking-abliterated-qx86-hi-mlx, was converted to MLX format from huihui-ai/Huihui-Qwen3-VL-30B-A3B-Thinking-abliterated using mlx-lm version 0.28.4.

## Use with mlx

```shell
pip install mlx-lm
```

```python
from mlx_lm import load, generate

model, tokenizer = load("nightmedia/Huihui-Qwen3-VL-30B-A3B-Thinking-abliterated-qx86-hi-mlx")

prompt = "hello"

# Apply the model's chat template when one is defined in the tokenizer config.
if tokenizer.chat_template is not None:
    messages = [{"role": "user", "content": prompt}]
    prompt = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True
    )

response = generate(model, tokenizer, prompt=prompt, verbose=True)
```
## Model tree for nightmedia/Huihui-Qwen3-VL-30B-A3B-Thinking-abliterated-qx86-hi-mlx

Base model: Qwen/Qwen3-VL-30B-A3B-Thinking