# Huihui-Qwen3-VL-30B-A3B-Thinking-abliterated-qx86-hi-mlx

## Model Comparison: Cognitive Abilities Analysis

### Performance Summary
| Benchmark     | Baseline | Huihui | Difference |
|---------------|----------|--------|------------|
| arc_challenge | 0.393    | 0.375  | -4.6%      |
| arc_easy      | 0.466    | 0.446  | -4.3%      |
| boolq         | 0.751    | 0.643  | -14.4%     |
| hellaswag     | 0.648    | 0.591  | -8.8%      |
| openbookqa    | 0.366    | 0.340  | -7.1%      |
| piqa          | 0.776    | 0.751  | -3.2%      |
| winogrande    | 0.667    | 0.600  | -10.0%     |
| **Average**   | **0.581**| **0.535** | **-7.9%** |
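The relative differences can be recomputed directly from the per-benchmark scores. A minimal sketch (scores copied from the table above; any small disagreement with the listed percentages reflects rounding in the published scores):

```python
# Per-benchmark scores from the table: (baseline, abliterated).
scores = {
    "arc_challenge": (0.393, 0.375),
    "arc_easy":      (0.466, 0.446),
    "boolq":         (0.751, 0.643),
    "hellaswag":     (0.648, 0.591),
    "openbookqa":    (0.366, 0.340),
    "piqa":          (0.776, 0.751),
    "winogrande":    (0.667, 0.600),
}

# Relative drop per benchmark, as a percentage of the baseline score.
for name, (base, abl) in scores.items():
    print(f"{name:14s} {(abl - base) / base * 100:+.1f}%")

# Average raw score per model, then the relative difference of the means.
base_avg = sum(b for b, _ in scores.values()) / len(scores)
abl_avg = sum(a for _, a in scores.values()) / len(scores)
print(f"average        {base_avg:.3f} -> {abl_avg:.3f} "
      f"({(abl_avg - base_avg) / base_avg * 100:+.1f}%)")
```

Averaging raw scores first and then taking the relative difference (rather than averaging the per-benchmark percentages) is one reasonable convention; both land in the same range here.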
### Key Observations
- **Consistent degradation:** the Huihui version scores below the baseline on every cognitive benchmark.
- **Most impacted areas:**
  - Boolean question answering (boolq): 14.4% drop
  - Pronoun/coreference resolution (winogrande): 10.0% drop
  - Commonsense reasoning (hellaswag): 8.8% drop
- **Relatively preserved abilities:**
  - Physical commonsense reasoning (piqa): 3.2% drop
  - Basic question answering (arc, openbookqa): 4-7% range
### Interpretation
The abliteration (uncensoring) process appears to have degraded general cognitive ability by roughly 8% on average. This suggests either that the removed refusal directions were entangled with general reasoning capabilities, or that the modification process itself harmed model quality.

The largest drops, in boolq and winogrande, point to weakened logical reasoning and contextual understanding, both fundamental cognitive skills.
*Reviewed with Qwen3-30B-A3B-Thinking-2507-Claude-4.5-Sonnet-High-Reasoning-Distill-qx86x-hi-mlx*
This model, Huihui-Qwen3-VL-30B-A3B-Thinking-abliterated-qx86-hi-mlx, was converted to MLX format from huihui-ai/Huihui-Qwen3-VL-30B-A3B-Thinking-abliterated using mlx-lm version 0.28.4.

## Use with mlx

```shell
pip install mlx-lm
```

```python
from mlx_lm import load, generate

model, tokenizer = load("nightmedia/Huihui-Qwen3-VL-30B-A3B-Thinking-abliterated-qx86-hi-mlx")

prompt = "hello"

# Apply the model's chat template when one is defined in the tokenizer config.
if tokenizer.chat_template is not None:
    messages = [{"role": "user", "content": prompt}]
    prompt = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True
    )

response = generate(model, tokenizer, prompt=prompt, verbose=True)
```
## Model tree for nightmedia/Huihui-Qwen3-VL-30B-A3B-Thinking-abliterated-qx86-hi-mlx

Base model: Qwen/Qwen3-VL-30B-A3B-Thinking