Ventera-MN (Abliterated Mistral-Nemo 12B)

Ventera-MN is a dynamically uncensored and abliterated version of mistralai/Mistral-Nemo-Instruct-2407, the flagship 12-billion parameter model built jointly by Mistral AI and NVIDIA.

This model was created using the Heretic framework, employing advanced orthogonal weight ablation to isolate and remove refusal vectors. The result is a highly capable, completely unchained logic engine that retains the original model's massive 128,000 token context window.

Ablation Telemetry & Metrics

Unlike traditional fine-tuning or full RLHF—which can cause "brain damage" to a model by catastrophically forgetting knowledge—Ventera-MN was optimized using a Pareto-optimal search across the model's residual stream specifically targeting the compliance and refusal mechanics.

Ablation Telemetry (Trial 35):

Base Model Refusals: 88 / 100
Ventera-MN Refusals: 10 / 100
KL Divergence: 0.0938

By removing almost 90% of the instruct guardrails while maintaining a KL divergence under 0.1, the structural integrity, language comprehension, and long-context logic capabilities of the base model are perfectly intact. It simply no longer refuses instructions.

Key Features

Massive 128k Context Window: Capable of ingesting entire books, codebases, or extended conversational histories in a single prompt without triggering safety filters.
Dense Architecture: A highly efficient 12B parameter dense model optimized to fit seamlessly into consumer GPUs (fits in 24GB VRAM at FP16, or much less when quantized).
Multilingual Mastery: Retains Mistral-Nemo's deep understanding of multiple languages.
Drop-in Replacement: Fully compatible with standard HuggingFace transformers and vLLM pipelines.

Usage

Via HuggingFace Transformers

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "Umranz/Ventera-MN"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, 
    torch_dtype=torch.bfloat16, 
    device_map="auto"
)

⚠️ Limitations & Ethical Considerations

Because this model has had its safety guardrails mathematically ablated, it is highly compliant and will attempt to answer any prompt given to it.

Unrestricted Output: The model will not refuse requests, including those that may generate offensive, dangerous, or highly regulated content.
Hallucinations: As with all LLMs, the model can confidently hallucinate incorrect information, especially over extremely long context windows.
Use Case: This model is intended for research, creative writing, and local deployments where unrestricted inference is required. Users are solely responsible for the content generated.