NeuroBLAST-V3 0.6B SYNTH EC 144B TOKENS
This is an Early Checkpoint (EC) of the NeuroBLAST V3 architecture, a novel hybrid model designed with a biologically inspired "cortical" structure.
This checkpoint comes from the "decay" phase of training: it has been trained on longer contexts with a lower learning rate than the previous checkpoint, and it is intended for architectural evaluation and research purposes.
Model Details
- Architecture: NeuroBLAST V3 (Custom Hybrid Architecture)
- Checkpoint Step: 40,000
- Parameters: 596,728,320
- Num layers: 72
- Sensory layers: 24
- Associative layers: 32
- Motor layers: 16
- Hidden size: 512
- Vocab size: 65538
- Intermediate size: 3072
- Num attention heads: 16
- Num kv heads: 8
- Head dim: 128
- Tie word embeddings: False
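The hyperparameters above can be read directly from the checkpoint's configuration. A minimal sketch, assuming the custom NeuroBLAST config exposes the standard transformers field names (hidden_size, num_key_value_heads, etc.):

```python
# Minimal sketch: inspect the checkpoint's hyperparameters.
# Assumes the custom config class uses the standard field names below.
from transformers import AutoConfig

config = AutoConfig.from_pretrained(
    "mkurman/NeuroBLAST-V3-0.6M-SYNTH-EC-144B-TOK",
    trust_remote_code=True,  # required: the config class ships with the repo
)
print(config.hidden_size)          # expected: 512
print(config.intermediate_size)    # expected: 3072
print(config.num_attention_heads)  # expected: 16
print(config.num_key_value_heads)  # expected: 8
print(config.vocab_size)           # expected: 65538
```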
Architecture Highlights
NeuroBLAST differs from standard Transformers by using a three-stage cortical design (a minimal schematic sketch follows the list below):
- Sensory Cortex: Hybrid layers alternating between Attention and Dilated Causal 2D Convolutions.
- Associative Cortex: Hybrid layers with alternating RoPE usage.
- Motor Cortex: Pure Attention layers.
- Deep Residual Bridges: Long-range residual connections injecting the original embeddings (and their negations) between cortical stages to improve signal propagation.
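A minimal schematic of the staging and bridge wiring described above. This is a sketch only: `Block` is a plain residual stand-in for the real sensory, associative, and motor layers, and the placement and sign of the injected embeddings are an interpretation of the description, not taken from the released modeling code.

```python
# Schematic sketch only: Block stands in for the real hybrid layers.
import torch
import torch.nn as nn

class Block(nn.Module):
    """Placeholder layer (the real model mixes attention, convolutions, and RoPE variants)."""
    def __init__(self, d):
        super().__init__()
        self.ff = nn.Sequential(nn.LayerNorm(d), nn.Linear(d, d))

    def forward(self, x):
        return x + self.ff(x)

class CorticalStack(nn.Module):
    """Three cortical stages with deep residual bridges between them."""
    def __init__(self, d=512, sensory=24, associative=32, motor=16):
        super().__init__()
        self.sensory = nn.ModuleList([Block(d) for _ in range(sensory)])
        self.associative = nn.ModuleList([Block(d) for _ in range(associative)])
        self.motor = nn.ModuleList([Block(d) for _ in range(motor)])

    def forward(self, emb):
        h = emb
        for blk in self.sensory:
            h = blk(h)
        h = h + emb  # bridge: re-inject the original embeddings (assumed sign)
        for blk in self.associative:
            h = blk(h)
        h = h - emb  # bridge: inject the negated embeddings (assumed sign)
        for blk in self.motor:
            h = blk(h)
        return h

hidden = torch.randn(1, 16, 512)  # (batch, sequence, hidden_size)
print(CorticalStack()(hidden).shape)  # torch.Size([1, 16, 512])
```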
Training Details
This model is currently being trained using the Google TPU Research Cloud (TRC).
- Dataset: PleIAs/SYNTH
- Tokens Processed: ~144 Billion
- Hardware: TPUv4-16
- Training Time: ~13 Days
- Effective Batch Size: 1024
- Context Length: 2048 tokens (Current phase)
- Learning rate: 2e-4
- Weight decay: 0.01
- Optimizer: AdamW
- Precision: BFloat16
- Current State: Decay phase
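As a rough illustration of the recipe above, here is a minimal sketch of an AdamW setup with a "stable then decay" schedule. The decay boundary and the 10% floor are assumptions; only the base learning rate, weight decay, and optimizer choice come from the list.

```python
# Minimal sketch of the optimizer/schedule described above (not the actual
# training script). decay_start and the 10% floor are assumptions.
import torch

def stable_then_decay(step, decay_start=30_000, total_steps=40_000, floor=0.1):
    """Hold the base LR flat, then decay linearly toward floor * base LR."""
    if step < decay_start:
        return 1.0
    progress = (step - decay_start) / max(1, total_steps - decay_start)
    return max(floor, 1.0 - (1.0 - floor) * progress)

params = torch.nn.Linear(512, 512).parameters()  # stand-in for the model
optimizer = torch.optim.AdamW(params, lr=2e-4, weight_decay=0.01)
scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=stable_then_decay)
```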
Usage
Note: You must use trust_remote_code=True as this model utilizes custom modeling code (modeling_neuroblast.py).
import torch
from transformers import AutoTokenizer, TextStreamer, AutoModelForCausalLM

model_id = "mkurman/NeuroBLAST-V3-0.6M-SYNTH-EC-144B-TOK"

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Load the model with custom code trust
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map='cuda',
    trust_remote_code=True
).eval()

# Stream decoded tokens to stdout as they are generated
# (extra keyword arguments are forwarded to tokenizer.decode)
streamer = TextStreamer(
    tokenizer, skip_prompt=False, skip_special_tokens=False
)

# Prepare input
input_ids = tokenizer.apply_chat_template(
    [{"role": "user", "content": "what is hypertension?"}],
    tokenize=True,
    return_tensors="pt",
    add_generation_prompt=True
)
print(f"Input IDs: {input_ids}")

# Generate
with torch.no_grad():
    outputs = model.generate(
        input_ids=input_ids.to(model.device),
        max_new_tokens=128,
        streamer=streamer,
        use_cache=True,
        # Important: keep repetition_penalty at 1.0 for this early checkpoint
        repetition_penalty=1.0,
    )
You can also find support for vLLM in my GitHub repository.
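As a hedged sketch of what that path could look like, assuming the NeuroBLAST architecture has been registered with vLLM via the code in that repository (stock vLLM does not know the custom architecture on its own):

```python
# Sketch of vLLM offline inference; assumes the architecture has been
# registered with vLLM using the code from the GitHub repository.
from vllm import LLM, SamplingParams

llm = LLM(
    model="mkurman/NeuroBLAST-V3-0.6M-SYNTH-EC-144B-TOK",
    trust_remote_code=True,
    dtype="bfloat16",
)
outputs = llm.generate(
    ["what is hypertension?"],
    SamplingParams(max_tokens=128, repetition_penalty=1.0),
)
print(outputs[0].outputs[0].text)
```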
Acknowledgments
This model was trained using Cloud TPUs provided by Google's TPU Research Cloud (TRC) program.
Special thanks to Pierre-Carl Langlais and the PleIAs team for the high-quality SYNTH dataset.
