NeuroBLAST-V3 0.6B SYNTH EC 144B TOKENS
This is an Early Checkpoint (EC) of the NeuroBLAST V3 architecture, a novel hybrid model designed with a biologically inspired "cortical" structure.
This checkpoint comes from the "decay" phase of training: it has been trained on longer contexts with a lower learning rate than the previous checkpoint, and it is intended for architectural evaluation and research purposes.
Model Details
- Architecture: NeuroBLAST V3 (Custom Hybrid Architecture)
- Checkpoint Step: 40,000
- Parameters: 596,728,320
- Num layers: 72
- Sensory layers: 24
- Associative layers: 32
- Motor layers: 16
- Hidden size: 512
- Vocab size: 65538
- Intermediate size: 3072
- Num attention heads: 16
- Num kv heads: 8
- Head dim: 128
- Tie word embeddings: False
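The hyperparameters above can be read directly from the checkpoint's configuration. A minimal sketch, assuming the custom NeuroBLAST config exposes the standard transformers field names (hidden_size, num_key_value_heads, etc.):

```python
# Minimal sketch: inspect the checkpoint's hyperparameters.
# Assumes the custom config class uses the standard field names below.
from transformers import AutoConfig

config = AutoConfig.from_pretrained(
    "mkurman/NeuroBLAST-V3-0.6M-SYNTH-EC-144B-TOK",
    trust_remote_code=True,  # required: the config class ships with the repo
)
print(config.hidden_size)          # expected: 512
print(config.intermediate_size)    # expected: 3072
print(config.num_attention_heads)  # expected: 16
print(config.num_key_value_heads)  # expected: 8
print(config.vocab_size)           # expected: 65538
```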
Architecture Highlights
NeuroBLAST differs from standard Transformers by using a three-stage cortical design (a minimal schematic sketch follows the list below):
- Sensory Cortex: Hybrid layers alternating between Attention and Dilated Causal 2D Convolutions.
- Associative Cortex: Hybrid layers with alternating RoPE usage.
- Motor Cortex: Pure Attention layers.
- Deep Residual Bridges: Long-range residual connections injecting the original embeddings (and their negations) between cortical stages to improve signal propagation.
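A minimal schematic of the staging and bridge wiring described above. This is a sketch only: `Block` is a plain residual stand-in for the real sensory, associative, and motor layers, and the placement and sign of the injected embeddings are an interpretation of the description, not taken from the released modeling code.

```python
# Schematic sketch only: Block stands in for the real hybrid layers.
import torch
import torch.nn as nn

class Block(nn.Module):
    """Placeholder layer (the real model mixes attention, convolutions, and RoPE variants)."""
    def __init__(self, d):
        super().__init__()
        self.ff = nn.Sequential(nn.LayerNorm(d), nn.Linear(d, d))

    def forward(self, x):
        return x + self.ff(x)

class CorticalStack(nn.Module):
    """Three cortical stages with deep residual bridges between them."""
    def __init__(self, d=512, sensory=24, associative=32, motor=16):
        super().__init__()
        self.sensory = nn.ModuleList([Block(d) for _ in range(sensory)])
        self.associative = nn.ModuleList([Block(d) for _ in range(associative)])
        self.motor = nn.ModuleList([Block(d) for _ in range(motor)])

    def forward(self, emb):
        h = emb
        for blk in self.sensory:
            h = blk(h)
        h = h + emb  # bridge: re-inject the original embeddings (assumed sign)
        for blk in self.associative:
            h = blk(h)
        h = h - emb  # bridge: inject the negated embeddings (assumed sign)
        for blk in self.motor:
            h = blk(h)
        return h

hidden = torch.randn(1, 16, 512)  # (batch, sequence, hidden_size)
print(CorticalStack()(hidden).shape)  # torch.Size([1, 16, 512])
```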
Training Details
This model is currently being trained using the Google TPU Research Cloud (TRC).
- Dataset: PleIAs/SYNTH
- Tokens Processed: ~144 Billion
- Hardware: TPUv4-16
- Training Time: ~13 Days
- Effective Batch Size: 1024
- Context Length: 2048 tokens (Current phase)
- Learning rate: 2e-4
- Weight decay: 0.01
- Optimizer: AdamW
- Precision: BFloat16
- Current State: Decay phase
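As a rough illustration of the recipe above, here is a minimal sketch of an AdamW setup with a "stable then decay" schedule. The decay boundary and the 10% floor are assumptions; only the base learning rate, weight decay, and optimizer choice come from the list.

```python
# Minimal sketch of the optimizer/schedule described above (not the actual
# training script). decay_start and the 10% floor are assumptions.
import torch

def stable_then_decay(step, decay_start=30_000, total_steps=40_000, floor=0.1):
    """Hold the base LR flat, then decay linearly toward floor * base LR."""
    if step < decay_start:
        return 1.0
    progress = (step - decay_start) / max(1, total_steps - decay_start)
    return max(floor, 1.0 - (1.0 - floor) * progress)

params = torch.nn.Linear(512, 512).parameters()  # stand-in for the model
optimizer = torch.optim.AdamW(params, lr=2e-4, weight_decay=0.01)
scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=stable_then_decay)
```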
Usage
Note: You must use trust_remote_code=True as this model utilizes custom modeling code (modeling_neuroblast.py).
import torch
from transformers import AutoTokenizer, TextStreamer, AutoModelForCausalLM

model_id = "mkurman/NeuroBLAST-V3-0.6M-SYNTH-EC-144B-TOK"

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Load the model with custom code trust
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map='cuda',
    trust_remote_code=True
).eval()

# Stream decoded tokens to stdout as they are generated
# (extra keyword arguments are forwarded to tokenizer.decode)
streamer = TextStreamer(
    tokenizer, skip_prompt=False, skip_special_tokens=False
)

# Prepare input
input_ids = tokenizer.apply_chat_template(
    [{"role": "user", "content": "what is hypertension?"}],
    tokenize=True,
    return_tensors="pt",
    add_generation_prompt=True
)
print(f"Input IDs: {input_ids}")

# Generate
with torch.no_grad():
    outputs = model.generate(
        input_ids=input_ids.to(model.device),
        max_new_tokens=128,
        streamer=streamer,
        use_cache=True,
        # Important: keep repetition_penalty at 1.0 for this early checkpoint
        repetition_penalty=1.0,
    )
You can also find support for vLLM in my GitHub repository.
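As a hedged sketch of what that path could look like, assuming the NeuroBLAST architecture has been registered with vLLM via the code in that repository (stock vLLM does not know the custom architecture on its own):

```python
# Sketch of vLLM offline inference; assumes the architecture has been
# registered with vLLM using the code from the GitHub repository.
from vllm import LLM, SamplingParams

llm = LLM(
    model="mkurman/NeuroBLAST-V3-0.6M-SYNTH-EC-144B-TOK",
    trust_remote_code=True,
    dtype="bfloat16",
)
outputs = llm.generate(
    ["what is hypertension?"],
    SamplingParams(max_tokens=128, repetition_penalty=1.0),
)
print(outputs[0].outputs[0].text)
```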
Acknowledgments
This model was trained using Cloud TPUs provided by Google's TPU Research Cloud (TRC) program.
Special thanks to Pierre-Carl Langlais and the PleIAs team for the high-quality SYNTH dataset.
