# NPC Fin 32B SFT – Financial Reasoning LLM

A domain-specific financial reasoning model fine-tuned from Qwen2.5-32B-Instruct using QLoRA, focused on crypto market analysis, macro reasoning, and multi-step financial logic.

📄 **Paper:** *NPC Fin 32B: A Domain-Specialized Financial Reasoning Model via Multi-GPU QLoRA* (Zenodo, 2026)

## Model Details

| Parameter | Value |
|---|---|
| Base Model | Qwen/Qwen2.5-32B-Instruct |
| Method | QLoRA (4-bit NF4 quantization) |
| LoRA Rank | 64 |
| LoRA Alpha | 128 |
| LoRA Dropout | 0.05 |
| Target Modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| Training Framework | Unsloth + trl SFTTrainer |
| Max Sequence Length | 4,096 tokens |
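Expressed in code, the hyperparameters above correspond roughly to the following `peft`/`bitsandbytes` configuration. This is a sketch for illustration only; the actual Unsloth training setup may differ in details not stated on this card (e.g. double quantization, bias handling).

```python
import torch
from transformers import BitsAndBytesConfig
from peft import LoraConfig

# 4-bit NF4 quantization of the frozen base weights (QLoRA)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# LoRA adapter applied to all attention and MLP projections
lora_config = LoraConfig(
    r=64,
    lora_alpha=128,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)
```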

## Training Data

- 32,496 SFT examples (59.7M tokens)
- 5 domain-specific tags:
  - `crypto_signal` – real-time market signal analysis and trade reasoning
  - `crypto_general` – broad crypto ecosystem knowledge
  - `logic_tree` – multi-path reasoning with correct and incorrect branches
  - `stocks_macro` – equities and macroeconomic analysis
  - `cross_market` – cross-asset correlation and regime detection
- Synthetic data generated via the HF Inference API (Qwen2.5-72B-Instruct) at zero incremental cost
- Source signals exported from a production MongoDB instance (`btunified` database)
- MinHash deduplication applied; quality-filtered with automated scoring
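The MinHash deduplication step can be illustrated with a minimal stdlib-only sketch. The shingle size, number of hash functions, and similarity threshold used in the actual pipeline are not specified on this card; the values below are illustrative.

```python
import hashlib

def minhash_signature(text: str, num_hashes: int = 64, shingle_size: int = 5) -> list[int]:
    """Compute a MinHash signature over character shingles of the text."""
    shingles = {text[i:i + shingle_size] for i in range(max(1, len(text) - shingle_size + 1))}
    signature = []
    for seed in range(num_hashes):
        # For each seeded hash function, keep the minimum hash over all shingles.
        signature.append(min(
            int.from_bytes(hashlib.blake2b(f"{seed}:{s}".encode(), digest_size=8).digest(), "big")
            for s in shingles
        ))
    return signature

def jaccard_estimate(sig_a: list[int], sig_b: list[int]) -> float:
    """Estimate Jaccard similarity as the fraction of matching signature slots."""
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)

# Near-duplicate examples produce highly similar signatures; unrelated ones do not.
a = minhash_signature("Analyze the risk/reward of entering a long ETH position.")
b = minhash_signature("Analyze the risk/reward of entering a long ETH position!")
c = minhash_signature("Summarize today's macro data releases for US equities.")
```

Pairs whose estimated similarity exceeds a chosen threshold would be collapsed to a single example before training.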

## Training Configuration

| Parameter | Value |
|---|---|
| Optimizer | AdamW 8-bit |
| Learning Rate | 2e-4 |
| LR Schedule | Cosine decay |
| Warmup Ratio | 0.05 |
| Weight Decay | 0.01 |
| Per-device Batch Size | 4 |
| Gradient Accumulation | 8 |
| Realized Effective Batch | ~384 (4 × 12 GPUs × 8) |
| Epochs | 3 |
| Mixed Precision | bf16 |
| Distributed Strategy | DeepSpeed ZeRO-3 + full CPU offload |
| Hardware | 12 × NVIDIA H100 SXM5 80GB (RunPod single multi-GPU node) |
| Wall Clock | ~72 hours (3 days) |
| Total Compute | ~864 H100-hours |

**Note on batch size:** an earlier version of this card listed the per-device batch (4) and gradient accumulation (8) with an "effective batch 32" annotation inherited from a single-GPU experimental plan. The realized run, distributed across 12 H100 GPUs under DeepSpeed ZeRO-3, scaled the effective batch by world size to approximately 384 (4 × 12 × 8). The peak learning rate of 2e-4 was tuned for the planned effective batch of 32, not for the realized 384; see §4.3 of the paper (Config drift: planned vs. realized batch size) for full discussion.
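The arithmetic behind the realized effective batch is simply the product of the three factors in the table above:

```python
per_device_batch = 4   # micro-batch per GPU
world_size = 12        # H100 GPUs in the node (DeepSpeed data parallelism)
grad_accum = 8         # gradient accumulation steps

# Samples contributing to each optimizer step
effective_batch = per_device_batch * world_size * grad_accum
print(effective_batch)  # 384
```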

## Evaluation

| Benchmark | Score |
|---|---|
| CryptoQA (custom, 500 questions) | 93.6% |

CryptoQA covers: token fundamentals, DeFi mechanics, on-chain analytics interpretation, market regime identification, and risk assessment.
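For concreteness, a score of 93.6% on a 500-question set corresponds to 468 correctly answered questions:

```python
total_questions = 500
correct = 468  # implied by the reported 93.6% accuracy

accuracy = correct / total_questions
print(f"{accuracy:.1%}")  # 93.6%
```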

## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load base model
base_model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-32B-Instruct",
    torch_dtype="auto",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-32B-Instruct")

# Load LoRA adapter
model = PeftModel.from_pretrained(base_model, "ramankrishna10/npc-fin-32b-sft")

messages = [
    {"role": "system", "content": "You are a financial reasoning assistant."},
    {"role": "user", "content": "Analyze the risk/reward of entering a long ETH position given declining on-chain activity but increasing institutional inflows."},
]

text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
# Enable sampling so the temperature setting takes effect
outputs = model.generate(**inputs, max_new_tokens=512, do_sample=True, temperature=0.7)
# Decode only the newly generated tokens, skipping the echoed prompt
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```

## Intended Use

- Financial market analysis and reasoning
- Crypto signal interpretation and trade logic
- Multi-step reasoning over market scenarios
- Research and educational purposes

## Limitations

- This is the SFT base model only – it does not include tool-use or identity fine-tuning
- Trained primarily on crypto/DeFi data; performance on traditional equities may be lower
- Not intended as financial advice – outputs are AI-generated analysis
- Trained on a single multi-GPU node (12 × H100); not validated at multi-node cluster scale
- May hallucinate token names or market data not present in the training set

## Related Models

- **npc-fin-prm-7b** – Process Reward Model for step-level reasoning verification

## Citation

Bachu, R. K. (2026). NPC Fin 32B: A Domain-Specialized Financial Reasoning Model via Multi-GPU QLoRA. Zenodo. https://doi.org/10.5281/zenodo.19802598

```bibtex
@misc{bachu2026npcfin32b,
  title     = {NPC Fin 32B: A Domain-Specialized Financial Reasoning Model
               via Multi-GPU QLoRA},
  author    = {Bachu, Rama Krishna},
  year      = {2026},
  publisher = {Zenodo},
  doi       = {10.5281/zenodo.19802598},
  url       = {https://doi.org/10.5281/zenodo.19802598},
  note      = {Preprint},
}
```

## Author

Ramakrishna Bachu – GitHub | LinkedIn

Part of the NPC Model Family by Bottensor.
