# NPC Fin 32B SFT – Financial Reasoning LLM
A domain-specific financial reasoning model fine-tuned from Qwen2.5-32B-Instruct using QLoRA, focused on crypto market analysis, macro reasoning, and multi-step financial logic.
**Paper:** NPC Fin 32B: A Domain-Specialized Financial Reasoning Model via Multi-GPU QLoRA (Zenodo, 2026)
## Model Details
| Parameter | Value |
|---|---|
| Base Model | Qwen/Qwen2.5-32B-Instruct |
| Method | QLoRA (4-bit NF4 quantization) |
| LoRA Rank | 64 |
| LoRA Alpha | 128 |
| LoRA Dropout | 0.05 |
| Target Modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| Training Framework | Unsloth + trl SFTTrainer |
| Max Sequence Length | 4,096 tokens |
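The table maps onto a standard QLoRA setup. As a minimal sketch, the equivalent vanilla `peft`/`bitsandbytes` configuration would look like the following — note this is an approximation, since the actual run loaded the model through Unsloth, whose code path differs slightly:

```python
# Sketch only: reconstructs the adapter setup from the table above with
# standard peft + bitsandbytes APIs; the actual training used Unsloth.
import torch
from transformers import BitsAndBytesConfig
from peft import LoraConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",             # 4-bit NF4 quantization, per the table
    bnb_4bit_compute_dtype=torch.bfloat16,
)

lora_config = LoraConfig(
    r=64,
    lora_alpha=128,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)
```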
## Training Data
- 32,496 SFT examples (59.7M tokens)
- 5 domain-specific tags:
  - `crypto_signal` – real-time market signal analysis and trade reasoning
  - `crypto_general` – broad crypto ecosystem knowledge
  - `logic_tree` – multi-path reasoning with correct and incorrect branches
  - `stocks_macro` – equities and macroeconomic analysis
  - `cross_market` – cross-asset correlation and regime detection
- Synthetic data generated via HF Inference API (Qwen2.5-72B-Instruct) at zero incremental cost
- Source signals exported from production MongoDB (btunified database)
- MinHash deduplication applied, quality-filtered with automated scoring
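The MinHash deduplication step can be illustrated with a small pure-Python sketch. The shingle size, permutation count, and any drop threshold here are illustrative assumptions — the production pipeline's parameters are not documented on this card:

```python
import hashlib

def shingles(text, k=5):
    """Character k-grams of a whitespace-normalized, lowercased string."""
    text = " ".join(text.lower().split())
    return {text[i:i + k] for i in range(len(text) - k + 1)}

def minhash_signature(text, num_perm=64):
    """One minimum per 'permutation', simulated here by salting MD5."""
    return [
        min(int(hashlib.md5(str(seed).encode() + s.encode()).hexdigest(), 16)
            for s in shingles(text))
        for seed in range(num_perm)
    ]

def estimated_jaccard(sig_a, sig_b):
    """Fraction of matching signature slots estimates Jaccard similarity."""
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)

sig_a = minhash_signature("BTC funding rates turned negative while open interest climbed")
sig_b = minhash_signature("BTC funding rates turned negative while open interest climbed.")
sig_c = minhash_signature("ETH staking yields compressed as validator queue cleared")

# Near-duplicates score close to 1.0, unrelated examples near 0.0; a
# threshold (e.g. 0.8) then decides which near-duplicate to drop.
print(estimated_jaccard(sig_a, sig_b), estimated_jaccard(sig_a, sig_c))
```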
## Training Configuration
| Parameter | Value |
|---|---|
| Optimizer | AdamW 8-bit |
| Learning Rate | 2e-4 |
| LR Schedule | Cosine decay |
| Warmup Ratio | 0.05 |
| Weight Decay | 0.01 |
| Per-device Batch Size | 4 |
| Gradient Accumulation | 8 |
| Realized Effective Batch | ~384 (4 × 12 GPUs × 8) |
| Epochs | 3 |
| Mixed Precision | bf16 |
| Distributed Strategy | DeepSpeed ZeRO-3 + full CPU offload |
| Hardware | 12 × NVIDIA H100 SXM5 80GB (RunPod single multi-GPU node) |
| Wall Clock | ~72 hours (3 days) |
| Total Compute | ~864 H100-hours |
**Note on batch size:** an earlier version of this card listed the per-device batch (4) and gradient accumulation (8) with an "effective batch 32" annotation inherited from a single-GPU experimental plan. The realized run, distributed across 12 H100 GPUs under DeepSpeed ZeRO-3, scaled the effective batch by world size to approximately 384 (4 × 12 × 8). The peak learning rate of 2e-4 was tuned for the planned effective batch of 32, not for the realized 384; see §4.3 of the paper ("Config drift: planned vs. realized batch size") for full discussion.
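The arithmetic behind the note, plus the common linear LR-scaling heuristic the drift left unapplied, can be written out explicitly. The scaled value below is illustrative only — it is not a rate that was actually used in training:

```python
per_device_batch = 4
grad_accum = 8
world_size = 12  # H100 GPUs under DeepSpeed ZeRO-3

planned_effective = per_device_batch * grad_accum                # single-GPU plan
realized_effective = per_device_batch * world_size * grad_accum  # actual run

print(planned_effective, realized_effective)  # 32 384

# Linear LR scaling (a common heuristic; NOT applied in this run) would
# have bumped the 2e-4 peak by the batch-size ratio:
planned_lr = 2e-4
scaled_lr = planned_lr * realized_effective / planned_effective
```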
## Evaluation
| Benchmark | Score |
|---|---|
| CryptoQA (custom, 500 questions) | 93.6% |
CryptoQA covers: token fundamentals, DeFi mechanics, on-chain analytics interpretation, market regime identification, and risk assessment.
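The reported score corresponds to 468 of 500 questions answered correctly. A minimal exact-match scorer of the kind typically used for such a benchmark might look like this — the actual CryptoQA grading protocol is not specified on this card, so this is an assumption:

```python
def exact_match_accuracy(predictions, references):
    """Case- and whitespace-insensitive exact match over paired answers."""
    assert len(predictions) == len(references)
    correct = sum(
        p.strip().lower() == r.strip().lower()
        for p, r in zip(predictions, references)
    )
    return correct / len(references)

# The reported CryptoQA score: 468 correct out of 500 questions.
print(468 / 500)  # 0.936
```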
## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load base model
base_model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-32B-Instruct",
    torch_dtype="auto",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-32B-Instruct")

# Load LoRA adapter
model = PeftModel.from_pretrained(base_model, "ramankrishna10/npc-fin-32b-sft")

messages = [
    {"role": "system", "content": "You are a financial reasoning assistant."},
    {"role": "user", "content": "Analyze the risk/reward of entering a long ETH position given declining on-chain activity but increasing institutional inflows."},
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

# do_sample=True so the temperature setting actually takes effect
outputs = model.generate(**inputs, max_new_tokens=512, temperature=0.7, do_sample=True)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
## Intended Use
- Financial market analysis and reasoning
- Crypto signal interpretation and trade logic
- Multi-step reasoning over market scenarios
- Research and educational purposes
## Limitations

- This is the SFT base model only – it does not include tool-use or identity fine-tuning
- Trained primarily on crypto/DeFi data; performance on traditional equities may be lower
- Not intended as financial advice – outputs are AI-generated analysis
- Trained on a single multi-GPU node (12 × H100) – not validated at multi-node cluster scale
- May hallucinate token names or market data not present in the training set
## Related Models

- npc-fin-prm-7b – Process Reward Model for step-level reasoning verification
## Citation

Bachu, R. K. (2026). *NPC Fin 32B: A Domain-Specialized Financial Reasoning Model via Multi-GPU QLoRA*. Zenodo. https://doi.org/10.5281/zenodo.19802598

```bibtex
@misc{bachu2026npcfin32b,
  title     = {NPC Fin 32B: A Domain-Specialized Financial Reasoning Model
               via Multi-GPU QLoRA},
  author    = {Bachu, Rama Krishna},
  year      = {2026},
  publisher = {Zenodo},
  doi       = {10.5281/zenodo.19802598},
  url       = {https://doi.org/10.5281/zenodo.19802598},
  note      = {Preprint},
}
```
## Author

Ramakrishna Bachu – GitHub | LinkedIn
Part of the NPC Model Family by Bottensor.