Hugging Face
khazarai's Collections
GRPO
Group Relative Policy Optimization
Updated 2 days ago
khazarai/HeisenbergQ-0.5B-RL
Text Generation • Updated Sep 25, 2025 • 2 downloads • 1 like
khazarai/Math-RL
Text Generation • 0.5B parameters • Updated 2 days ago • 172 downloads • 1 like