train_batch_size=64
ppo_mini_batch_size=32
rollout_n=8
learning_rate=1e-6
kl_loss_coef=0.001
entropy_coeff=0.001
temperature=0.8
clip_ratio_low=0.2
clip_ratio_high=0.26
epoch=10
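
As a rough illustration of how the batch and clipping settings above interact, the sketch below derives the per-step rollout count and applies a PPO-style clipped objective with separate lower and upper bounds. The card does not name the training framework, so the interpretation of each key is an assumption (prompts counted per batch, `rollout_n` samples per prompt, clip range `[1 - clip_ratio_low, 1 + clip_ratio_high]`), and `clipped_surrogate` is a hypothetical helper, not the code used to train this model.

```python
# Values copied from the settings above.
train_batch_size = 64
ppo_mini_batch_size = 32
rollout_n = 8
clip_ratio_low = 0.2
clip_ratio_high = 0.26

# Assumption: each prompt in the batch is sampled rollout_n times per step.
responses_per_step = train_batch_size * rollout_n

# Assumption: mini-batches are counted in prompts, like train_batch_size.
updates_per_step = train_batch_size // ppo_mini_batch_size


def clipped_surrogate(ratio: float, advantage: float) -> float:
    """Per-token PPO-style objective with asymmetric clip bounds (hypothetical helper)."""
    # Clamp the probability ratio to [1 - clip_ratio_low, 1 + clip_ratio_high].
    clipped = min(max(ratio, 1.0 - clip_ratio_low), 1.0 + clip_ratio_high)
    # Take the pessimistic (smaller) of the unclipped and clipped terms.
    return min(ratio * advantage, clipped * advantage)


if __name__ == "__main__":
    print(responses_per_step, updates_per_step)
    print(clipped_surrogate(1.5, 1.0), clipped_surrogate(0.5, -1.0))
```

Under these assumptions, the wider upper bound (0.26 versus 0.2) lets tokens with positive advantage increase in probability slightly more than tokens with negative advantage can decrease before clipping applies.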
# Settings below are for Dr.GRPO
loss_agg_mode="seq-mean-token-sum-norm"
use_kl_loss=False
norm_adv_by_std_in_grpo=False
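
The Dr.GRPO settings above can be read as two changes to standard GRPO: advantages are centered on the group mean but not divided by the group standard deviation, and token losses are normalized by a fixed length rather than by each response's own length. The sketch below illustrates that reading in plain Python. Both helper functions, the `max_len` parameter, and the exact aggregation order are assumptions inferred from the setting names, not the training framework's actual implementation.

```python
from statistics import mean


def group_advantages(rewards, norm_adv_by_std_in_grpo=False, eps=1e-6):
    """Advantages for the rollouts of one prompt (hypothetical helper)."""
    baseline = mean(rewards)
    centered = [r - baseline for r in rewards]
    if not norm_adv_by_std_in_grpo:
        # Dr.GRPO-style: subtract the group mean only.
        return centered
    # Standard GRPO-style: also divide by the group's sample standard deviation.
    var = sum(c * c for c in centered) / max(len(rewards) - 1, 1)
    return [c / (var ** 0.5 + eps) for c in centered]


def aggregate_loss(token_losses, max_len):
    """Assumed reading of "seq-mean-token-sum-norm" (hypothetical helper).

    Sum token losses within each sequence, divide by a fixed constant
    max_len instead of the sequence's own length, then average over sequences.
    """
    per_sequence = [sum(seq) / max_len for seq in token_losses]
    return mean(per_sequence)


if __name__ == "__main__":
    # Eight rollouts for one prompt (rollout_n=8), two of them rewarded.
    print(group_advantages([1, 0, 0, 1, 0, 0, 0, 0]))
    print(aggregate_loss([[1.0, 1.0], [1.0, 1.0, 1.0, 1.0]], max_len=4))
```

With a fixed divisor, a long response contributes more total loss than a short one with the same per-token loss, which is the length-bias correction this reading assumes.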
Safetensors: model size 4B params, tensor type BF16