The 'r' in the model name stands for "replicate": this model was reproduced using https://github.com/huggingface/alignment-handbook.
This model was fine-tuned on the HuggingFaceH4/ultrafeedback_binarized dataset. Its evaluation-set results at each logged step are reported in the training results table.
The training hyperparameters are not listed here. The following results were logged during training:
| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.5917 | 0.21 | 100 | 0.5950 | -0.3904 | -0.7775 | 0.7109 | 0.3872 | -326.0618 | -286.5451 | -1.9790 | -1.9769 |
| 0.5281 | 0.42 | 200 | 0.5492 | -0.8657 | -1.6137 | 0.7617 | 0.7479 | -409.6739 | -334.0814 | -0.2289 | -0.2367 |
| 0.5321 | 0.63 | 300 | 0.5321 | -0.7444 | -1.4427 | 0.7734 | 0.6983 | -392.5731 | -321.9463 | 0.3829 | 0.3741 |
| 0.5149 | 0.84 | 400 | 0.5233 | -0.9570 | -1.7432 | 0.7617 | 0.7862 | -422.6298 | -343.2071 | 0.6479 | 0.6688 |
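The Rewards/margins column can be cross-checked against the two reward columns. The sketch below assumes the margin is the chosen reward minus the rejected reward, which is how preference-optimization trainers with these column names usually log it. The card itself does not state the definition, and `reward_margin` is a hypothetical helper introduced only for this check.

```python
# Rows copied from the training results table:
# (step, rewards/chosen, rewards/rejected, reported rewards/margins)
ROWS = [
    (100, -0.3904, -0.7775, 0.3872),
    (200, -0.8657, -1.6137, 0.7479),
    (300, -0.7444, -1.4427, 0.6983),
    (400, -0.9570, -1.7432, 0.7862),
]


def reward_margin(chosen: float, rejected: float) -> float:
    """Assumed definition: margin = chosen reward - rejected reward."""
    return chosen - rejected


for step, chosen, rejected, reported in ROWS:
    computed = reward_margin(chosen, rejected)
    # The table is rounded to four decimals, so allow a small tolerance.
    agrees = abs(computed - reported) < 1e-3
    print(f"step {step}: computed {computed:.4f}, reported {reported:.4f}, agrees={agrees}")
```

Under that assumption all four rows agree to within rounding. The margin roughly doubles between step 100 and step 400, while validation loss falls from 0.5950 to 0.5233.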
Detailed results can be found here
| Metric | Value |
|---|---|
| Avg. | 62.70 |
| AI2 Reasoning Challenge (25-Shot) | 63.05 |
| HellaSwag (10-Shot) | 85.38 |
| MMLU (5-Shot) | 63.10 |
| TruthfulQA (0-shot) | 46.32 |
| Winogrande (5-shot) | 79.32 |
| GSM8k (5-shot) | 39.04 |
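As a quick sanity check, the "Avg." row can be recomputed from the six benchmark scores. This sketch assumes the average is an unweighted mean, which the table does not state explicitly; `SCORES` is just the table re-entered by hand.

```python
# Benchmark scores copied from the table above.
SCORES = {
    "AI2 Reasoning Challenge (25-Shot)": 63.05,
    "HellaSwag (10-Shot)": 85.38,
    "MMLU (5-Shot)": 63.10,
    "TruthfulQA (0-shot)": 46.32,
    "Winogrande (5-shot)": 79.32,
    "GSM8k (5-shot)": 39.04,
}

# Unweighted mean over the six benchmarks (assumed definition of "Avg.").
average = sum(SCORES.values()) / len(SCORES)
print(f"Average: {average:.2f}")
```

The unweighted mean rounds to 62.70, matching the reported value. GSM8k and TruthfulQA are the two scores pulling the average down.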
Base model: mistralai/Mistral-7B-v0.1
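A minimal sketch of loading the Amu/r-zephyr-7b-beta-qlora adapter with PEFT might look like the following. It assumes the adapter applies directly on top of the base model listed above; if the adapter was trained on an intermediate checkpoint, the `base_model_name_or_path` field in the repository's `adapter_config.json` should be used instead. The helper names `load_model_and_tokenizer` and `generate` are hypothetical, and loading the tokenizer from the base model (rather than the adapter repository) is also an assumption, so no chat template is applied here.

```python
BASE_MODEL_ID = "mistralai/Mistral-7B-v0.1"  # assumed to be the adapter's base weights
ADAPTER_ID = "Amu/r-zephyr-7b-beta-qlora"


def load_model_and_tokenizer(base_id: str = BASE_MODEL_ID, adapter_id: str = ADAPTER_ID):
    """Load the base model, then attach the LoRA adapter on top of it."""
    # Heavy dependencies are imported lazily, inside the function.
    import torch
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(base_id)
    base_model = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.bfloat16)
    model = PeftModel.from_pretrained(base_model, adapter_id)
    model.eval()  # inference mode: disables dropout
    return model, tokenizer


def generate(prompt: str, max_new_tokens: int = 64) -> str:
    """Generate a continuation for a plain-text prompt."""
    import torch

    model, tokenizer = load_model_and_tokenizer()
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `generate("Explain what a LoRA adapter is.")` downloads the 7B base weights on first use and runs on CPU unless the model and inputs are moved to a GPU, so it is slow without an accelerator.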