---
datasets: open-r1/openr1-220k-math
library_name: transformers
model_name: OpenR1-Qwen-7B
tags:
- generated_from_trainer
- trl
- sft
license: apache-2.0
---

# OpenR1-Qwen-7B

This is a fine-tuned version of [Qwen2.5-Math-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Math-7B-Instruct), trained on the `default` split of [OpenR1-220k-Math](https://huggingface.co/datasets/open-r1/openr1-220k-math).

> [!NOTE]
> Check out [OpenR1-Distill-7B](https://huggingface.co/open-r1/OpenR1-Distill-7B) for an improved model trained on [open-r1/Mixture-of-Thoughts](https://huggingface.co/datasets/open-r1/Mixture-of-Thoughts) that replicates the performance of [DeepSeek-R1-Distill-Qwen-7B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-7B) across multiple reasoning domains.

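## Quick start

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "open-r1/OpenR1-Qwen-7B"

# Load the weights in their saved dtype and place them on the available device(s)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = "Find the value of $x$ that satisfies the equation $4x+5 = 6x+7$."

messages = [
    {"role": "system", "content": "Please reason step by step, and put your final answer within \\boxed{}."},
    {"role": "user", "content": prompt},
]

# Render the conversation with the model's chat template, then tokenize it
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# Reasoning traces can be long; 4096 new tokens is an illustrative budget, not a tuned value
generated_ids = model.generate(**model_inputs, max_new_tokens=4096)

# Keep only the newly generated tokens (drop the prompt) and decode them
generated_ids = generated_ids[:, model_inputs["input_ids"].shape[1]:]
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
```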
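The system prompt above asks the model to put its final answer inside `\boxed{}`. To pull that answer out of the decoded `response`, a small helper like the one below can work. `extract_boxed_answer` is a hypothetical helper written for illustration, not part of `transformers` or lighteval, and it assumes the last `\boxed{...}` in the text holds the final answer.

```python
from typing import Optional


def extract_boxed_answer(text: str) -> Optional[str]:
    """Return the contents of the last \\boxed{...} in `text`, or None if there is none.

    Braces are matched by depth counting, so nested LaTeX such as
    \\boxed{\\frac{1}{2}} comes back intact.
    """
    marker = "\\boxed{"
    start = text.rfind(marker)
    if start == -1:
        return None
    begin = start + len(marker)
    depth = 1
    for pos in range(begin, len(text)):
        if text[pos] == "{":
            depth += 1
        elif text[pos] == "}":
            depth -= 1
            if depth == 0:
                return text[begin:pos]
    return None  # the last \boxed{ was never closed


# Hand-written example string, not actual model output
print(extract_boxed_answer("So -2x = 2, which gives x = \\boxed{-1}."))  # prints -1
```

Calling `extract_boxed_answer(response)` on the Quick start output returns the boxed answer if the model produced one, and `None` otherwise (for example, when generation stopped before the answer was written).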

## Training

We train the model on the `default` split of [OpenR1-220k-Math](https://huggingface.co/datasets/open-r1/openr1-220k-math) for 3 epochs with a learning rate of 5e-5. We extend the context length from 4k to 32k tokens by increasing the RoPE base frequency to 300k. Training follows a linear learning-rate schedule with a 10% warmup phase. The table below compares the pass@1 scores of OpenR1-Qwen-7B with those of [DeepSeek-R1-Distill-Qwen-7B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-7B) and [OpenThinker-7B](https://huggingface.co/open-thoughts/OpenThinker-7B), evaluated with [lighteval](https://github.com/huggingface/open-r1/tree/main?tab=readme-ov-file#evaluating-models).
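The context extension above raises the RoPE base, which `transformers` exposes as `rope_theta` in Qwen2-family model configs. RoPE rotates dimension pair `i` of each attention head at an angle proportional to `base ** (-2i / head_dim)`, so a larger base slows the rotations. The sketch below counts how many pairs complete at least one full turn within a given window, which is one common intuition for raising the base when extending context; it is not a description of how the 300k value was chosen. It assumes a head dimension of 128 and an original base of 10,000 (check the base model's `config.json`).

```python
import math


def rope_wavelengths(base: float, head_dim: int) -> list[float]:
    """Wavelength (in tokens) of each rotary frequency pair.

    Pair i rotates by pos * base ** (-2i / head_dim) radians, so it completes
    a full turn every 2 * pi * base ** (2i / head_dim) tokens.
    """
    return [2 * math.pi * base ** (2 * i / head_dim) for i in range(head_dim // 2)]


def pairs_with_full_turn(base: float, head_dim: int, context_len: int) -> int:
    """Count frequency pairs that rotate through at least one full period within the context."""
    return sum(1 for w in rope_wavelengths(base, head_dim) if w <= context_len)


HEAD_DIM = 128  # assumption: hidden size 3584 / 28 attention heads in the Qwen2.5-7B family

for base, ctx in [(10_000, 4_096), (10_000, 32_768), (300_000, 32_768)]:
    print(f"base={base:>7,}  context={ctx:>6,}  pairs with a full turn: "
          f"{pairs_with_full_turn(base, HEAD_DIM, ctx)} / {HEAD_DIM // 2}")
```

Under these assumptions the counts are 46, 60 and 44 of 64 pairs: with base 300k at 32k tokens, roughly as many pairs complete a full rotation as with base 10k at 4k tokens, whereas keeping base 10k at 32k tokens would push 14 more pairs through full rotations.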

The training and evaluation code is available at [huggingface/open-r1](https://github.com/huggingface/open-r1/).
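The linear learning-rate schedule with a 10% warmup from the training paragraph can be written as a plain function of the optimizer step. This is a minimal sketch assuming the rate ramps up from zero and then decays linearly to zero at the last step, which is the shape of the linear scheduler in `transformers` (`lr_scheduler_type="linear"` with `warmup_ratio=0.1` in `TrainingArguments`); rounding of the warmup step count may differ slightly. The total step count in the example is hypothetical because the card does not report it.

```python
def linear_warmup_lr(step: int, total_steps: int, peak_lr: float = 5e-5,
                     warmup_ratio: float = 0.1) -> float:
    """Learning rate at `step` for linear warmup followed by linear decay to zero."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Warmup: ramp linearly from 0 up to peak_lr
        return peak_lr * step / max(1, warmup_steps)
    # Decay: fall linearly from peak_lr at the end of warmup to 0 at total_steps
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)


# Hypothetical run length; the card does not state the number of optimizer steps
TOTAL_STEPS = 1_000
for step in (0, 50, 100, 550, 1_000):
    print(step, linear_warmup_lr(step, TOTAL_STEPS))
```

With 1,000 steps, the rate peaks at 5e-5 at step 100 (the end of warmup) and is back to half that value at step 550, halfway through the decay phase.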

| Model | MATH-500 | AIME 2024 | AIME 2025 | GPQA Diamond |
|-----------------------------|----------|-----------|-----------|--------------|
| DeepSeek-R1-Distill-Qwen-7B | 93.5 | 51.3 | 35.8 | 52.4 |
| OpenR1-Qwen-7B | 90.6 | 47.0 | 33.2 | 42.4 |
| OpenThinker-7B | 86.4 | 31.3 | 24.6 | 39.1 |
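The evaluation scores for this model are reported as pass@1. A common way to estimate pass@1 is to sample `n` completions per problem and apply the unbiased pass@k estimator introduced with the HumanEval benchmark, which for `k = 1` reduces to the fraction of correct samples. The card does not say how many samples per problem were used, so `n` and the correct counts below are illustrative assumptions, not the actual evaluation setup.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one problem: n sampled completions, c of them correct."""
    if n - c < k:
        return 1.0  # every size-k subset contains at least one correct completion
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_1(correct_counts: list[int], n: int) -> float:
    """Average pass@1 over all problems, expressed as a percentage."""
    scores = [pass_at_k(n, c, 1) for c in correct_counts]
    return 100.0 * sum(scores) / len(scores)


# Hypothetical toy data: 4 samples per problem, with 4, 2 and 0 correct on three problems
print(benchmark_pass_at_1([4, 2, 0], n=4))  # prints 50.0
```

Averaging over several samples per problem gives a lower-variance estimate than scoring a single sample, which matters on small benchmarks such as AIME with only 30 problems per year.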