Junrulu committed · verified
Commit edfce07 · 1 Parent(s): 5690998

align with transformers merging (#19)

- align with transformers merging (26ee82e31d75b2a6cb77fe50bef93e60e96c3b32)

Files changed (1):
  1. README.md +95 -3
README.md CHANGED
@@ -38,8 +38,9 @@ base_model:
 | Youtu-LLM-2B-GGUF | Instruct model of Youtu-LLM-2B, in GGUF format | 🤗 [Model](https://huggingface.co/tencent/Youtu-LLM-2B-GGUF)|
 
 ## 📰 News
-- [2026.01.07] You can now fine-tuning Youtu-LLM with [ModelScope](https://mp.weixin.qq.com/s/JJtQWSYEjnE7GnPkaJ7UNA).
-- [2026.01.04] You can now fine-tuning Youtu-LLM with [LlamaFactory](https://github.com/hiyouga/LlamaFactory/pull/9707).
+- [2026.01.28] You can now directly use Youtu-LLM with [Transformers](https://github.com/huggingface/transformers/pull/43166).
+- [2026.01.07] You can now fine-tune Youtu-LLM with [ModelScope](https://mp.weixin.qq.com/s/JJtQWSYEjnE7GnPkaJ7UNA).
+- [2026.01.04] You can now fine-tune Youtu-LLM with [LlamaFactory](https://github.com/hiyouga/LlamaFactory/pull/9707).
 
 <a id="benchmarks"></a>
 
@@ -89,8 +90,12 @@ base_model:
 ## 🚀 Quick Start
 This guide will help you quickly deploy and invoke the **Youtu-LLM-2B** model. This model supports "Reasoning Mode", enabling it to generate higher-quality responses through Chain of Thought (CoT).
 
-### 1. Environment Preparation
+<details>
+<summary>Transformers below 5.0.0.dev0</summary>
+
+If you wish to use Youtu-LLM-2B with an earlier version of transformers, please make sure to download the model repository as of this [commit](https://huggingface.co/tencent/Youtu-LLM-2B/commit/5690998a0a4cae7a7ec970d09262745e00bb6c5c).
 
+### 1. Environment Preparation
 Ensure your Python environment has the `transformers` library installed and that the version meets the requirements.
 
 ```bash
@@ -169,6 +174,93 @@ thought, final_answer = parse_reasoning(full_response)
 print(f"\n{'='*20} Thought Process {'='*20}\n{thought}")
 print(f"\n{'='*20} Final Answer {'='*20}\n{final_answer}")
 ```
+</details>
+
+<details>
+<summary>Transformers 5.0.0.dev0 or higher</summary>
+
+### 1. Environment Preparation
+Ensure your Python environment has the `transformers` library installed and that the version meets the requirements.
+
+```bash
+git clone https://github.com/huggingface/transformers.git
+cd transformers
+
+# pip
+pip install '.[torch]'
+
+# uv
+uv pip install '.[torch]'
+```
+
+### 2. Core Code Example
+
+The following example demonstrates how to load the model, enable Reasoning Mode, and use the `re` module to parse the "Thought Process" and the "Final Answer" from the output.
+
+```python
+import re
+from transformers import AutoTokenizer, AutoModelForCausalLM
+
+# 1. Configure Model
+model_id = "tencent/Youtu-LLM-2B"
+
+# 2. Initialize Tokenizer and Model
+tokenizer = AutoTokenizer.from_pretrained(model_id)
+model = AutoModelForCausalLM.from_pretrained(
+    model_id,
+    device_map="auto"
+)
+
+# 3. Construct Dialogue Input
+prompt = "Hello"
+messages = [{"role": "user", "content": prompt}]
+
+# Use apply_chat_template to construct input; set enable_thinking=True to activate Reasoning Mode
+input_text = tokenizer.apply_chat_template(
+    messages,
+    tokenize=False,
+    add_generation_prompt=True,
+    enable_thinking=True
+)
+
+model_inputs = tokenizer([input_text], return_tensors="pt").to(model.device)
+print("Input prepared. Starting generation...")
+
+# 4. Generate Response
+outputs = model.generate(
+    **model_inputs,
+    max_new_tokens=512,
+    do_sample=True,
+    temperature=1.0,
+    top_k=20,
+    top_p=0.95,
+    repetition_penalty=1.05
+)
+print("Generation complete!")
+
+# 5. Parse Results
+full_response = tokenizer.decode(outputs[0], skip_special_tokens=True)
+
+def parse_reasoning(text):
+    """Extract the thought process within <think> tags and the subsequent answer content"""
+    thought_pattern = r"<think>(.*?)</think>"
+    match = re.search(thought_pattern, text, re.DOTALL)
+
+    if match:
+        thought = match.group(1).strip()
+        answer = text.split("</think>")[-1].strip()
+    else:
+        thought = "(No explicit thought process generated)"
+        answer = text
+    return thought, answer
+
+thought, final_answer = parse_reasoning(full_response)
+
+print(f"\n{'='*20} Thought Process {'='*20}\n{thought}")
+print(f"\n{'='*20} Final Answer {'='*20}\n{final_answer}")
+```
+</details>
 
 ### 3. Key Configuration Details
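
This commit splits the Quick Start into two paths keyed on the installed transformers version. A minimal, hypothetical sketch of selecting the right path at runtime (the `5.0.0.dev0` boundary comes from the README; `packaging` is assumed to be available, and `needs_legacy_checkout` is an illustrative name, not part of the repository):

```python
from packaging import version

# Version boundary taken from the README: installations below 5.0.0.dev0
# need the pre-merge model repository snapshot, while 5.0.0.dev0 or higher
# loads Youtu-LLM natively.
BOUNDARY = version.parse("5.0.0.dev0")

def needs_legacy_checkout(transformers_version: str) -> bool:
    """True if this transformers version predates native Youtu-LLM support."""
    return version.parse(transformers_version) < BOUNDARY

print(needs_legacy_checkout("4.46.0"))      # → True (pre-merge path)
print(needs_legacy_checkout("5.0.0.dev0"))  # → False (native path)
```

In practice one would pass `transformers.__version__` to such a check; PEP 440 ordering makes `5.0.0.dev0` sort below the final `5.0.0` release but above every 4.x release.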
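
The `parse_reasoning` helper shown in the diff can be exercised without loading the model. A self-contained sketch, using a fabricated response string purely for illustration:

```python
import re

def parse_reasoning(text):
    """Extract the thought process inside <think> tags and the answer that follows."""
    match = re.search(r"<think>(.*?)</think>", text, re.DOTALL)
    if match:
        thought = match.group(1).strip()
        answer = text.split("</think>")[-1].strip()
    else:
        thought = "(No explicit thought process generated)"
        answer = text
    return thought, answer

# Fabricated model output, for illustration only.
sample = "<think>The user greets me; a short reply suffices.</think>Hello! How can I help you today?"
thought, answer = parse_reasoning(sample)
print(thought)  # → The user greets me; a short reply suffices.
print(answer)   # → Hello! How can I help you today?
```

Note that splitting on the last `</think>` means any text before `<think>` is discarded from the answer, and outputs with no tags at all fall back to returning the raw text unchanged.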
266