CycleCore-Technologies committed on
Commit
5adabe6
·
verified ·
1 Parent(s): 3a30092

Upload Maaza-SLM-360M-JSON-v1 - v1.0.0 production release

README.md ADDED
@@ -0,0 +1,356 @@
+ # CycleCore Maaza SLM-360M-JSON v1.0.0
+
+ Small Language Model (360M parameters) for high-accuracy JSON extraction on edge and server deployments.
+
+ ## Model Details
+
+ - **Developer**: CycleCore Technologies
+ - **Model Name**: CycleCore Maaza SLM-360M-JSON
+ - **Version**: v1.0.0
+ - **Base Model**: SmolLM2-360M (HuggingFaceTB)
+ - **Training Method**: LoRA fine-tuning (r=32, alpha=64)
+ - **Task**: Structured JSON extraction
+ - **License**: Apache 2.0
+ - **Parameters**: 360M base (379M including the LoRA adapter), 17.4M trainable (4.58%)
+ - **Model Size**: ~720MB (FP16), ~180MB (Q4 quantized)
+ - **Context Length**: 4096 tokens
+
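The trainable-parameter figure can be sanity-checked from the LoRA rank and the shapes of the adapted layers. The sketch below is illustrative only: the layer dimensions are assumed to be those of SmolLM2-360M (hidden size 960, MLP size 2560, 32 layers, key/value width 320) and are not stated in this README, so verify them against the base model's `config.json`.

```python
# Assumed SmolLM2-360M dimensions (not stated in this README; verify against config.json).
HIDDEN = 960
INTERMEDIATE = 2560
NUM_LAYERS = 32
KV_DIM = 5 * 64  # assumed: 5 key/value heads x head dim 64 (grouped-query attention)
RANK = 32        # LoRA rank r from the model details above

# (in_features, out_features) for each adapted linear layer
PROJECTIONS = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(in_features: int, out_features: int, rank: int) -> int:
    """LoRA adds two matrices per layer: A (rank x in) and B (out x rank)."""
    return rank * (in_features + out_features)


per_layer = sum(lora_params(i, o, RANK) for i, o in PROJECTIONS.values())
trainable = per_layer * NUM_LAYERS
print(f"{trainable:,} trainable LoRA parameters")
```

Under these assumptions the count comes to about 17.4M, in line with the trainable-parameter figure listed above.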
+ ## Intended Use
+
+ ### Primary Use Cases
+ - Production JSON extraction with high accuracy requirements
+ - Medium to complex schema extraction (4-12 fields, 1-2 nesting levels)
+ - API gateway response parsing and transformation
+ - Enterprise data integration pipelines
+ - Document processing workflows
+
+ ### Target Hardware
+ - **Server Deployment**: CPU or GPU, 16GB+ RAM
+ - **High-End Edge**: Laptop/workstation with 16GB+ RAM
+ - **Browser**: WebGPU (via ONNX Runtime)
+ - **Cloud**: Cost-effective alternative to API-based solutions
+
+ ### Out of Scope
+ - Open-ended conversation or creative writing
+ - Complex reasoning or multi-hop logic
+ - Math problem solving
+ - General-purpose chat applications
+
+ ## Benchmark Performance
+
+ ### EdgeJSON v3 Benchmark
+
+ Evaluated on 158 test cases across 24 schema types:
+
+ | Metric | Score |
+ |--------|-------|
+ | **JSONExact** | 55.1% |
+ | **Field F1** | 0.729 |
+ | **Schema Compliance** | 74.1% |
+ | **Latency (CPU)** | 17.2 tokens/sec |
+ | **Throughput** | 5.7 tokens/sec (estimated) |
+ | **Training Time** | 90.1 seconds |
+
+ ### By Complexity Level
+
+ | Complexity | Fields | Nesting | JSONExact | Field F1 |
+ |------------|--------|---------|-----------|----------|
+ | Simple | 2-4 | Flat | 78.9% | 0.927 |
+ | Medium | 4-8 | 1-2 levels | 51.4% | 0.815 |
+ | Complex | 8+ | 2+ levels | 4.0% | 0.072 |
+
+ ### Top Performing Schemas
+
+ **Perfect (100% JSONExact)**:
+ - `log_entry` (4 fields, simple)
+ - `product_info` (2 fields, simple)
+ - `sensor_reading` (4 fields, simple)
+ - `transaction_record` (5 fields, simple)
+
+ **High Accuracy (80%+)**:
+ - `notification` (88.9%)
+ - `simple_config` (87.5%)
+ - `support_ticket` (87.5%)
+ - `rating` (85.7%)
+ - `order_details` (83.3%)
+
+ ### Capacity Scaling Analysis
+
+ Comparison with MLM-135M shows how accuracy scales with model capacity:
+
+ | Model | Params | JSONExact | Field F1 | Simple | Medium | Complex |
+ |-------|--------|-----------|----------|--------|--------|---------|
+ | MLM-135M | 135M | 24.7% | 0.520 | 44.7% | 13.5% | 0.0% |
+ | **SLM-360M** | 360M | **55.1%** | **0.729** | **78.9%** | **51.4%** | **4.0%** |
+ | **Improvement** | 2.67× | **2.23×** | **1.40×** | **1.77×** | **3.81×** | **n/a (from 0.0%)** |
+
+ **Key Finding**: The 360M model scores above 0% on complex schemas (4.0%), where the 135M model scored 0.0%, which suggests that capacity matters for structured tasks.
+
+ ### Training Efficiency
+
+ - **Base SmolLM2-360M**: 11.4% JSONExact (zero-shot)
+ - **Fine-tuned (this model)**: 55.1% JSONExact
+ - **Training Multiplier**: 4.83× improvement
+
+ **Training Multiplier Insight**: The 360M model gains less from fine-tuning (4.83×) than the 135M model (13×), which suggests a stronger pre-trained baseline but diminishing fine-tuning returns.
+
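The improvement factors and the training multiplier above are plain ratios of the reported scores. The following sketch recomputes them from the numbers in the tables; the dictionary layout and the `improvement` helper are illustrative names, not part of any released tooling. A ratio against a 0.0% baseline is undefined, so the helper returns `None` for that case.

```python
# Scores copied from the tables above (percentages, except Field F1).
mlm_135m = {"JSONExact": 24.7, "Field F1": 0.520, "Simple": 44.7, "Medium": 13.5, "Complex": 0.0}
slm_360m = {"JSONExact": 55.1, "Field F1": 0.729, "Simple": 78.9, "Medium": 51.4, "Complex": 4.0}


def improvement(new: float, old: float):
    """Return new / old, or None when the baseline is zero (ratio undefined)."""
    return None if old == 0 else new / old


ratios = {name: improvement(slm_360m[name], mlm_135m[name]) for name in slm_360m}

# Fine-tuned score relative to the zero-shot base model score.
training_multiplier = improvement(55.1, 11.4)

for name, ratio in ratios.items():
    print(name, "undefined" if ratio is None else f"{ratio:.2f}x")
print("training multiplier", f"{training_multiplier:.2f}x")
```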
+ ## Training Data
+
+ ### Dataset: EdgeJSON v3
+ - **Total Examples**: 787 (100% validated)
+ - **Train Split**: 629 examples (80%)
+ - **Test Split**: 158 examples (20%)
+ - **Validation Rate**: 100% (all examples pass schema validation)
+ - **Schema Count**: 24 unique schemas
+ - **Complexity Distribution (test split)**: 38 simple, 74 medium, 46 complex
+
+ ### Data Generation
+ - **Teacher Model**: Qwen2.5-7B-Instruct
+ - **Method**: Synthetic generation with validation
+ - **Quality Control**: 100% schema compliance, manual review sampling
+
+ ### Prompt Template
+ ```
+ Extract the structured JSON data from the following text.
+
+ Input: {prompt}
+
+ Output:
+ ```
+
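A minimal sketch of filling this template in code is shown below. The `build_prompt` helper and the sample sentence are hypothetical, and the exact whitespace around `Output:` is an assumption based on the template as printed here.

```python
# Template text copied from the README; trailing whitespace after "Output:" is assumed absent.
PROMPT_TEMPLATE = (
    "Extract the structured JSON data from the following text.\n"
    "\n"
    "Input: {prompt}\n"
    "\n"
    "Output:"
)


def build_prompt(text: str) -> str:
    """Insert the raw input text into the template (hypothetical helper)."""
    return PROMPT_TEMPLATE.format(prompt=text.strip())


example = build_prompt("Sensor T-7 reported 21.5 C at 2025-11-20T10:00:00Z.")
print(example)
```

Because the input is passed as a format argument rather than concatenated into the template string, braces inside the input text are left untouched.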
+ ## Training Procedure
+
+ ### Hardware
+ - **GPU**: NVIDIA RTX 4080 SUPER (16GB)
+ - **Training Time**: 90.1 seconds
+ - **Effective Batch Size**: 32 (4 per device × 8 gradient accumulation)
+
+ ### Hyperparameters
+ - **Method**: LoRA (Low-Rank Adaptation)
+ - **LoRA Rank (r)**: 32 (2× the value used for MLM-135M)
+ - **LoRA Alpha**: 64 (2× the value used for MLM-135M)
+ - **LoRA Dropout**: 0.1
+ - **Target Modules**: q_proj, v_proj, k_proj, o_proj, gate_proj, up_proj, down_proj
+ - **Learning Rate**: 1.5e-4 (slightly lower than for MLM-135M)
+ - **Optimizer**: AdamW (β1=0.9, β2=0.999, ε=1e-8)
+ - **Weight Decay**: 0.01
+ - **LR Scheduler**: Cosine with 10% warmup
+ - **Epochs**: 3
+ - **Precision**: BF16 mixed precision
+ - **Max Grad Norm**: 1.0
+
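As a rough illustration of the schedule described above, the sketch below derives the number of optimizer steps from the training-set size and effective batch size, then evaluates a linear-warmup plus cosine-decay learning rate. It is a sketch under stated assumptions, not the training script: the step rounding, the decay to zero, and the helper name `lr_at` are all assumptions, and the actual trainer may count steps differently.

```python
import math

TRAIN_EXAMPLES = 629
EFFECTIVE_BATCH = 32    # 4 per device x 8 gradient accumulation steps
EPOCHS = 3
PEAK_LR = 1.5e-4
WARMUP_FRACTION = 0.10

# Assumption: the final partial batch of each epoch still triggers an optimizer step.
steps_per_epoch = math.ceil(TRAIN_EXAMPLES / EFFECTIVE_BATCH)
total_steps = steps_per_epoch * EPOCHS
warmup_steps = round(WARMUP_FRACTION * total_steps)


def lr_at(step: int) -> float:
    """Linear warmup to PEAK_LR, then cosine decay to zero (assumed final value)."""
    if step < warmup_steps:
        return PEAK_LR * step / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


print(steps_per_epoch, total_steps, warmup_steps)
for step in (0, warmup_steps, total_steps // 2, total_steps):
    print(step, f"{lr_at(step):.2e}")
```

With so few optimizer steps, the warmup phase covers only a handful of updates, which is worth keeping in mind when adapting these settings to a larger dataset.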
+ ### Training Loss
+ - **Final Training Loss**: 1.297 (lower than MLM-135M's 1.449)
+
+ ## Evaluation Methodology
+
+ ### Metrics
+
+ **JSONExact Score**:
+ - Binary exact match (0 or 1 per example)
+ - Compares predicted JSON to ground truth
+ - Requires perfect field matching
+
+ **Field F1**:
+ - Per-field precision and recall
+ - Averaged across all fields
+ - Partial credit for correct fields
+
+ **Schema Compliance**:
+ - Validates against JSON schema specification
+ - Checks required fields, types, structure
+
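To make the first two metrics concrete, here is a minimal sketch of how an exact-match score and a field-level F1 could be computed for a single example. It is not the benchmark's implementation: flattening nested JSON into path/value pairs, the per-example averaging, and the function names are assumptions chosen for illustration.

```python
import json


def flatten(obj, prefix=""):
    """Flatten nested JSON into {path: leaf value} pairs, e.g. 'customer.name'."""
    items = {}
    if isinstance(obj, dict):
        for key, value in obj.items():
            items.update(flatten(value, f"{prefix}.{key}" if prefix else key))
    elif isinstance(obj, list):
        for index, value in enumerate(obj):
            items.update(flatten(value, f"{prefix}[{index}]"))
    else:
        items[prefix] = obj
    return items


def json_exact(predicted: str, expected: dict) -> int:
    """1 if the prediction parses and equals the ground truth, else 0."""
    try:
        return int(json.loads(predicted) == expected)
    except json.JSONDecodeError:
        return 0


def field_f1(predicted: str, expected: dict) -> float:
    """F1 over flattened fields; a field counts only if path and value both match."""
    try:
        pred_fields = flatten(json.loads(predicted))
    except json.JSONDecodeError:
        return 0.0
    gold_fields = flatten(expected)
    correct = sum(
        1 for path, value in pred_fields.items()
        if path in gold_fields and gold_fields[path] == value
    )
    if correct == 0:
        return 0.0
    precision = correct / len(pred_fields)
    recall = correct / len(gold_fields)
    return 2 * precision * recall / (precision + recall)


gold = {"id": "12345", "customer": {"name": "Jane", "email": "jane@example.com"}}
pred = '{"id": "12345", "customer": {"name": "Jane", "email": "wrong@example.com"}}'
print(json_exact(pred, gold), round(field_f1(pred, gold), 3))
```

The example shows why the two metrics diverge: one wrong field makes the exact-match score zero while the field-level score still gives partial credit.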
+ ### Inference Settings
+ - **Temperature**: 0.0 (deterministic)
+ - **Max Tokens**: 512
+ - **Format**: JSON mode enforced
+ - **Platform**: CUDA (GPU) or CPU
+
+ ## Limitations and Bias
+
+ ### Known Limitations
+
+ **Complex Schema Ceiling**: Although this model scores above the 0% that MLM-135M reached on complex schemas, it achieves only 4.0% exact match on schemas with 8+ fields and 2+ nesting levels. For production extraction of complex schemas, consider larger models (>500M params) or specialized architectures.
+
+ **Medium Schema Viability**: The model is best suited to simple (78.9%) and medium (51.4%) schemas. Medium-schema performance is usable in production but, at 51.4% exact match, may require validation/correction workflows.
+
+ **Synthetic Data**: Trained exclusively on synthetically generated data from Qwen2.5-7B-Instruct, which may not capture all real-world edge cases.
+
+ **Latency Trade-off**: The model is 2.67× larger than MLM-135M but runs at a similar CPU inference speed (17.2 vs 18.5 tok/sec), so the accuracy gain costs little speed.
+
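One way to build the validation/correction workflow mentioned above is to check each model output against a small list of required fields and types before accepting it. The sketch below uses only the standard library; the `ORDER_SCHEMA` mapping and the `validate` helper are hypothetical and far simpler than a full JSON Schema validator.

```python
import json

# Hypothetical mini-schema: field name -> accepted Python type(s).
ORDER_SCHEMA = {"order_id": str, "order_date": str, "total": (int, float), "items": list}


def validate(raw_output: str, schema: dict):
    """Return (parsed, errors); errors is empty when the output passes all checks."""
    try:
        parsed = json.loads(raw_output)
    except json.JSONDecodeError as exc:
        return None, [f"invalid JSON: {exc.msg}"]
    if not isinstance(parsed, dict):
        return None, ["top-level value is not an object"]
    errors = []
    for field, expected_type in schema.items():
        if field not in parsed:
            errors.append(f"missing field: {field}")
        # bool is a subclass of int in Python, so reject it explicitly;
        # this mini-schema has no boolean fields.
        elif isinstance(parsed[field], bool) or not isinstance(parsed[field], expected_type):
            errors.append(f"wrong type for field: {field}")
    return parsed, errors


good = '{"order_id": "12345", "order_date": "2025-11-20", "total": 89.97, "items": []}'
bad = '{"order_id": 12345, "total": "89.97", "items": []}'
print(validate(good, ORDER_SCHEMA)[1])
print(validate(bad, ORDER_SCHEMA)[1])
```

Outputs that fail the check can then be routed to a retry, a larger model, or manual review, depending on the deployment.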
+ ### Potential Biases
+ - Inherits biases from teacher model (Qwen2.5-7B)
+ - Synthetic data may not reflect real-world data distributions
+ - Performance varies significantly by schema complexity (simple vs complex)
+
+ ### Ethical Considerations
+ - **Privacy**: On-device deployment avoids cloud API calls, keeping data local
+ - **Energy**: Fast training (90.1s) and efficient inference reduce carbon footprint
+ - **Transparency**: 100% open training methodology, reproducible results
+ - **Accessibility**: Apache 2.0 license enables free commercial use
+
+ ## How to Use
+
+ ### Installation
+
+ ```bash
+ pip install transformers peft torch accelerate
+ ```
+
+ ### Loading the Model
+
+ ```python
+ import torch
+ from transformers import AutoTokenizer, AutoModelForCausalLM
+ from peft import PeftModel
+
+ # Load base model (device_map="auto" requires the accelerate package)
+ base_model = AutoModelForCausalLM.from_pretrained(
+     "HuggingFaceTB/SmolLM2-360M",
+     torch_dtype=torch.float16,
+     device_map="auto"
+ )
+
+ # Load LoRA adapter on top of the base model
+ model = PeftModel.from_pretrained(
+     base_model,
+     "CycleCore/Maaza-SLM-360M-JSON-v1"
+ )
+
+ tokenizer = AutoTokenizer.from_pretrained("HuggingFaceTB/SmolLM2-360M")
+ ```
+
+ ### Inference Example (Medium Complexity)
+
+ ```python
+ prompt = """Extract the structured JSON data from the following text.
+
+ Input: Order #12345 placed by Jane Smith (jane@example.com) on 2025-11-20.
+ Items: 2x Widget ($19.99 each), 1x Gadget ($49.99).
+ Shipping to 123 Main St, Springfield, IL 62701. Total: $89.97.
+
+ Output:"""
+
+ inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
+ # Greedy decoding (do_sample=False) is deterministic; temperature is not used in this mode
+ outputs = model.generate(
+     **inputs,
+     max_new_tokens=512,
+     do_sample=False
+ )
+
+ # Decode only the newly generated tokens, skipping the echoed prompt
+ generated = outputs[0][inputs["input_ids"].shape[1]:]
+ result = tokenizer.decode(generated, skip_special_tokens=True)
+ print(result)
+ ```
+
+ ### Expected Output
+
+ ```json
+ {
+   "order_id": "12345",
+   "customer": {
+     "name": "Jane Smith",
+     "email": "jane@example.com"
+   },
+   "order_date": "2025-11-20",
+   "items": [
+     {"name": "Widget", "quantity": 2, "price": 19.99},
+     {"name": "Gadget", "quantity": 1, "price": 49.99}
+   ],
+   "shipping_address": {
+     "street": "123 Main St",
+     "city": "Springfield",
+     "state": "IL",
+     "zip": "62701"
+   },
+   "total": 89.97
+ }
+ ```
+
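In practice the generated text may contain extra characters before or after the JSON object, so it is usually parsed rather than used as a raw string. The sketch below shows one hedged way to do that with the standard library; `extract_json` is a hypothetical helper, and it simply returns the first JSON object that parses.

```python
import json


def extract_json(generated_text: str):
    """Parse the first JSON object found in the model output, or return None."""
    decoder = json.JSONDecoder()
    start = generated_text.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at `start` and ignores trailing text.
            obj, _end = decoder.raw_decode(generated_text, start)
            return obj
        except json.JSONDecodeError:
            start = generated_text.find("{", start + 1)
    return None


sample = 'Output: {"order_id": "12345", "total": 89.97}\nInput: next record'
record = extract_json(sample)
print(record)
```

Returning `None` instead of raising lets the caller decide whether to retry generation or flag the input for review.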
+ ## Model Comparison
+
+ For guidance on choosing between MLM-135M and SLM-360M, see our [Model Comparison Guide](https://github.com/CycleCore/SLMBench/blob/main/docs/MODEL_COMPARISON.md).
+
+ **Quick Decision**:
+ - **Use SLM-360M** if: Higher accuracy required (55%+), medium schemas (4-8 fields), production deployments, accuracy prioritized over latency
+ - **Use MLM-135M** if: Ultra-low latency required, simple schemas only (2-4 fields), extreme resource constraints (<500MB)
+
+ **Performance Summary**:
+
+ | Criterion | MLM-135M | SLM-360M |
+ |-----------|----------|----------|
+ | JSONExact | 24.7% | 55.1% (2.23× better) |
+ | Simple Schemas | 44.7% | 78.9% (1.77× better) |
+ | Medium Schemas | 13.5% | 51.4% (3.81× better) |
+ | Complex Schemas | 0.0% | 4.0% (breakthrough) |
+ | Model Size | ~270MB | ~720MB |
+ | Latency (CPU) | 18.5 tok/s | 17.2 tok/s |
+
+ ## Citation
+
+ If you use this model in your research, please cite:
+
+ ```bibtex
+ @misc{cyclecore2025slm,
+   title={CycleCore Maaza SLM-360M-JSON: Small Language Model for Edge JSON Extraction},
+   author={CycleCore Technologies},
+   year={2025},
+   publisher={HuggingFace},
+   howpublished={\url{https://huggingface.co/CycleCore/Maaza-SLM-360M-JSON-v1}},
+ }
+ ```
+
+ **Academic Paper** (forthcoming):
+ ```bibtex
+ @article{cyclecore2025slmbench,
+   title={Capacity Scaling in Micro and Small Language Models: Evidence from EdgeJSON Benchmark},
+   author={CycleCore Technologies},
+   journal={arXiv preprint},
+   year={2025},
+   note={Paper in preparation}
+ }
+ ```
+
+ ## Links
+
+ - **Model Repository**: https://huggingface.co/CycleCore/Maaza-SLM-360M-JSON-v1
+ - **Base Model**: https://huggingface.co/HuggingFaceTB/SmolLM2-360M
+ - **Companion Model**: https://huggingface.co/CycleCore/Maaza-MLM-135M-JSON-v1
+ - **SLMBench Benchmark**: https://github.com/CycleCore/SLMBench
+ - **Documentation**: https://github.com/CycleCore/SLMBench/tree/main/docs
+ - **Capacity Scaling Analysis**: https://github.com/CycleCore/SLMBench/blob/main/results/CAPACITY_SCALING_ANALYSIS.md
+ - **Paper**: Coming soon (arXiv)
+ - **Website**: slmbench.com (coming soon)
+
+ ## Version History
+
+ ### v1.0.0 (2025-11-20)
+ - Initial release
+ - Trained on EdgeJSON v3 dataset (100% validated)
+ - 55.1% JSONExact, 0.729 Field F1
+ - LoRA fine-tuning (r=32, alpha=64)
+ - 90.1 second training time
+ - Breakthrough: 4.0% on complex schemas (vs 0% for 135M)
+ - Apache 2.0 license
+
+ ## Contact
+
+ For questions, issues, or collaboration:
+ - **GitHub Issues**: https://github.com/CycleCore/SLMBench/issues
+ - **Email**: contact@cyclecore.tech (coming soon)
+
+ ## License
+
+ Apache License 2.0
+
+ Copyright 2025 CycleCore Technologies
+
+ Licensed under the Apache License, Version 2.0 (the "License");
+ you may not use this file except in compliance with the License.
+ You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
adapter_config.json ADDED
@@ -0,0 +1,46 @@
+ {
+   "alora_invocation_tokens": null,
+   "alpha_pattern": {},
+   "arrow_config": null,
+   "auto_mapping": null,
+   "base_model_name_or_path": "HuggingFaceTB/SmolLM2-360M",
+   "bias": "none",
+   "corda_config": null,
+   "ensure_weight_tying": false,
+   "eva_config": null,
+   "exclude_modules": null,
+   "fan_in_fan_out": false,
+   "inference_mode": true,
+   "init_lora_weights": true,
+   "layer_replication": null,
+   "layers_pattern": null,
+   "layers_to_transform": null,
+   "loftq_config": {},
+   "lora_alpha": 64,
+   "lora_bias": false,
+   "lora_dropout": 0.1,
+   "megatron_config": null,
+   "megatron_core": "megatron.core",
+   "modules_to_save": null,
+   "peft_type": "LORA",
+   "peft_version": "0.18.0",
+   "qalora_group_size": 16,
+   "r": 32,
+   "rank_pattern": {},
+   "revision": null,
+   "target_modules": [
+     "k_proj",
+     "v_proj",
+     "q_proj",
+     "o_proj",
+     "down_proj",
+     "up_proj",
+     "gate_proj"
+   ],
+   "target_parameters": null,
+   "task_type": "CAUSAL_LM",
+   "trainable_token_indices": null,
+   "use_dora": false,
+   "use_qalora": false,
+   "use_rslora": false
+ }
adapter_model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:77fd825b1644c81ff5dfd2f623461f3ba1da60e200f39630449cc8c4618eb522
+ size 69527352
merges.txt ADDED
The diff for this file is too large to render. See raw diff
 
special_tokens_map.json ADDED
@@ -0,0 +1,43 @@
+ {
+   "additional_special_tokens": [
+     "<|endoftext|>",
+     "<|im_start|>",
+     "<|im_end|>",
+     "<repo_name>",
+     "<reponame>",
+     "<file_sep>",
+     "<filename>",
+     "<gh_stars>",
+     "<issue_start>",
+     "<issue_comment>",
+     "<issue_closed>",
+     "<jupyter_start>",
+     "<jupyter_text>",
+     "<jupyter_code>",
+     "<jupyter_output>",
+     "<jupyter_script>",
+     "<empty_output>"
+   ],
+   "bos_token": {
+     "content": "<|endoftext|>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "eos_token": {
+     "content": "<|endoftext|>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "pad_token": "<|endoftext|>",
+   "unk_token": {
+     "content": "<|endoftext|>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   }
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,169 @@
+ {
+   "add_prefix_space": false,
+   "added_tokens_decoder": {
+     "0": {
+       "content": "<|endoftext|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "1": {
+       "content": "<|im_start|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "2": {
+       "content": "<|im_end|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "3": {
+       "content": "<repo_name>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "4": {
+       "content": "<reponame>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "5": {
+       "content": "<file_sep>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "6": {
+       "content": "<filename>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "7": {
+       "content": "<gh_stars>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "8": {
+       "content": "<issue_start>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "9": {
+       "content": "<issue_comment>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "10": {
+       "content": "<issue_closed>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "11": {
+       "content": "<jupyter_start>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "12": {
+       "content": "<jupyter_text>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "13": {
+       "content": "<jupyter_code>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "14": {
+       "content": "<jupyter_output>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "15": {
+       "content": "<jupyter_script>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "16": {
+       "content": "<empty_output>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     }
+   },
+   "additional_special_tokens": [
+     "<|endoftext|>",
+     "<|im_start|>",
+     "<|im_end|>",
+     "<repo_name>",
+     "<reponame>",
+     "<file_sep>",
+     "<filename>",
+     "<gh_stars>",
+     "<issue_start>",
+     "<issue_comment>",
+     "<issue_closed>",
+     "<jupyter_start>",
+     "<jupyter_text>",
+     "<jupyter_code>",
+     "<jupyter_output>",
+     "<jupyter_script>",
+     "<empty_output>"
+   ],
+   "bos_token": "<|endoftext|>",
+   "clean_up_tokenization_spaces": false,
+   "eos_token": "<|endoftext|>",
+   "extra_special_tokens": {},
+   "model_max_length": 8192,
+   "pad_token": "<|endoftext|>",
+   "tokenizer_class": "GPT2Tokenizer",
+   "unk_token": "<|endoftext|>",
+   "vocab_size": 49152
+ }
training_metadata.json ADDED
@@ -0,0 +1,29 @@
+ {
+   "model_name": "CycleCore-Maaza-SLM-360M-JSON",
+   "base_model": "HuggingFaceTB/SmolLM2-360M",
+   "training_date": "2025-11-20 13:07:28",
+   "num_epochs": 3,
+   "learning_rate": 0.00015,
+   "batch_size": 32,
+   "train_examples": 629,
+   "validation_examples": 0,
+   "test_examples": 158,
+   "lora_config": {
+     "enabled": true,
+     "r": 32,
+     "lora_alpha": 64,
+     "lora_dropout": 0.1,
+     "target_modules": [
+       "q_proj",
+       "v_proj",
+       "k_proj",
+       "o_proj",
+       "gate_proj",
+       "up_proj",
+       "down_proj"
+     ],
+     "bias": "none",
+     "task_type": "CAUSAL_LM"
+   },
+   "validation_run": false
+ }
vocab.json ADDED
The diff for this file is too large to render. See raw diff