CIRCL/vulnerability-severity-classification-chinese-macbert-base

@@ -1,53 +1,47 @@
 ---
-base_model: hfl/chinese-macbert-base
-datasets:
-- CIRCL/Vulnerability-CNVD
 library_name: transformers
 license: apache-2.0
-metrics:
-- accuracy
 tags:
 - generated_from_trainer
-- text-classification
-- classification
-- nlp
-- chinese
-- vulnerability
-pipeline_tag: text-classification
-language: zh
 model-index:
 - name: vulnerability-severity-classification-chinese-macbert-base
   results: []
 ---
-# VLAI: A RoBERTa-Based Model for Automated Vulnerability Severity Classification (Chinese Text)
-🇨🇳 This model is a fine-tuned version of [hfl/chinese-macbert-base](https://huggingface.co/hfl/chinese-macbert-base) on the dataset [CIRCL/Vulnerability-CNVD](https://huggingface.co/datasets/CIRCL/Vulnerability-CNVD). 🇨🇳
-For more information, visit the [Vulnerability-Lookup project page](https://vulnerability.circl.lu) or the [ML-Gateway GitHub repository](https://github.com/vulnerability-lookup/ML-Gateway), which demonstrates its usage in a FastAPI server.
-## How to use
-You can use this model directly with the Hugging Face `transformers` library for text classification:
-```python
-from transformers import pipeline
-classifier = pipeline(
-    "text-classification",
-    model="CIRCL/vulnerability-severity-classification-chinese-macbert-base"
-)
-# Example usage for a Chinese vulnerability description
-description_chinese = "TOTOLINK A3600R是中国吉翁电子（TOTOLINK）公司的一款6天线1200M无线路由器。TOTOLINK A3600R存在缓冲区溢出漏洞，该漏洞源于/cgi-bin/cstecgi.cgi文件的UploadCustomModule函数中的File参数未能正确验证输入数据的长度大小，攻击者可利用该漏洞在系统上执行任意代码或者导致拒绝服务。"
-result_chinese = classifier(description_chinese)
-print(result_chinese)
-# Expected output example: [{'label': '高', 'score': 0.9802}]
-```
 ## Training procedure
@@ -55,31 +49,27 @@ print(result_chinese)
 The following hyperparameters were used during training:
 - learning_rate: 3e-05
-- train_batch_size: 32
-- eval_batch_size: 32
 - seed: 42
 - optimizer: Use OptimizerNames.ADAMW_TORCH_FUSED with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
 - lr_scheduler_type: linear
 - num_epochs: 5
-It achieves the following results on the evaluation set:
-- Loss: 1.2224
-- Accuracy: 0.7783
 ### Training results
-| Training Loss | Epoch | Step  | Validation Loss | Accuracy |
-|:-------------:|:-----:|:-----:|:---------------:|:--------:|
-| 1.2400        | 1.0   | 3588  | 1.1658          | 0.7567   |
-| 1.1318        | 2.0   | 7176  | 1.1025          | 0.7711   |
-| 1.0106        | 3.0   | 10764 | 1.0848          | 0.7829   |
-| 0.6185        | 4.0   | 14352 | 1.1507          | 0.7807   |
-| 0.6463        | 5.0   | 17940 | 1.2224          | 0.7783   |
 ### Framework versions
-- Transformers 5.3.0
-- Pytorch 2.10.0+cu128
-- Datasets 4.8.3
 - Tokenizers 0.22.2

 ---
 library_name: transformers
 license: apache-2.0
+base_model: hfl/chinese-macbert-base
 tags:
 - generated_from_trainer
+metrics:
+- accuracy
 model-index:
 - name: vulnerability-severity-classification-chinese-macbert-base
   results: []
 ---
+<!-- This model card has been generated automatically according to the information the Trainer had access to. You
+should probably proofread and complete it, then remove this comment. -->
+# vulnerability-severity-classification-chinese-macbert-base
+This model is a fine-tuned version of [hfl/chinese-macbert-base](https://huggingface.co/hfl/chinese-macbert-base) on an unknown dataset.
+It achieves the following results on the evaluation set:
+- Loss: 2.5405
+- Accuracy: 0.7661
+- F1 Macro: 0.6864
+- Low Precision: 0.5879
+- Low Recall: 0.4169
+- Low F1: 0.4879
+- Medium Precision: 0.7843
+- Medium Recall: 0.8171
+- Medium F1: 0.8004
+- High Precision: 0.7680
+- High Recall: 0.7737
+- High F1: 0.7709
+## Model description
+More information needed
+## Intended uses & limitations
+More information needed
+## Training and evaluation data
+More information needed
 ## Training procedure
 The following hyperparameters were used during training:
 - learning_rate: 3e-05
+- train_batch_size: 64
+- eval_batch_size: 64
 - seed: 42
 - optimizer: Use OptimizerNames.ADAMW_TORCH_FUSED with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
 - lr_scheduler_type: linear
 - num_epochs: 5
 ### Training results
+| Training Loss | Epoch | Step | Validation Loss | Accuracy | F1 Macro | Low Precision | Low Recall | Low F1 | Medium Precision | Medium Recall | Medium F1 | High Precision | High Recall | High F1 |
+|:-------------:|:-----:|:----:|:---------------:|:--------:|:--------:|:-------------:|:----------:|:------:|:----------------:|:-------------:|:---------:|:--------------:|:-----------:|:-------:|
+| 2.2708        | 1.0   | 1590 | 2.3968          | 0.7482   | 0.6143   | 0.6555        | 0.1967     | 0.3026 | 0.7461           | 0.8416        | 0.7910    | 0.7589         | 0.7398      | 0.7493  |
+| 2.3716        | 2.0   | 3180 | 2.2675          | 0.7627   | 0.6657   | 0.5966        | 0.3380     | 0.4315 | 0.7648           | 0.8413        | 0.8012    | 0.7837         | 0.7461      | 0.7644  |
+| 1.7175        | 3.0   | 4770 | 2.3348          | 0.7679   | 0.6878   | 0.5996        | 0.4134     | 0.4894 | 0.7861           | 0.8188        | 0.8021    | 0.7672         | 0.7768      | 0.7719  |
+| 1.7819        | 4.0   | 6360 | 2.4131          | 0.7643   | 0.6844   | 0.5736        | 0.4165     | 0.4826 | 0.7909           | 0.8020        | 0.7964    | 0.7571         | 0.7922      | 0.7743  |
+| 1.5224        | 5.0   | 7950 | 2.5405          | 0.7661   | 0.6864   | 0.5879        | 0.4169     | 0.4879 | 0.7843           | 0.8171        | 0.8004    | 0.7680         | 0.7737      | 0.7709  |
 ### Framework versions
+- Transformers 5.4.0
+- Pytorch 2.11.0+cu130
+- Datasets 4.8.4
 - Tokenizers 0.22.2
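The F1 figures in the updated card can be recomputed from the per-class precision and recall. Each class F1 is the harmonic mean 2PR/(P+R), and F1 Macro is the unweighted mean of the three class F1 scores. The short Python check below is a sketch for sanity-checking the final-epoch numbers. It assumes macro averaging over exactly the Low, Medium and High classes, because the card does not include the metric code that produced them.

```python
# Final-epoch (epoch 5) per-class precision and recall, copied from the card.
reported = {
    "Low": (0.5879, 0.4169),
    "Medium": (0.7843, 0.8171),
    "High": (0.7680, 0.7737),
}


def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Per-class F1 first, then the unweighted (macro) mean across the classes.
per_class_f1 = {label: f1(p, r) for label, (p, r) in reported.items()}
macro_f1 = sum(per_class_f1.values()) / len(per_class_f1)

for label, score in per_class_f1.items():
    print(f"{label} F1: {score:.4f}")
print(f"F1 Macro: {macro_f1:.4f}")
```

The recomputed values agree with the card only to within rounding, because the inputs are themselves rounded to four decimals. The sketch also shows that the Low class, with recall 0.4169, is what pulls F1 Macro (0.6864) well below Accuracy (0.7661). Accuracy weights classes by their support, while macro F1 gives Low the same weight as the other two classes.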

config.json CHANGED

@@ -39,7 +39,7 @@
   "pooler_type": "first_token_transform",
   "problem_type": "single_label_classification",
   "tie_word_embeddings": true,
-  "transformers_version": "5.3.0",
+  "transformers_version": "5.4.0",
   "type_vocab_size": 2,
   "use_cache": false,
   "vocab_size": 21128
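`"problem_type": "single_label_classification"` means that each description receives exactly one severity label. For sequence-classification heads in `transformers`, this setting selects a softmax cross-entropy loss over the label logits, and the prediction is the arg-max class. The pure-Python sketch below mirrors that behaviour for three classes. The label order is hypothetical because the excerpt above shows no `id2label`, and the logits are made-up numbers rather than model output.

```python
import math

# Hypothetical label order: the config excerpt does not show id2label,
# so Low/Medium/High at ids 0/1/2 is an assumption for illustration only.
LABELS = ["Low", "Medium", "High"]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target):
    """Single-label loss: negative log-probability of the target class."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[target]


def predict(logits):
    """Pick exactly one label: the arg-max of the softmax probabilities."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best], probs[best]


# Made-up logits for one description, purely for illustration.
example_logits = [-1.2, 0.3, 2.1]
print(predict(example_logits))
print(cross_entropy(example_logits, target=2))
```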

model.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:a5a2edb509486b6ccab75444049c7f84444b1b704a9f1b45ba8568e22276362d
+oid sha256:1e9422416a6d74001deb9c6e5ef0caadc3dc07fecd7b556630339cfe76b43382
 size 409103292
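The repository stores `model.safetensors` through Git LFS, so the diff only shows the pointer file. In that file, `oid` is the SHA-256 digest of the actual file contents and `size` is its length in bytes. Here only the digest changed, which means the weights differ even though the file length is identical. The sketch below checks a local copy against a pointer. The function names and the local path are hypothetical, and the code relies only on the Python standard library.

```python
import hashlib
import os


def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer file into {'version', 'oid', 'size'}."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    if algo != "sha256":
        raise ValueError(f"unsupported oid algorithm: {algo}")
    return {"version": fields["version"], "oid": digest, "size": int(fields["size"])}


def matches_pointer(path: str, pointer_text: str, chunk_size: int = 1 << 20) -> bool:
    """True if the file at `path` has the pointer's byte size and SHA-256 digest."""
    pointer = parse_lfs_pointer(pointer_text)
    if os.path.getsize(path) != pointer["size"]:
        return False  # cheap check first, avoids hashing ~400 MB needlessly
    sha = hashlib.sha256()
    with open(path, "rb") as fh:
        # Hash in fixed-size chunks so the whole file never sits in memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            sha.update(chunk)
    return sha.hexdigest() == pointer["oid"]


# Pointer for the new model.safetensors, copied from the diff above.
NEW_POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:1e9422416a6d74001deb9c6e5ef0caadc3dc07fecd7b556630339cfe76b43382
size 409103292
"""
# Usage (hypothetical local path): matches_pointer("model.safetensors", NEW_POINTER)
```

Because old and new weights share the size 409103292, the size check alone cannot tell the two revisions apart. Only the hash comparison can.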

training_args.bin CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:8c8729a52b86b3d5a80dcc124919836fe2d108cbf50326c0e6c512ce210cd309
+oid sha256:003d1a67690b110b26ce2a643db3953d4877c1dcfc591316e756d9e4d545e502
 size 5265
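`training_args.bin` is the serialized `TrainingArguments` object that the Trainer saves next to the weights. According to the card, those arguments include `learning_rate: 3e-05`, `lr_scheduler_type: linear` and `num_epochs: 5`. With the `linear` scheduler, the learning rate warms up linearly when warmup is configured and then decays linearly to zero at the last training step. The sketch below reproduces that curve in plain Python. It takes the total of 7950 steps from the last row of the results table (5 epochs x 1590 steps). It also assumes zero warmup, because the card lists no warmup setting.

```python
def linear_lr(step: int, base_lr: float = 3e-5, total_steps: int = 7950,
              warmup_steps: int = 0) -> float:
    """Learning rate at `step`: linear warmup, then linear decay to zero.

    base_lr comes from the card; total_steps = 5 epochs x 1590 steps/epoch;
    warmup_steps=0 is an assumption (no warmup is listed in the card).
    """
    if warmup_steps > 0 and step < warmup_steps:
        return base_lr * step / warmup_steps
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


STEPS_PER_EPOCH = 1590
for epoch in range(6):
    step = epoch * STEPS_PER_EPOCH
    print(f"epoch {epoch} (step {step}): lr = {linear_lr(step):.2e}")
```

This estimate also assumes a single device and no gradient accumulation. Under those assumptions, 1590 steps per epoch at `train_batch_size: 64` implies at most 1590 x 64 = 101,760 training examples per epoch, since the last batch of an epoch may be partial.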