---
tags:
- span-marker
- token-classification
- ner
- named-entity-recognition
- generated_from_span_marker_trainer
widget:
- text: On 07 Nov, send my brother the summary from section 2 of the document and
    enable airplane mode on my phone
- text: Could you please share the' Budget Reports' folder with me and update the
    notification settings in Slack before the Quarterly Review Meeting? Also, send
    the details to my email at emily . chen @ workmail . com
- text: Find all images from March 3rd that are less than 1MB, and read out the caption
    under figure 5 . Set the device to silent mode
- text: Please send the document named annual_report_2023 . xlsx from the Finance
    folder, specifically the summary on page 5, to my manager at manager @ acme .
    com
- text: Text my mother at + 44 7911 123456 the summary from paragraph 4, and then
    enable bluetooth
pipeline_tag: token-classification
library_name: span-marker
metrics:
- precision
- recall
- f1
model-index:
- name: SpanMarker
  results:
  - task:
      type: token-classification
      name: Named Entity Recognition
    dataset:
      name: Unknown
      type: unknown
      split: eval
    metrics:
    - type: f1
      value: 0.8683998712169995
      name: F1
    - type: precision
      value: 0.8558622877994606
      name: Precision
    - type: recall
      value: 0.8813102434242771
      name: Recall
---

# SpanMarker

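This is a [SpanMarker](https://github.com/tomaarsen/SpanMarkerNER) model that can be used for Named Entity Recognition. It recognizes the 18 entity types listed under [Model Labels](#model-labels), such as `app_name`, `file_name`, `setting` and `system_command`.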

## Model Details

### Model Description
- **Model Type:** SpanMarker
<!-- - **Encoder:** [Unknown](https://huggingface.co/unknown) -->
- **Maximum Sequence Length:** 512 tokens
- **Maximum Entity Length:** 12 words
<!-- - **Training Dataset:** [Unknown](https://huggingface.co/datasets/unknown) -->
<!-- - **Language:** Unknown -->
<!-- - **License:** Unknown -->

### Model Sources

- **Repository:** [SpanMarker on GitHub](https://github.com/tomaarsen/SpanMarkerNER)
- **Thesis:** [SpanMarker For Named Entity Recognition](https://raw.githubusercontent.com/tomaarsen/SpanMarkerNER/main/thesis.pdf)

### Model Labels
| Label          | Examples                                                                        |
|:---------------|:--------------------------------------------------------------------------------|
| action         | "Remind", "scheduled", "review"                                                 |
| app_data_type  | "items", "images", "videos"                                                     |
| app_name       | "Camera", "phone", "Slack"                                                      |
| contact_info   | "sarah . lee @ company . org", "123 Maple Street , Springfield", "home address" |
| date           | "20 . 10 . 1999", "before", "January 18 - June 15"                              |
| event_title    | "team sync", "Marketing Strategy Meeting", "Budget Planning"                    |
| file_name      | "notes", "budget_overview . xlsx", "project_plan . docx"                        |
| file_size      | "under 500 kb", "smaller than 50 kb", "exceeding 100 mb"                        |
| file_type      | "documents", "document", "image"                                                |
| folder_name    | "Projects", "Work", "Photos"                                                    |
| in_file_data   | "appendix section", "page 10", "section 5"                                      |
| limits         | "top 8", "all", "every"                                                         |
| location       | "Room 204", "server room", "library"                                            |
| person_name    | "Jonathan Kim", "Mr . Osei", "Lucas Müller"                                     |
| relationship   | "manager", "brother", "cousin"                                                  |
| setting        | "brightness", "airplane mode", "notifications"                                  |
| system_command | "disable", "move", "switch on"                                                  |
| time           | "9 : 00 AM", "10 : 45", "10 : 00 AM"                                            |

## Evaluation

### Metrics
| Label          | Precision | Recall | F1     |
|:---------------|:----------|:-------|:-------|
| **all**        | 0.8559    | 0.8813 | 0.8684 |
| action         | 0.8173    | 0.9245 | 0.8676 |
| app_data_type  | 0.7960    | 0.6828 | 0.7351 |
| app_name       | 0.9432    | 0.9432 | 0.9432 |
| contact_info   | 0.8722    | 0.9091 | 0.8903 |
| date           | 0.9160    | 0.8993 | 0.9076 |
| event_title    | 0.8659    | 0.9107 | 0.8877 |
| file_name      | 0.9371    | 0.9280 | 0.9326 |
| file_size      | 0.7810    | 0.7810 | 0.7810 |
| file_type      | 0.7731    | 0.8786 | 0.8225 |
| folder_name    | 0.9618    | 0.8968 | 0.9282 |
| in_file_data   | 0.7486    | 0.7867 | 0.7672 |
| limits         | 0.9048    | 0.6786 | 0.7755 |
| location       | 0.8917    | 0.8571 | 0.8741 |
| person_name    | 0.9885    | 0.9885 | 0.9885 |
| relationship   | 0.9505    | 0.9541 | 0.9523 |
| setting        | 0.8974    | 0.9255 | 0.9112 |
| system_command | 0.7889    | 0.7441 | 0.7659 |
| time           | 0.9076    | 0.8587 | 0.8825 |
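
Each F1 value in the table is the harmonic mean of the precision and recall in the same row. The short check below recomputes a few rows from the rounded table values. The helper name `f1_score` is only illustrative and is not part of SpanMarker.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) copied from the metrics table
rows = {
    "all": (0.8559, 0.8813, 0.8684),
    "app_data_type": (0.7960, 0.6828, 0.7351),
    "limits": (0.9048, 0.6786, 0.7755),
}

for label, (p, r, reported) in rows.items():
    print(f"{label}: recomputed F1 = {f1_score(p, r):.4f}, reported = {reported:.4f}")
```

Because the harmonic mean is dominated by the smaller value, labels with low recall such as `limits` end up with an F1 well below their precision.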

## Uses

### Direct Use for Inference

```python
from span_marker import SpanMarkerModel

# Download from the 🤗 Hub
model = SpanMarkerModel.from_pretrained("span_marker_model_id")
# Run inference
entities = model.predict("Text my mother at + 44 7911 123456 the summary from paragraph 4, and then enable bluetooth")
```
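
`model.predict` returns a list of dictionaries, one per detected entity. According to the SpanMarker documentation, each dictionary carries `span`, `label`, `score`, `char_start_index` and `char_end_index` keys. The sketch below groups such a list by label so that it can be consumed like a slot dictionary. The helper `group_by_label`, the `min_score` threshold and the sample predictions are hypothetical: the list is hand-written for illustration and is not actual output from this model.

```python
from collections import defaultdict
from typing import Dict, List


def group_by_label(entities: List[dict], min_score: float = 0.0) -> Dict[str, List[str]]:
    """Collect entity spans per label, dropping predictions below `min_score`."""
    grouped = defaultdict(list)
    for entity in entities:
        if entity["score"] >= min_score:
            grouped[entity["label"]].append(entity["span"])
    return dict(grouped)


text = "Text my mother at + 44 7911 123456 the summary from paragraph 4, and then enable bluetooth"

# Hand-written stand-in for `model.predict(text)`; labels and scores are illustrative only
sample_entities = [
    {"span": "Text", "label": "action", "score": 0.97, "char_start_index": 0, "char_end_index": 4},
    {"span": "mother", "label": "relationship", "score": 0.99, "char_start_index": 8, "char_end_index": 14},
    {"span": "paragraph 4", "label": "in_file_data", "score": 0.62, "char_start_index": 52, "char_end_index": 63},
    {"span": "bluetooth", "label": "setting", "score": 0.95, "char_start_index": 81, "char_end_index": 90},
]

# Keep only predictions with a score of at least 0.7
slots = group_by_label(sample_entities, min_score=0.7)
print(slots)
```

A score threshold is one simple way to trade recall for precision on the weaker labels in the metrics table; the value 0.7 here is arbitrary.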

### Downstream Use
You can finetune this model on your own dataset.

<details><summary>Click to expand</summary>

```python
from datasets import load_dataset
from span_marker import SpanMarkerModel, Trainer

# Download from the 🤗 Hub
model = SpanMarkerModel.from_pretrained("span_marker_model_id")

# Specify a Dataset with "tokens" and "ner_tags" columns
dataset = load_dataset("conll2003")  # For example CoNLL2003

# Initialize a Trainer using the pretrained model & dataset
trainer = Trainer(
    model=model,
    train_dataset=dataset["train"],
    eval_dataset=dataset["validation"],
)
trainer.train()
trainer.save_model("span_marker_model_id-finetuned")
```
</details>
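
The SpanMarker `Trainer` expects each training example to provide a list of words (`tokens`) and a parallel list of integer tag ids (`ner_tags`). The sketch below shows one way to turn word-level span annotations into IOB2 tag ids. The helper `spans_to_iob2`, the reduced label list and the example annotation are hypothetical and only illustrate the data layout.

```python
from typing import Dict, List, Tuple


def spans_to_iob2(
    tokens: List[str],
    spans: List[Tuple[int, int, str]],
    label2id: Dict[str, int],
) -> List[int]:
    """Convert (start_word, end_word_exclusive, label) spans into IOB2 tag ids."""
    tags = [label2id["O"]] * len(tokens)
    for start, end, label in spans:
        tags[start] = label2id[f"B-{label}"]
        for index in range(start + 1, end):
            tags[index] = label2id[f"I-{label}"]
    return tags


# Reduced, illustrative label list; a real dataset would cover every label it uses
labels = [
    "O",
    "B-relationship", "I-relationship",
    "B-in_file_data", "I-in_file_data",
    "B-setting", "I-setting",
]
label2id = {label: index for index, label in enumerate(labels)}

tokens = ["Text", "my", "mother", "the", "summary", "from", "paragraph", "4", "and", "enable", "bluetooth"]
# Hand-written annotation: word index ranges with an exclusive end
spans = [(2, 3, "relationship"), (6, 8, "in_file_data"), (10, 11, "setting")]

example = {"tokens": tokens, "ner_tags": spans_to_iob2(tokens, spans, label2id)}
print(example)
```

A list of such dictionaries can then be loaded into a 🤗 `Dataset` and passed to the `Trainer` as `train_dataset`.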


## Training Details

### Training Set Metrics
| Training set            | Min | Mean    | Max |
|:------------------------|:----|:--------|:----|
| Sentence length (words) | 3   | 19.0206 | 53  |
| Entities per sentence   | 1   | 5.7015  | 13  |

### Training Hyperparameters
- learning_rate: 5e-05
- train_batch_size: 32
- eval_batch_size: 32
- seed: 42
- optimizer: AdamW (`OptimizerNames.ADAMW_TORCH`) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 5
- mixed_precision_training: Native AMP
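
The values above map onto `transformers.TrainingArguments` fields roughly as follows. This is a sketch rather than the original training script: the `output_dir` value is hypothetical, the batch sizes are assumed to be per device on a single device, and "Native AMP" is assumed to mean `fp16=True`.

```python
# Keyword arguments mirroring the hyperparameter list above
training_kwargs = {
    "output_dir": "models/span-marker-finetuned",  # hypothetical path
    "learning_rate": 5e-5,
    "per_device_train_batch_size": 32,  # assumes a single device
    "per_device_eval_batch_size": 32,
    "seed": 42,
    "optim": "adamw_torch",
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-8,
    "lr_scheduler_type": "linear",
    "warmup_ratio": 0.1,
    "num_train_epochs": 5,
    "fp16": True,  # assumption: "Native AMP" refers to fp16 autocast; needs a CUDA device
}

for name, value in training_kwargs.items():
    print(f"{name} = {value!r}")

# With transformers installed, the dictionary can be unpacked directly:
# from transformers import TrainingArguments
# args = TrainingArguments(**training_kwargs)
```

The resulting `args` object can be passed to the SpanMarker `Trainer` through its `args` parameter.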

### Training Results
| Epoch  | Step | Validation Loss | Validation Precision | Validation Recall | Validation F1 | Validation Accuracy |
|:------:|:----:|:---------------:|:--------------------:|:-----------------:|:-------------:|:-------------------:|
| 1.8553 | 1000 | 0.0344          | 0.8301               | 0.8650            | 0.8472        | 0.9204              |
| 3.7106 | 2000 | 0.0271          | 0.8524               | 0.8804            | 0.8662        | 0.9316              |

### Framework Versions
- Python: 3.12.12
- SpanMarker: 1.7.0
- Transformers: 4.51.3
- PyTorch: 2.8.0+cu126
- Datasets: 3.6.0
- Tokenizers: 0.21.4

## Citation

### BibTeX
```
@software{Aarsen_SpanMarker,
    author = {Aarsen, Tom},
    license = {Apache-2.0},
    title = {{SpanMarker for Named Entity Recognition}},
    url = {https://github.com/tomaarsen/SpanMarkerNER}
}
```
