GLiNER2 Data Mention Extractor (v1-hybrid-entities)
Fine-tuned GLiNER2 LoRA adapter for extracting structured data mentions from development economics and humanitarian research documents.
Architecture: Two-Pass Hybrid
This adapter uses a two-pass inference strategy to bypass the count_pred/count_embed
mode collapse that limits native extract_json to 1 mention per chunk:
- Pass 1 (extract_entities): Finds ALL data mention spans using 3 entity types (named_mention, descriptive_mention, vague_mention). Bypasses count_pred entirely.
- Pass 2 (extract_json): Classifies each span individually using sentence-level context. count=1 is always correct since each call contains exactly 1 mention.
See finetuning/ARCHITECTURE.md for the full rationale.
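The control flow of the two passes can be sketched independently of the model. This is a minimal illustration, not the repository's implementation: `two_pass_extract`, `detect_spans`, and `classify_span` are hypothetical names, and the stand-in callables only mimic the shape of the output so the sketch runs without loading any weights.

```python
from typing import Callable, Dict, List

MENTION_TYPES = ["named_mention", "descriptive_mention", "vague_mention"]


def two_pass_extract(
    text: str,
    detect_spans: Callable[[str], Dict[str, List[str]]],
    classify_span: Callable[[str, str], Dict[str, str]],
) -> List[Dict[str, str]]:
    """Detect all spans once, then classify every detected span on its own."""
    records = []
    spans_by_type = detect_spans(text)  # Pass 1: one call per chunk
    for mention_type in MENTION_TYPES:
        for span in spans_by_type.get(mention_type, []):
            fields = classify_span(text, span)  # Pass 2: one call per span
            records.append({"mention": span, "mention_type": mention_type, **fields})
    return records


# Stand-in callables (assumptions) so the sketch runs without the model.
def fake_detect(text: str) -> Dict[str, List[str]]:
    return {"named_mention": ["DHS"], "vague_mention": ["survey data"]}


def fake_classify(text: str, span: str) -> Dict[str, str]:
    return {"typology_tag": "survey", "is_used": "True", "usage_context": "primary"}


records = two_pass_extract(
    "We use DHS and other survey data.", fake_detect, fake_classify
)
```

Because Pass 2 is invoked once per span, the number of output records is set by Pass 1 alone, which is why the count prediction no longer caps the result at one mention per chunk.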
Task
Given a document passage, extracts structured information about each dataset mentioned:
- Entity types (Pass 1, span detection):
  - named_mention: Proper names and acronyms (DHS, LSMS, FAOSTAT)
  - descriptive_mention: Described data with identifying detail but no formal name
  - vague_mention: Generic data references with minimal identifying detail
- Classification fields (Pass 2, fixed choices):
  - typology_tag: survey / census / database / administrative / indicator / geospatial / microdata / report / other
  - is_used: True / False
  - usage_context: primary / supporting / background
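Since the Pass 2 fields have fixed choices, downstream code can check each classified record against them. The sketch below is an assumption about how such a check might look; `ALLOWED` and `invalid_fields` are hypothetical names, and the values are treated as strings, matching the choice lists in this card.

```python
# Fixed choices for the Pass 2 fields, copied from the list above.
ALLOWED = {
    "typology_tag": {
        "survey", "census", "database", "administrative", "indicator",
        "geospatial", "microdata", "report", "other",
    },
    "is_used": {"True", "False"},
    "usage_context": {"primary", "supporting", "background"},
}


def invalid_fields(record: dict) -> list:
    """Return names of fields that are missing or outside the fixed choices."""
    return [name for name, choices in ALLOWED.items() if record.get(name) not in choices]


good = {"typology_tag": "survey", "is_used": "True", "usage_context": "primary"}
bad = {"typology_tag": "survey", "is_used": "maybe"}
```

Here `invalid_fields(good)` is empty, while `invalid_fields(bad)` flags `is_used` (value not allowed) and `usage_context` (missing).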
Training
- Base model: fastino/gliner2-large-v1
- Method: LoRA (r=16, alpha=32.0)
- Target modules: ['encoder', 'span_rep']
- Training examples: 8087
- Val examples: 563
- Best val loss: None
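For readers unfamiliar with the r and alpha values, the sketch below shows what they imply under the conventional LoRA formulation, in which the low-rank update B·A is scaled by alpha / r before being added to the frozen weight. Whether this adapter uses exactly that scaling is an assumption; `lora_delta` is a hypothetical helper and the 2×2 matrices are toy values, not model weights.

```python
def lora_delta(A, B, r, alpha):
    """Return (alpha / r) * (B @ A) for nested-list matrices.

    B has shape (d_out, r) and A has shape (r, d_in).
    """
    scale = alpha / r
    d_out, d_in = len(B), len(A[0])
    return [
        [scale * sum(B[i][k] * A[k][j] for k in range(r)) for j in range(d_in)]
        for i in range(d_out)
    ]


# Values listed in this card: r=16, alpha=32.0.
scale = 32.0 / 16

# Toy rank-2 example with the same alpha / r ratio.
A = [[1, 0], [0, 1]]
B = [[1, 2], [3, 4]]
delta = lora_delta(A, B, r=2, alpha=4.0)
```

With r=16 and alpha=32.0 the scale factor is 2.0, so under this formulation the learned update is doubled before it is applied.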
Usage
# Install the patched library first:
# pip install git+https://github.com/rafmacalaba/GLiNER2.git@feat/main-mirror
from gliner2 import GLiNER2

# Load the base model, then attach the LoRA adapter
extractor = GLiNER2.from_pretrained("fastino/gliner2-large-v1")
extractor.load_adapter("rafmacalaba/gliner2-datause-large-v1-hybrid-entities")

# `text` is the document passage to analyze; define it before running.

# Pass 1: Extract all mention spans
entity_schema = {
    "entities": ["named_mention", "descriptive_mention", "vague_mention"],
    "entity_descriptions": {
        "named_mention": "A proper name or well-known acronym for a data source...",
        "descriptive_mention": "A described data reference with enough detail...",
        "vague_mention": "A generic or loosely specified reference to data...",
    },
}
spans = extractor.extract(text, entity_schema, threshold=0.3)

# Pass 2: Classify each span
json_schema = {
    "data_mention": {
        "mention_name": "",
        "typology_tag": {"choices": ["survey", "census", "administrative", "database",
                                     "indicator", "geospatial", "microdata", "report", "other"]},
        "is_used": {"choices": ["True", "False"]},
        "usage_context": {"choices": ["primary", "supporting", "background"]},
    },
}
# extract_sentence_context is a helper that is not defined in this snippet;
# it is expected to return the sentence-level context around the span.
# Only named_mention spans are classified here; repeat for the other two types.
for span in spans.get("named_mention", []):
    context = extract_sentence_context(text, span)
    tags = extractor.extract(context, json_schema)
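The usage snippet calls `extract_sentence_context` without defining it. One possible stand-in is sketched below. It is an assumption, not the author's helper: it treats `span` as a plain string (if the library returns dicts, pass the span text instead), uses a naive regex sentence splitter, and adds a hypothetical `window` parameter for neighbouring sentences.

```python
import re


def extract_sentence_context(text: str, span: str, window: int = 0) -> str:
    """Return the sentence containing `span`, plus `window` neighbours per side."""
    # Naive splitter: break on whitespace that follows '.', '!' or '?'.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    for i, sentence in enumerate(sentences):
        if span in sentence:
            lo = max(0, i - window)
            hi = min(len(sentences), i + window + 1)
            return " ".join(sentences[lo:hi])
    # Fall back to the whole passage if the span is not found in one sentence.
    return text


passage = "Poverty rose in 2019. We use the DHS for estimates. Results follow."
context = extract_sentence_context(passage, "DHS")
```

The splitter will break on abbreviations such as "e.g." and cannot handle spans that cross a sentence boundary (those fall back to the full passage), so a proper sentence tokenizer may be preferable for real documents.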