---
license: apache-2.0
datasets:
- jonathan-roberts1/NWPU-RESISC45
language:
- en
base_model:
- google/siglip2-base-patch16-224
pipeline_tag: image-classification
library_name: transformers
tags:
- RESISC45
- SigLIP2
- Image-Classification
---

# **RESISC45-SigLIP2**

> **RESISC45-SigLIP2** is a vision-language encoder model fine-tuned from **google/siglip2-base-patch16-224** for **multi-label** image classification. It is trained to recognize and tag the 45 land use and land cover scene categories of the **RESISC45** dataset, using the **SiglipForImageClassification** architecture.

> [!note]
> *SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features* https://arxiv.org/pdf/2502.14786
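
As a rough picture of what the classifier computes: the base checkpoint splits a 224×224 input into 16×16 patches, and the classification head turns the resulting patch embeddings into one logit per scene category. The pure-Python sketch below assumes the head behaves like Hugging Face Transformers' `SiglipForImageClassification` (mean-pool the encoder's patch embeddings, then apply a single linear layer). The function names and the toy sizes (4-dimensional embeddings, 2 classes) are hypothetical; the real base model uses 768-dimensional embeddings and 45 outputs.

```python
# Hypothetical toy sketch of a SigLIP-style classification head:
# mean-pool the patch embeddings, then apply one linear layer.

def num_patches(image_size: int, patch_size: int) -> int:
    """Number of non-overlapping square patches in a square image."""
    per_side = image_size // patch_size
    return per_side * per_side

def mean_pool(tokens: list[list[float]]) -> list[float]:
    """Average the token embeddings dimension by dimension."""
    n = len(tokens)
    return [sum(col) / n for col in zip(*tokens)]

def linear(x: list[float], weights: list[list[float]], bias: list[float]) -> list[float]:
    """One logit per class: logit_c = w_c . x + b_c."""
    return [sum(w_i * x_i for w_i, x_i in zip(w_c, x)) + b_c
            for w_c, b_c in zip(weights, bias)]

# patch16 at 224x224 resolution -> 14 x 14 = 196 patch tokens
print(num_patches(224, 16))  # 196

# Toy example: 3 patch tokens with 4-dim embeddings, 2 classes (hypothetical values)
tokens = [[1.0, 0.0, 2.0, 0.0],
          [3.0, 0.0, 2.0, 1.0],
          [2.0, 3.0, 2.0, 2.0]]
pooled = mean_pool(tokens)  # [2.0, 1.0, 2.0, 1.0]
W = [[1.0, 0.0, 0.0, 0.0],
     [0.0, 1.0, 0.0, 1.0]]
b = [0.0, -1.0]
print(linear(pooled, W, b))  # [2.0, 1.0]
```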

```
Classification Report:
                       precision    recall  f1-score   support

             airplane     0.9830    0.9900    0.9865       700
              airport     0.9461    0.9529    0.9495       700
     baseball diamond     0.9802    0.9886    0.9844       700
     basketball court     0.9516    0.9271    0.9392       700
                beach     0.9914    0.9900    0.9907       700
               bridge     0.9730    0.9771    0.9751       700
            chaparral     0.9957    0.9986    0.9971       700
               church     0.7949    0.8971    0.8430       700
    circular farmland     0.9914    0.9914    0.9914       700
                cloud     0.9957    0.9871    0.9914       700
      commercial area     0.9231    0.8229    0.8701       700
    dense residential     0.9355    0.8914    0.9129       700
               desert     0.9821    0.9414    0.9613       700
               forest     0.9652    0.9514    0.9583       700
              freeway     0.9344    0.9571    0.9457       700
          golf course     0.9759    0.9843    0.9801       700
   ground track field     0.9623    0.9857    0.9739       700
               harbor     0.9885    0.9843    0.9864       700
      industrial area     0.9505    0.9043    0.9268       700
         intersection     0.9855    0.9686    0.9769       700
               island     0.9871    0.9829    0.9850       700
                 lake     0.9440    0.9629    0.9533       700
               meadow     0.9564    0.9400    0.9481       700
   medium residential     0.8602    0.9314    0.8944       700
     mobile home park     0.9610    0.9500    0.9555       700
             mountain     0.9388    0.9429    0.9408       700
             overpass     0.9614    0.9614    0.9614       700
               palace     0.8455    0.8286    0.8369       700
          parking lot     0.9899    0.9757    0.9827       700
              railway     0.9407    0.9071    0.9236       700
      railway station     0.9104    0.9143    0.9123       700
 rectangular farmland     0.9572    0.9271    0.9419       700
                river     0.9281    0.9586    0.9431       700
           roundabout     0.9914    0.9871    0.9893       700
               runway     0.9669    0.9586    0.9627       700
              sea ice     0.9957    0.9943    0.9950       700
                 ship     0.9558    0.9886    0.9719       700
             snowberg     0.9886    0.9900    0.9893       700
   sparse residential     0.9238    0.9700    0.9463       700
              stadium     0.9716    0.9757    0.9736       700
         storage tank     0.9787    0.9829    0.9808       700
         tennis court     0.9326    0.9486    0.9405       700
              terrace     0.9372    0.9586    0.9477       700
thermal power station     0.9482    0.9671    0.9576       700
              wetland     0.9444    0.8986    0.9209       700

             accuracy                         0.9532     31500
            macro avg     0.9538    0.9532    0.9532     31500
         weighted avg     0.9538    0.9532    0.9532     31500
```
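
Every class in the report has the same support (700 images, and 45 × 700 = 31,500), so the `weighted avg` row must equal the `macro avg` row, and overall accuracy must equal the macro-averaged recall: accuracy is the total number of correct predictions divided by the total number of images, and each class contributes `recall × support` correct predictions. The short sketch below checks this arithmetic on three rows copied from the report; the helper names are illustrative and not part of any library.

```python
# Rows copied from the report above: (class, precision, recall, support)
rows = [
    ("airplane", 0.9830, 0.9900, 700),
    ("airport", 0.9461, 0.9529, 700),
    ("baseball diamond", 0.9802, 0.9886, 700),
]

def macro_avg(values):
    """Unweighted mean over classes."""
    return sum(values) / len(values)

def weighted_avg(values, supports):
    """Mean over classes, weighted by each class's support."""
    return sum(v * s for v, s in zip(values, supports)) / sum(supports)

def accuracy_from_recall(recalls, supports):
    """Accuracy = total correct / total samples, where correct_c = recall_c * support_c."""
    correct = sum(r * s for r, s in zip(recalls, supports))
    return correct / sum(supports)

precisions = [p for _, p, _, _ in rows]
recalls = [r for _, _, r, _ in rows]
supports = [s for _, _, _, s in rows]

# With equal supports, weighted and macro averages coincide ...
print(round(macro_avg(precisions), 4), round(weighted_avg(precisions, supports), 4))  # 0.9698 0.9698
# ... and accuracy equals the macro-averaged recall.
print(round(accuracy_from_recall(recalls, supports), 4), round(macro_avg(recalls), 4))  # 0.9772 0.9772
```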

---

## **Label Space: 45 Scene Categories**

The model predicts the presence of one or more of the following **45 scene categories**:

```
Class 0: "airplane"
Class 1: "airport"
Class 2: "baseball diamond"
Class 3: "basketball court"
Class 4: "beach"
Class 5: "bridge"
Class 6: "chaparral"
Class 7: "church"
Class 8: "circular farmland"
Class 9: "cloud"
Class 10: "commercial area"
Class 11: "dense residential"
Class 12: "desert"
Class 13: "forest"
Class 14: "freeway"
Class 15: "golf course"
Class 16: "ground track field"
Class 17: "harbor"
Class 18: "industrial area"
Class 19: "intersection"
Class 20: "island"
Class 21: "lake"
Class 22: "meadow"
Class 23: "medium residential"
Class 24: "mobile home park"
Class 25: "mountain"
Class 26: "overpass"
Class 27: "palace"
Class 28: "parking lot"
Class 29: "railway"
Class 30: "railway station"
Class 31: "rectangular farmland"
Class 32: "river"
Class 33: "roundabout"
Class 34: "runway"
Class 35: "sea ice"
Class 36: "ship"
Class 37: "snowberg"
Class 38: "sparse residential"
Class 39: "stadium"
Class 40: "storage tank"
Class 41: "tennis court"
Class 42: "terrace"
Class 43: "thermal power station"
Class 44: "wetland"
```
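
The class indices follow the alphabetical order of the category names, so the index-to-name lookups can be derived from a plain list instead of a hand-typed, string-keyed dictionary. A minimal sketch; `LABELS`, `ID2LABEL`, and `LABEL2ID` are illustrative names, not files shipped with the model. The loaded checkpoint's `model.config.id2label` may already hold the same mapping (with integer keys), but that is an assumption worth checking against the model's config.

```python
# The 45 RESISC45 scene categories, in class-index order (copied from the list above).
LABELS = [
    "airplane", "airport", "baseball diamond", "basketball court", "beach",
    "bridge", "chaparral", "church", "circular farmland", "cloud",
    "commercial area", "dense residential", "desert", "forest", "freeway",
    "golf course", "ground track field", "harbor", "industrial area", "intersection",
    "island", "lake", "meadow", "medium residential", "mobile home park",
    "mountain", "overpass", "palace", "parking lot", "railway",
    "railway station", "rectangular farmland", "river", "roundabout", "runway",
    "sea ice", "ship", "snowberg", "sparse residential", "stadium",
    "storage tank", "tennis court", "terrace", "thermal power station", "wetland",
]

# Integer index -> name, and name -> integer index
ID2LABEL = dict(enumerate(LABELS))
LABEL2ID = {name: idx for idx, name in ID2LABEL.items()}

print(len(LABELS))          # 45
print(ID2LABEL[37])         # snowberg
print(LABEL2ID["wetland"])  # 44
```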

---

## **Install dependencies**

```bash
pip install -q transformers torch pillow gradio
```

---

## **Inference Code**

```python
import gradio as gr
from transformers import AutoImageProcessor, SiglipForImageClassification
from PIL import Image
import torch

# Load model and processor
model_name = "prithivMLmods/RESISC45-SigLIP2"  # Update to your actual Hugging Face model path
model = SiglipForImageClassification.from_pretrained(model_name)
processor = AutoImageProcessor.from_pretrained(model_name)

# Label map (class index -> scene category name)
id2label = {
    "0": "airplane", "1": "airport", "2": "baseball diamond", "3": "basketball court", "4": "beach",
    "5": "bridge", "6": "chaparral", "7": "church", "8": "circular farmland", "9": "cloud",
    "10": "commercial area", "11": "dense residential", "12": "desert", "13": "forest", "14": "freeway",
    "15": "golf course", "16": "ground track field", "17": "harbor", "18": "industrial area", "19": "intersection",
    "20": "island", "21": "lake", "22": "meadow", "23": "medium residential", "24": "mobile home park",
    "25": "mountain", "26": "overpass", "27": "palace", "28": "parking lot", "29": "railway",
    "30": "railway station", "31": "rectangular farmland", "32": "river", "33": "roundabout", "34": "runway",
    "35": "sea ice", "36": "ship", "37": "snowberg", "38": "sparse residential", "39": "stadium",
    "40": "storage tank", "41": "tennis court", "42": "terrace", "43": "thermal power station", "44": "wetland"
}

def classify_resisc_image(image):
    # Gradio passes the upload as a NumPy array; convert it to an RGB PIL image
    image = Image.fromarray(image).convert("RGB")
    inputs = processor(images=image, return_tensors="pt")

    with torch.no_grad():
        outputs = model(**inputs)
        logits = outputs.logits
        # Independent per-class probabilities (one sigmoid per class)
        probs = torch.sigmoid(logits).squeeze().tolist()

    # Keep every category whose probability reaches the threshold
    threshold = 0.5
    predictions = {
        id2label[str(i)]: round(probs[i], 3)
        for i in range(len(probs)) if probs[i] >= threshold
    }

    return predictions or {"None Detected": 0.0}

# Gradio interface
iface = gr.Interface(
    fn=classify_resisc_image,
    inputs=gr.Image(type="numpy"),
    outputs=gr.Label(label="Predicted Scene Categories"),
    title="RESISC45-SigLIP2",
    description="Upload an aerial or satellite image to detect multiple land use and land cover categories (e.g., airport, forest, mountain)."
)

if __name__ == "__main__":
    iface.launch()
```
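
The function above scores each class independently with a sigmoid and keeps those at or above 0.5, so it can return several labels or none at all. RESISC45 itself assigns exactly one scene label to each image; if the checkpoint was fine-tuned with a single-label (softmax cross-entropy) objective, which this card does not state, a softmax over the 45 logits followed by top-k selection may give a more natural ranking. In the Gradio function that would mean replacing `torch.sigmoid(logits)` with `torch.softmax(logits, dim=-1)`. The pure-Python sketch below contrasts the two post-processing steps on a hypothetical logit vector; `sigmoid_threshold` and `softmax_topk` are illustrative names.

```python
import math

def sigmoid_threshold(logits, labels, threshold=0.5):
    """Independent per-class probabilities; keep every class at or above the threshold."""
    probs = [1.0 / (1.0 + math.exp(-z)) for z in logits]
    return {labels[i]: round(p, 3) for i, p in enumerate(probs) if p >= threshold}

def softmax_topk(logits, labels, k=3):
    """Probabilities that sum to 1 across classes; keep the k most likely classes."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    return {labels[i]: round(probs[i], 3) for i in top}

# Hypothetical logits for three of the 45 classes
labels = ["airport", "runway", "freeway"]
logits = [2.0, 1.0, -1.0]

print(sigmoid_threshold(logits, labels))  # airport and runway both pass 0.5
print(softmax_topk(logits, labels, k=2))  # airport ranked first, then runway
```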

---

## **Intended Use**

The **RESISC45-SigLIP2** model is intended for multi-label scene classification of remote sensing imagery. Use cases include:

- **Remote Sensing Analysis** – Tag aerial and satellite images with scene categories.
- **Urban Planning** – Identify urban structures and landscape features.
- **Geospatial Intelligence** – Aid automated image interpretation pipelines.
- **Environmental Monitoring** – Track natural landforms and land cover changes over time.