---
license: apache-2.0
tags:
- medical-imaging
- vision-language-model
- vlm
- lora
- graph-neural-networks
- zero-shot
metrics:
- accuracy
---

# ACE-LoRA: Graph-Attentive Context Enhancement for Medical VLMs

<div align="center">
  <a href="https://arxiv.org/pdf/2603.17079">
    <img src="https://img.shields.io/badge/arXiv-2603.17079-b31b1b.svg" alt="arXiv">
  </a>
</div>

**ACE-LoRA** is a parameter-efficient adaptation framework designed for generalist medical Vision-Language Models (VLMs). It addresses the specialization–generalization trade-off by integrating Low-Rank Adaptation (LoRA) with a novel **Attention-based Context Enhancement Hypergraph Neural Network (ACE-HGNN)**.

## Model Description

Existing medical VLMs often struggle to balance broad semantic understanding with fine-grained diagnostic cues. ACE-LoRA bridges this gap by adding only **0.95M** trainable parameters to frozen image-text encoders.

### Key Features
* **ACE-HGNN Module:** Captures higher-order contextual interactions beyond pairwise similarity, enriching global representations with localized diagnostic details.
* **Label-Guided InfoNCE Loss:** A specialized loss formulation that suppresses false negatives between semantically related image-text pairs, improving cross-modal alignment (see the sketch below).
* **Efficiency:** Achieves state-of-the-art performance across multiple domains while keeping the backbone frozen.
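
To illustrate the label-guided idea, here is a minimal PyTorch sketch. It is not the paper's exact formulation: the function name, temperature, and the uniform target distribution over same-label pairs are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def label_guided_info_nce(img_emb, txt_emb, labels, temperature=0.07):
    """Contrastive loss that treats same-label pairs as positives
    instead of false negatives (illustrative sketch only)."""
    img_emb = F.normalize(img_emb, dim=-1)
    txt_emb = F.normalize(txt_emb, dim=-1)
    logits = img_emb @ txt_emb.t() / temperature  # (B, B) similarity matrix

    # Pairs sharing a class label count as positives, not negatives.
    pos_mask = (labels.unsqueeze(0) == labels.unsqueeze(1)).float()
    targets = pos_mask / pos_mask.sum(dim=1, keepdim=True)

    # Symmetric cross-entropy against the label-guided target distribution.
    loss_i2t = -(targets * F.log_softmax(logits, dim=1)).sum(dim=1).mean()
    loss_t2i = -(targets * F.log_softmax(logits.t(), dim=1)).sum(dim=1).mean()
    return (loss_i2t + loss_t2i) / 2
```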

### Environment Setup
The framework was developed using `Python 3.10.18` and `PyTorch 2.1.0` with `CUDA 11.8`.

```bash
conda create -n ace_lora python=3.10.18
conda activate ace_lora
conda install pytorch==2.1.0 torchvision==0.16.0 torchaudio==2.1.0 pytorch-cuda=11.8 -c pytorch -c nvidia
pip install -r requirements.txt
```
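
After installation, an optional quick check confirms the pinned build is active:

```python
import torch

# Sanity-check the pinned PyTorch/CUDA build.
print(torch.__version__)          # expected: 2.1.0
print(torch.version.cuda)         # expected: 11.8
print(torch.cuda.is_available())  # True on a CUDA-capable machine
```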

### Inference
We provide a sample inference script (`hf_model_inference.py`) for the RSNA dataset.
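
For orientation, the sketch below shows the generic zero-shot classification pattern with a plain Hugging Face CLIP backbone. The checkpoint name, image path, and prompt wording are placeholders; `hf_model_inference.py` in the repository is the authoritative ACE-LoRA pipeline.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Placeholder backbone and prompts; see hf_model_inference.py for the
# actual ACE-LoRA checkpoint loading and RSNA preprocessing.
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").eval()
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

prompts = ["Findings suggesting pneumonia.", "No evidence of pneumonia."]
image = Image.open("example_cxr.png").convert("RGB")  # placeholder path

inputs = processor(text=prompts, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    probs = model(**inputs).logits_per_image.softmax(dim=-1)
print(dict(zip(prompts, probs[0].tolist())))
```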

## Datasets

**MIMIC-CXR:** For pretraining, we use the MIMIC-CXR dataset, excluding lateral images. The dataset is available at [this link](https://physionet.org/content/mimic-cxr-jpg/2.1.0/) (note that you must satisfy the dataset provider's requirements to download the data).
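
As an illustration of the lateral-view exclusion, here is a minimal pandas sketch. It assumes the MIMIC-CXR-JPG metadata file (`mimic-cxr-2.0.0-metadata.csv`) and its `ViewPosition` column; the file name and the set of kept views are assumptions to verify against your download.

```python
import pandas as pd

# Keep frontal (PA/AP) views only; drop lateral images.
# Assumes the MIMIC-CXR-JPG metadata CSV and its ViewPosition column.
meta = pd.read_csv("mimic-cxr-2.0.0-metadata.csv")
frontal = meta[meta["ViewPosition"].isin(["PA", "AP"])]
frontal.to_csv("mimic_cxr_frontal_only.csv", index=False)
print(f"kept {len(frontal)} of {len(meta)} images")
```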

**NIH Chest X-ray:** For validation, we use the NIH Chest X-ray dataset, available at [this link](https://nihcc.app.box.com/v/ChestXray-NIHCC). After downloading, run `dataset_prep/chestx-ray_14_prep.py` from our GitHub repo to split the data and prepare it in the required format.

**CheXpert 5×200:** For zero-shot classification, we use the CheXpert 5×200 dataset, available at [this link](https://stanfordmedicine.app.box.com/s/j5h7q99f3pfi7enc0dom73m4nsm6yzvh).
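
For reference, CheXpert 5×200 covers five findings with 200 images each; a minimal prompt-construction sketch follows. The template wording is an assumption, not the repo's exact prompt ensemble.

```python
# The five CheXpert 5x200 findings; the prompt template is illustrative.
CLASSES = ["atelectasis", "cardiomegaly", "consolidation", "edema", "pleural effusion"]
prompts = [f"Findings consistent with {c}." for c in CLASSES]
print(prompts)
```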

**RSNA:** We use the RSNA dataset for both zero-shot classification and object detection, available at [this link](https://www.kaggle.com/competitions/rsna-pneumonia-detection-challenge/data). After downloading, run `dataset_prep/rsna_dataset_create.py` from our GitHub repo to split the data and prepare it in the required format for both tasks.

**SIIM:** We use the SIIM dataset for both zero-shot classification and semantic segmentation, available at [this link](https://www.kaggle.com/competitions/siim-acr-pneumothorax-segmentation/data). After downloading, run `dataset_prep/SIIM_generate_class_labels.py` from our GitHub repo to prepare the data for zero-shot classification, and `dataset_prep/SIIM_generate_mask.py` for semantic segmentation.
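
For context, SIIM-ACR annotations ship as run-length-encoded masks. The sketch below assumes the competition's relative RLE convention over a column-major 1024×1024 grid; `dataset_prep/SIIM_generate_mask.py` is the authoritative implementation.

```python
import numpy as np

def rle2mask(rle: str, width: int = 1024, height: int = 1024) -> np.ndarray:
    """Decode a SIIM-ACR run-length string into a binary mask.

    Assumes alternating relative start offsets and run lengths over a
    column-major grid (the competition's convention); verify against
    dataset_prep/SIIM_generate_mask.py before relying on it.
    """
    mask = np.zeros(width * height, dtype=np.uint8)
    values = np.asarray(rle.split(), dtype=int)
    pos = 0
    for start, length in zip(values[0::2], values[1::2]):
        pos += start                  # offsets are relative to the last run
        mask[pos:pos + length] = 1
        pos += length
    return mask.reshape(width, height).T  # transpose from column-major order
```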

## Links
- Code: https://github.com/icon-lab/ACE-LoRA
- Paper: https://arxiv.org/pdf/2603.17079

## 🤝 Acknowledgments
This implementation builds upon [CLIP-LoRA](https://github.com/MaxZanella/CLIP-LoRA) and [LoRA](https://github.com/microsoft/LoRA). We gratefully acknowledge their valuable contributions.