---
license: apache-2.0
tags:
- medical-imaging
- vision-language-model
- vlm
- lora
- graph-neural-networks
- zero-shot
metrics:
- accuracy
---

# ACE-LoRA: Graph-Attentive Context Enhancement for Medical VLMs

<div align="center">
  <a href="https://arxiv.org/pdf/2603.17079">
    <img src="https://img.shields.io/badge/arXiv-2603.17079-b31b1b.svg" alt="arXiv">
  </a>
</div>

**ACE-LoRA** is a parameter-efficient adaptation framework designed for generalist medical Vision-Language Models (VLMs). It addresses the specialization–generalization trade-off by integrating Low-Rank Adaptation (LoRA) with a novel **Attention-based Context Enhancement Hypergraph Neural Network (ACE-HGNN)**.

## Model Description

Existing medical VLMs often struggle to balance broad semantic understanding with fine-grained diagnostic cues. ACE-LoRA bridges this gap by adding only **0.95M** trainable parameters to frozen image-text encoders.

### Key Features
* **ACE-HGNN Module:** Captures higher-order contextual interactions beyond pairwise similarity, enriching global representations with localized diagnostic details.
* **Label-Guided InfoNCE Loss:** A specialized loss formulation that suppresses false negatives between semantically related image-text pairs, improving cross-modal alignment (see the sketch below).
* **Efficiency:** Achieves state-of-the-art performance across multiple domains while keeping the backbone frozen.
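
To illustrate the label-guided idea, here is a minimal PyTorch sketch. It is not the paper's exact formulation: the function name, temperature, and the uniform target distribution over same-label pairs are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def label_guided_info_nce(img_emb, txt_emb, labels, temperature=0.07):
    """Contrastive loss that treats same-label pairs as positives
    instead of false negatives (illustrative sketch only)."""
    img_emb = F.normalize(img_emb, dim=-1)
    txt_emb = F.normalize(txt_emb, dim=-1)
    logits = img_emb @ txt_emb.t() / temperature  # (B, B) similarity matrix

    # Pairs sharing a class label count as positives, not negatives.
    pos_mask = (labels.unsqueeze(0) == labels.unsqueeze(1)).float()
    targets = pos_mask / pos_mask.sum(dim=1, keepdim=True)

    # Symmetric cross-entropy against the label-guided target distribution.
    loss_i2t = -(targets * F.log_softmax(logits, dim=1)).sum(dim=1).mean()
    loss_t2i = -(targets * F.log_softmax(logits.t(), dim=1)).sum(dim=1).mean()
    return (loss_i2t + loss_t2i) / 2
```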

### Environment Setup
The framework was developed using `Python 3.10.18` and `PyTorch 2.1.0` with `CUDA 11.8`.

```bash
conda create -n ace_lora python=3.10.18
conda activate ace_lora
conda install pytorch==2.1.0 torchvision==0.16.0 torchaudio==2.1.0 pytorch-cuda=11.8 -c pytorch -c nvidia
pip install -r requirements.txt
```
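
After installation, an optional quick check confirms the pinned build is active:

```python
import torch

# Sanity-check the pinned PyTorch/CUDA build.
print(torch.__version__)          # expected: 2.1.0
print(torch.version.cuda)         # expected: 11.8
print(torch.cuda.is_available())  # True on a CUDA-capable machine
```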

### Inference
We provide a sample inference script (`hf_model_inference.py`) for the RSNA dataset.
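
For orientation, the sketch below shows the generic zero-shot classification pattern with a plain Hugging Face CLIP backbone. The checkpoint name, image path, and prompt wording are placeholders; `hf_model_inference.py` in the repository is the authoritative ACE-LoRA pipeline.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Placeholder backbone and prompts; see hf_model_inference.py for the
# actual ACE-LoRA checkpoint loading and RSNA preprocessing.
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").eval()
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

prompts = ["Findings suggesting pneumonia.", "No evidence of pneumonia."]
image = Image.open("example_cxr.png").convert("RGB")  # placeholder path

inputs = processor(text=prompts, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    probs = model(**inputs).logits_per_image.softmax(dim=-1)
print(dict(zip(prompts, probs[0].tolist())))
```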

## Datasets

**MIMIC-CXR:** For pretraining, we use the MIMIC-CXR dataset, excluding lateral images. The dataset is available at [this link](https://physionet.org/content/mimic-cxr-jpg/2.1.0/) (note that you must satisfy the dataset provider's requirements to download the data).
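
As an illustration of the lateral-view exclusion, here is a minimal pandas sketch. It assumes the MIMIC-CXR-JPG metadata file (`mimic-cxr-2.0.0-metadata.csv`) and its `ViewPosition` column; the file name and the set of kept views are assumptions to verify against your download.

```python
import pandas as pd

# Keep frontal (PA/AP) views only; drop lateral images.
# Assumes the MIMIC-CXR-JPG metadata CSV and its ViewPosition column.
meta = pd.read_csv("mimic-cxr-2.0.0-metadata.csv")
frontal = meta[meta["ViewPosition"].isin(["PA", "AP"])]
frontal.to_csv("mimic_cxr_frontal_only.csv", index=False)
print(f"kept {len(frontal)} of {len(meta)} images")
```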

**NIH Chest X-ray:** For validation, we use the NIH Chest X-ray dataset, available at [this link](https://nihcc.app.box.com/v/ChestXray-NIHCC). After downloading, run `dataset_prep/chestx-ray_14_prep.py` from our GitHub repo to split the data and prepare it in the required format.

**CheXpert 5×200:** For zero-shot classification, we use the CheXpert 5×200 dataset, available at [this link](https://stanfordmedicine.app.box.com/s/j5h7q99f3pfi7enc0dom73m4nsm6yzvh).
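
For reference, CheXpert 5×200 covers five findings with 200 images each; a minimal prompt-construction sketch follows. The template wording is an assumption, not the repo's exact prompt ensemble.

```python
# The five CheXpert 5x200 findings; the prompt template is illustrative.
CLASSES = ["atelectasis", "cardiomegaly", "consolidation", "edema", "pleural effusion"]
prompts = [f"Findings consistent with {c}." for c in CLASSES]
print(prompts)
```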

**RSNA:** We use the RSNA dataset for both zero-shot classification and object detection, available at [this link](https://www.kaggle.com/competitions/rsna-pneumonia-detection-challenge/data). After downloading, run `dataset_prep/rsna_dataset_create.py` from our GitHub repo to split the data and prepare it in the required format for both tasks.

**SIIM:** We use the SIIM dataset for both zero-shot classification and semantic segmentation, available at [this link](https://www.kaggle.com/competitions/siim-acr-pneumothorax-segmentation/data). After downloading, run `dataset_prep/SIIM_generate_class_labels.py` from our GitHub repo to prepare the data for zero-shot classification, and `dataset_prep/SIIM_generate_mask.py` for semantic segmentation.
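
For context, SIIM-ACR annotations ship as run-length-encoded masks. The sketch below assumes the competition's relative RLE convention over a column-major 1024×1024 grid; `dataset_prep/SIIM_generate_mask.py` is the authoritative implementation.

```python
import numpy as np

def rle2mask(rle: str, width: int = 1024, height: int = 1024) -> np.ndarray:
    """Decode a SIIM-ACR run-length string into a binary mask.

    Assumes alternating relative start offsets and run lengths over a
    column-major grid (the competition's convention); verify against
    dataset_prep/SIIM_generate_mask.py before relying on it.
    """
    mask = np.zeros(width * height, dtype=np.uint8)
    values = np.asarray(rle.split(), dtype=int)
    pos = 0
    for start, length in zip(values[0::2], values[1::2]):
        pos += start                  # offsets are relative to the last run
        mask[pos:pos + length] = 1
        pos += length
    return mask.reshape(width, height).T  # transpose from column-major order
```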

## Links
- Code: https://github.com/icon-lab/ACE-LoRA
- Paper: https://arxiv.org/pdf/2603.17079

## 🤝 Acknowledgments
This implementation builds upon [CLIP-LoRA](https://github.com/MaxZanella/CLIP-LoRA) and [LoRA](https://github.com/microsoft/LoRA). We gratefully acknowledge their valuable contributions.