This model is a hierarchically enhanced version of Qwen2.5-VL-7B-Instruct, fine-tuned with LoRA on the iNat21-Plant taxonomy using vision instruction tuning.

For more details, please refer to our paper.

Downloads last month: 1

Safetensors

Model size

8B params

Tensor type

F16

Model tree for Captain1874/Qwen2.5-VL-7B-Vision-Hie

Base model

Qwen/Qwen2.5-VL-7B-Instruct

Finetuned

(1014)

this model

Paper for Captain1874/Qwen2.5-VL-7B-Vision-Hie

Vision LLMs Are Bad at Hierarchical Visual Understanding, and LLMs Are the Bottleneck

Paper • 2505.24840 • Published May 30, 2025