---
license: apple-amlr
license_name: apple-sample-code-license
license_link: LICENSE
library_name: ml-aim
pipeline_tag: image-classification
---

# AIM: Autoregressive Image Models

*Alaaeldin El-Nouby, Michal Klein, Shuangfei Zhai, Miguel Angel Bautista, Alexander Toshev, Vaishaal Shankar,
Joshua M Susskind, and Armand Joulin*

This software project accompanies the research paper, [Scalable Pre-training of Large Autoregressive Image Models](https://arxiv.org/abs/2401.08541).

We introduce **AIM**, a collection of vision models pre-trained with an autoregressive generative objective.
We show that autoregressive pre-training of image features exhibits scaling properties similar to those of its
textual counterpart (i.e., Large Language Models). Specifically, we highlight two findings:
1. the model capacity can be trivially scaled to billions of parameters, and
2. AIM effectively leverages large collections of uncurated image data.

## Installation
Please install PyTorch using the official [installation instructions](https://pytorch.org/get-started/locally/).
Afterward, install the package as:
```commandline
pip install git+https://git@github.com/apple/ml-aim.git
```
## Usage
Below is an example of loading a model via the [HuggingFace Hub](https://huggingface.co/docs/hub/):
```python
from PIL import Image

from aim.torch.models import AIMForImageClassification
from aim.torch.data import val_transforms

# Load an input image (path elided) and a pre-trained AIM model.
img = Image.open(...)
model = AIMForImageClassification.from_pretrained("apple/aim-600M")
transform = val_transforms()

# Preprocess the image and add a batch dimension before the forward pass.
inp = transform(img).unsqueeze(0)
logits, features = model(inp)
```
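
The forward pass above returns classification logits alongside the extracted features. As a minimal sketch (assuming ImageNet-1k logits of shape `(batch_size, 1000)`; the random tensor below is a stand-in for the model output), the predicted class index can be read off with `argmax`:

```python
import torch

# Stand-in for the logits returned by the model above; a real forward
# pass with an ImageNet-1k head yields shape (batch_size, 1000).
logits = torch.randn(1, 1000)

# Highest-scoring class index per image in the batch.
pred = logits.argmax(dim=-1)
print(pred.shape)  # torch.Size([1])
```

The class index can then be mapped to a human-readable label with any standard ImageNet-1k label list.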

### ImageNet-1k results (frozen trunk)

The table below reports classification results on the ImageNet-1k validation set.
<table style="margin: auto">
  <thead>
    <tr>
      <th rowspan="2">model</th>
      <th colspan="2">top-1 IN-1k</th>
    </tr>
    <tr>
      <th>last layer</th>
      <th>best layer</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>AIM-0.6B</td>
      <td>78.5%</td>
      <td>79.4%</td>
    </tr>
    <tr>
      <td>AIM-1B</td>
      <td>80.6%</td>
      <td>82.3%</td>
    </tr>
    <tr>
      <td>AIM-3B</td>
      <td>82.2%</td>
      <td>83.3%</td>
    </tr>
    <tr>
      <td>AIM-7B</td>
      <td>82.4%</td>
      <td>84.0%</td>
    </tr>
  </tbody>
</table>