Llama 3.3 8B Instruct

Yes, this is official, and yes, this is, to my knowledge, a real version of Llama 3.3 8B. (I think, anyways)

Facebook has a Llama API available that allows for inference of the other Llama models (L3.3 70B, L4 Scout and Maverick), but also includes a special, new (according to the original press release) "Llama 3.3 8B" that didn't exist anywhere else and was stuck behind the Facebook API!

However. The Llama API supports finetuning L3.3... and downloading the final model in HF format. Problem solved, right?

Wellllllllllllllll. Not really. The finetuning API was hidden behind layers of support tickets. I tried when the original API dropped in April, and was just told "We'll think about it and send you any updates" (there never were any updates).

Flash forward to December: on a whim, I decided to look at the API again. And... by god... the finetuning tab was there. I could click on it and start a job (please ignore that I have no idea how it works, and that the finetuning tab actually disappeared after the first time I clicked on it, though I could still go to the page manually).

Apparently, this was not very well tested, as there were a good few bugs, the UI was janky, and the download model function did not actually work due to CORS (I had to manually curl things to get the CDN link).

But... by god... the zip file downloaded, and I had my slightly finetuned model.

To my shock and delight, however, they also provide the adapter that they merged into the model. That means I can subtract that adapter and get the original model. And... here we are!
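For the curious, here is a minimal sketch of what "subtracting the adapter" means, assuming a standard LoRA adapter where each merged weight is `W_merged = W_base + (alpha / rank) * (B @ A)`. The helper names and the toy numbers are made up for illustration; this is not the exact script used for this repo, and a real run would loop over the safetensors weights rather than nested Python lists.

```python
def matmul(x, y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]


def apply_delta(weight, lora_a, lora_b, alpha, rank, sign):
    """Return weight + sign * (alpha / rank) * (lora_b @ lora_a)."""
    scale = sign * alpha / rank
    delta = matmul(lora_b, lora_a)
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(weight, delta)]


def merge_lora(weight, lora_a, lora_b, alpha, rank):
    """Fold the adapter into the base weight (what the finetuning job did)."""
    return apply_delta(weight, lora_a, lora_b, alpha, rank, +1.0)


def unmerge_lora(merged, lora_a, lora_b, alpha, rank):
    """Undo the merge by subtracting the same scaled low-rank update."""
    return apply_delta(merged, lora_a, lora_b, alpha, rank, -1.0)


# Toy 2x3 weight with a rank-1 adapter (all numbers are hypothetical).
base = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
lora_a = [[0.5, -1.0, 2.0]]   # shape (rank, in_features)
lora_b = [[2.0], [-4.0]]      # shape (out_features, rank)

merged = merge_lora(base, lora_a, lora_b, alpha=2.0, rank=1)
recovered = unmerge_lora(merged, lora_a, lora_b, alpha=2.0, rank=1)
```

In this toy case `recovered` matches `base` exactly. With real BF16 weights the merge has already been rounded, so the recovered weights should be very close to the original but not necessarily bit-identical.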

Benchmarks

| Benchmark | Llama 3.1 8B Instruct | Llama 3.3 8B Instruct (maybe) |
|---|---|---|
| IFEval (1 epoch; score averaged across all strict/loose instruction/prompt accuracies, following the Llama 3 paper) | 78.2 | 81.95 |
| GPQA Diamond (3 epochs) | 29.3 | 37.0 |

All benchmarks were run in OpenBench at temperature 1.0.
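To be explicit about the IFEval number: it is the plain mean of the four accuracies (prompt-level and instruction-level, each in strict and loose mode). A tiny sketch of that arithmetic follows; the dictionary keys are my own naming and the values are placeholders, not the actual per-metric results.

```python
from statistics import mean

# Hypothetical key names for the four IFEval accuracies.
IFEVAL_KEYS = ("prompt_strict", "prompt_loose", "inst_strict", "inst_loose")


def ifeval_average(scores):
    """Average the four strict/loose, prompt/instruction accuracies."""
    return mean(scores[key] for key in IFEVAL_KEYS)


# Placeholder values purely to show the calculation (not real results).
example_scores = {
    "prompt_strict": 70.0,
    "prompt_loose": 74.0,
    "inst_strict": 78.0,
    "inst_loose": 82.0,
}
example_average = ifeval_average(example_scores)
```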

More cursed discoveries

Apparently the context length of the original Llama 3.3 model (i.e. the regular one that the Llama API serves) is 128k, while the finetunable version is only 8k. This holds for both the downloaded version and the finetune as served by the API (refer to the screenshot with 10k tokens' worth of the letter "a" as input). This does not really make any coherent sense.
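If you want to check this on a downloaded copy, one rough way is to look at `max_position_embeddings` in the model's HF-style `config.json`. The sketch below assumes that field is present; the 8192 and 131072 values are stand-ins for "8k" and "128k", not copied from the actual files, and the function names are made up.

```python
import json


def context_length(config_text):
    """Return max_position_embeddings from a config.json string, or None."""
    return json.loads(config_text).get("max_position_embeddings")


def fits_in_context(config_text, n_tokens):
    """True if a prompt of n_tokens fits within the configured context length."""
    limit = context_length(config_text)
    return limit is not None and n_tokens <= limit


# Stand-in configs for an "8k" and a "128k" model (illustrative values).
short_cfg = json.dumps({"max_position_embeddings": 8192})
long_cfg = json.dumps({"max_position_embeddings": 131072})

ten_k_fits_short = fits_in_context(short_cfg, 10_000)
ten_k_fits_long = fits_in_context(long_cfg, 10_000)
```

A 10k-token prompt fits the second config but not the first, which is the same kind of mismatch the 10k-token test against the API showed.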

Are you sure this is really Llama 3.3?

As far as I'm aware! It has stylistic tics that differ from Llama 3 and 3.1.

There are a few weird artifacts, however, that suggest the base model is the... original Llama 3(???)

The original ZIP included an original_repo_id.json that contained:

{
    "repo_id": "meta-llama/Meta-Llama-3-8B-Instruct"
}

and, furthermore, the adapter_config.json also listed Llama 3 as the base model. However, the models clearly act differently and know different things! Furthermore, I tested both the Llama API and my copy, and they share the same differences from L3 and L3.1.

Suffice it to say, I'm pretty sure this is really Llama 3.3 8B.

Is this legal?

According to the ToS of the Llama API as of December 29th, 2025:

For example, via the Llama API, you may receive access to the Llama 3.3 8b model, which is considered a Llama AI model and part of the Meta AI Materials; when downloaded, and not accessed via the Llama API, the Llama 3.3 8b model is subject to the Llama 3.3 Community License Agreement and Acceptable Use Policy.

The Llama 3.3 8b model (after downloading) is subject to the regular L3.3 license, which allows for redistribution. So... as far as I can tell, yes, this is perfectly legal to redistribute!

Please email me at fizzarolli [at] riseup.net with any concerns. If Meta would like me to take this model down, please have someone email me and ask from an official Meta address.
