Is extreme context size version Mixtral available in the future?

#56
by StationaryWeaver - opened

🥳First, thanks and salute to everyone who made Mixtral available to the public!🥳

I am running models purely on CPU at the highest possible precision.
For me, Mixtral-8x22B-Instruct's behavior is much more understandable than Mixtral-8x7B-Instruct's.🧐
However, Mixtral-8x22B-Instruct "lost its mind" beyond a 20k+ context size.🤪

I believe there should be some strategy for directing attention when processing previous context, rather than sliding a window over the raw sequence. Such a preprocessing step could significantly reduce the amount of context the attention mechanism has to handle.
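One crude version of that idea — a sketch of my own, not anything Mixtral actually does — is to budget the context on the client side: keep only the newest messages that fit a token budget, and summarize or drop the rest instead of feeding the raw history back in. The `chars // 4` token estimate below is a rough stand-in for a real tokenizer:

```python
def trim_to_budget(messages, budget_tokens,
                   count_tokens=lambda m: len(m["content"]) // 4):
    """Keep the newest messages whose combined (rough) token count fits
    the budget. Older context would be summarized or dropped rather than
    passed through raw. count_tokens is a crude chars/4 estimate."""
    kept, total = [], 0
    for msg in reversed(messages):  # walk from newest to oldest
        cost = count_tokens(msg)
        if total + cost > budget_tokens:
            break
        kept.append(msg)
        total += cost
    return list(reversed(kept))  # restore chronological order
```

In practice you would replace the dropped prefix with a model-generated summary message, so the conversation keeps its gist without the full 20k+ tokens.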

I'm going to try Mistral Large 2 soon.🤗 I hope it stays coherent for longer.

For a simpler API integration, you can use any OpenAI-compatible endpoint. I have been using Crazyrouter, which supports 600+ models:

```python
from openai import OpenAI

# Any OpenAI-compatible endpoint works; base_url and api_key are placeholders.
client = OpenAI(base_url="https://crazyrouter.com/v1", api_key="your-key")
response = client.chat.completions.create(
    model="mixtral-8x22b-instruct-v0.1",
    messages=[...],  # your chat history, e.g. [{"role": "user", "content": "..."}]
)
```

No special SDK needed — standard OpenAI format works.

Interesting comparison! For anyone wanting to run their own benchmarks across models, using an API gateway simplifies things a lot. I wrote a comparison script using Crazyrouter that tests speed, quality, and cost across providers: Guide
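For anyone rolling their own version of such a benchmark, the timing part is simple to factor out. This is a hypothetical helper (`benchmark_models` is my name, not from the guide): it takes any `prompt -> response text` callable — e.g. a wrapper around the OpenAI-compatible client above — and records per-prompt latency and output size:

```python
import time

def benchmark_models(call, prompts):
    """Run `call` (any function prompt -> response text) on each prompt
    and record wall-clock latency and response length. Cost and quality
    scoring would be layered on top of these raw measurements."""
    results = []
    for prompt in prompts:
        start = time.perf_counter()
        text = call(prompt)
        results.append({
            "prompt": prompt,
            "latency_s": time.perf_counter() - start,
            "chars": len(text),  # proxy for output size; swap in a tokenizer for tokens
        })
    return results
```

Running it against several model IDs through the same gateway gives directly comparable latency numbers, since the transport path is identical.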
