Will an extreme-context-size version of Mixtral be available in the future?
🥳 First, thanks and a salute to everyone who made Mixtral available to the public! 🥳
I run models purely on CPU at the highest precision available.
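For anyone who wants to reproduce that kind of setup, here is a minimal CPU-only sketch using llama-cpp-python; the GGUF file name and the parameter values below are placeholders, not a specific recommendation:

```python
# Minimal sketch of CPU-only inference at high precision (llama-cpp-python).
# The model file name is a placeholder; use whatever quantization you have.
from llama_cpp import Llama

llm = Llama(
    model_path="mixtral-8x22b-instruct-v0.1.Q8_0.gguf",  # placeholder path
    n_ctx=32768,     # requested context window
    n_threads=16,    # set to your physical core count
    n_gpu_layers=0,  # pure CPU: offload no layers to a GPU
)

reply = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Hello!"}],
    max_tokens=128,
)
print(reply["choices"][0]["message"]["content"])
```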
For me, Mixtral-8x22B-Instruct's behavior is much easier to follow than Mixtral-8x7B-Instruct's. 🧐
However, Mixtral-8x22B-Instruct "loses its mind" once the context grows past roughly 20k tokens. 🤪
I believe there should be a strategy for directing attention at previous context, for example compressing or summarizing it, rather than sliding a raw window over the whole history. That preprocessing step alone could significantly reduce the attention budget required; a sketch of the idea follows.
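To be clear about what I mean, here is a minimal sketch of the idea, not Mixtral's actual attention mechanism: once the raw history exceeds a token budget, fold the oldest turns into a compact summary instead of dropping them off the edge of a window. The `summarize` helper is a hypothetical stand-in for a real model call, and the word-count tokenizer is a crude approximation:

```python
# Sketch: keep chat history under a token budget by summarizing old turns
# instead of sliding a raw window over them.

def summarize(text: str) -> str:
    # Stand-in: a real system would ask the model itself to summarize.
    return "[summary] " + text[:160]

def n_tokens(text: str) -> int:
    # Crude word-count proxy; swap in a real tokenizer in practice.
    return len(text.split())

def compress_history(history: list[str], budget: int = 8192) -> list[str]:
    """Keep total history under `budget` tokens by folding the oldest
    turns into summary entries, preserving the most recent turns raw."""
    while len(history) > 2 and sum(map(n_tokens, history)) > budget:
        # Fold the two oldest turns into a single compact summary.
        folded = summarize("\n".join(history[:2]))
        history = [folded] + history[2:]
    return history
```

The point of the sketch is the design choice: recent turns stay verbatim, while older context is handed to the model in compressed form, so attention is spent on a distilled version of the past rather than on every raw token.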
I'm going to try Mistral Large 2 soon. 🤗 I hope it stays more coherent.