Reka Edge
Reka Edge is an extremely efficient 7B multimodal vision-language model that accepts image/video+text inputs and generates text outputs. This model is optimized specifically to deliver industry-leading performance in image understanding, video analysis, object detection, and agentic tool-use.
Learn more about Reka Edge in our announcement blog post.
Key features
- Faster and more token-efficient than similarly sized VLMs
- Strong benchmark performance across VQA-v2, RefCOCO, MLVU, MMVU and Mobile Actions (see below)
- Support for vLLM (see plugin)
- Open weights license: the model can be used commercially if you earn less than $1 million USD in revenue per year
Benchmarks and metrics
| Benchmark | Reka Edge | Cosmos-Reason2 8B | Qwen 3.5 9B | Gemini 3 Pro |
|---|---|---|---|---|
| VQA-V2 Visual Question Answering | 88.40 | 79.82 | 83.22 | 89.78 |
| MLVU Video Understanding | 74.30 | 37.85 | 52.39 | 80.68 |
| MMVU Multimodal Video Understanding | 71.68 | 51.52 | 68.64 | 78.88 |
| RefCOCO-A Object Detection | 93.13 | 90.98 | 93.62 | 81.46 |
| RefCOCO-B Object Detection | 86.70 | 85.74 | 88.83 | 82.85 |
| VideoHallucer Hallucination | 59.57 | 51.65 | 56.00 | 66.78 |
| Mobile Actions Tool Use | 88.40 | 77.94 | 91.78 | 89.39 |

| Metric | Reka Edge | Cosmos-Reason2 8B | Qwen 3.5 9B | Gemini 3 Pro* |
|---|---|---|---|---|
| Input tokens (1024 × 1024 image) | 331 | 1063 | 1041 | 1094 |
| End-to-end latency (in seconds) | 4.69 ± 2.48 | 10.56 ± 3.47 | 10.31 ± 1.81 | 16.67 ± 4.47 |
| TTFT (s) Time to first token | 0.522 ± 0.452 | 0.844 ± 0.923 | 0.60 ± 0.65 | 13.929 ± 3.872 |
*Gemini 3 Pro measured via API call; other models measured with local inference.
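The token counts above translate directly into prompt-size savings. A quick back-of-envelope check, using only the numbers from the table:

```python
# Vision-token cost of a 1024 x 1024 image, taken from the table above
reka_edge_tokens = 331
other_models = {"Cosmos-Reason2 8B": 1063, "Qwen 3.5 9B": 1041, "Gemini 3 Pro": 1094}

# Reka Edge uses roughly 3x fewer image tokens than the comparison models
for name, tokens in other_models.items():
    ratio = tokens / reka_edge_tokens
    print(f"{name}: {ratio:.1f}x more image tokens than Reka Edge")
```

Fewer input tokens per image means shorter prefill, which is consistent with the lower end-to-end latency and TTFT figures above.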
Quick Start
🤗 Transformers (macOS)
The easiest way to run the model is with the included example.py script. It uses PEP 723 inline metadata so uv resolves dependencies automatically — no manual install step:
```shell
uv run example.py --image media/hamburger.jpg --prompt "What is in this image?"
```
Requirements
Edge Deployment Devices
- Mac devices with Apple Silicon
  - OS: macOS 13+
  - Minimum: 24 GB memory
  - Recommended: 32 GB+ memory
- Linux and Windows Subsystem for Linux (WSL) PCs
  - Minimum: 24 GB GPU memory and 24 GB+ system memory
  - Recommended: 32 GB+ GPU memory and 32 GB+ system memory
- Nvidia Robotics & Edge AI systems
  - Jetson Thor
  - Jetson AGX Orin (both 32 GB and 64 GB variants)
Custom Deployment Options
With quantization, Reka Edge can also be run on:
- Jetson Orin Nano
- Samsung S25
- Qualcomm Snapdragon XR2 Gen 3 devices
- Apple iPhone, iPad, and Vision Pro
Reach out for support deploying Reka Edge to a custom edge compute platform.
Software Requirements
- Python: 3.12+
- uv (recommended) — handles dependencies automatically
Inline snippet
If you prefer not to use the script, install dependencies manually and paste the code below:
```shell
uv pip install "transformers==4.57.3" torch torchvision pillow tiktoken imageio einops av
```
```python
import torch
from PIL import Image
from transformers import AutoModelForImageTextToText, AutoProcessor

model_id = "RekaAI/reka-edge-2603"

# Load processor and model
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForImageTextToText.from_pretrained(
    model_id,
    trust_remote_code=True,
    torch_dtype=torch.float16,
).eval()

# Move to MPS (Apple Silicon GPU)
device = torch.device("mps")
model = model.to(device)

# Prepare an image + text query
image_path = "media/hamburger.jpg"  # included in the model repo
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": image_path},
            {"type": "text", "text": "What is in this image?"},
        ],
    }
]

# Tokenize using the chat template
inputs = processor.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt",
    return_dict=True,
)

# Move tensors to device, casting floating-point tensors to float16
for key, val in inputs.items():
    if isinstance(val, torch.Tensor):
        if val.is_floating_point():
            inputs[key] = val.to(device=device, dtype=torch.float16)
        else:
            inputs[key] = val.to(device=device)

# Generate
with torch.inference_mode():
    # Stop on <sep> token (end-of-turn) in addition to default EOS
    sep_token_id = processor.tokenizer.convert_tokens_to_ids("<sep>")
    output_ids = model.generate(
        **inputs,
        max_new_tokens=256,
        do_sample=False,
        eos_token_id=[processor.tokenizer.eos_token_id, sep_token_id],
    )

# Decode only the generated tokens
input_len = inputs["input_ids"].shape[1]
new_tokens = output_ids[0, input_len:]
output_text = processor.tokenizer.decode(new_tokens, skip_special_tokens=True)

# Strip any trailing <sep> turn-boundary marker
output_text = output_text.replace("<sep>", "").strip()
print(output_text)
```
Video queries
The model also accepts video inputs. Use --video instead of --image:
```shell
uv run example.py --video media/dashcam.mp4 --prompt "Is this person falling asleep?"
```
```python
messages = [
    {
        "role": "user",
        "content": [
            {"type": "video", "video": "media/dashcam.mp4"},
            {"type": "text", "text": "Is this person falling asleep?"},
        ],
    }
]
```
Object detection queries
Given an input image, we use Detect: {expression} to instruct the model to perform object detection, where {expression} can describe a single object or multiple objects.
```python
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": image_path},
            {"type": "text", "text": "Detect: red car, man in the white"},
        ],
    }
]
```
Text-only queries
Omit the image entry from the content list:
```python
messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "What is the capital of France?"},
        ],
    }
]
```
Then run the same tokenization and generation steps as above.
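The four query types differ only in the `content` list. If you send many queries, a small helper can assemble the messages structure; this is a sketch, not part of the repo:

```python
def build_messages(prompt, image=None, video=None):
    """Build a single-turn messages list for Reka Edge.

    Pass at most one of `image`/`video`; omit both for a text-only query.
    """
    if image is not None and video is not None:
        raise ValueError("pass either image or video, not both")
    content = []
    if image is not None:
        content.append({"type": "image", "image": image})
    if video is not None:
        content.append({"type": "video", "video": video})
    content.append({"type": "text", "text": prompt})
    return [{"role": "user", "content": content}]

# Mirrors the snippets above
text_only = build_messages("What is the capital of France?")
detection = build_messages("Detect: red car", image="media/hamburger.jpg")
```

The result feeds directly into `processor.apply_chat_template(...)` as shown in the full example.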
Notes for macOS
- **MPS and dtype:** Apple's MPS backend does not support `bfloat16`; always use `torch.float16`. `device_map="auto"` is not compatible with MPS: load the model on CPU first, then call `.to("mps")`.
- **Pinned transformers:** This checkpoint was exported with `transformers==4.57.3`. Using a different version may cause loading errors or incorrect behavior.
- **Memory:** The model requires ~14 GB in float16. A Mac with 32 GB unified memory is recommended to leave headroom for the OS and generation buffers.
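The ~14 GB figure follows directly from parameter count: 7B parameters at 2 bytes each in float16. A rough sizing check (weights only; activations, KV cache, and generation buffers are extra, which is why 24-32 GB of memory is recommended):

```python
params = 7e9       # 7B parameters
bytes_fp16 = 2.0   # float16: 2 bytes per parameter
bytes_4bit = 0.5   # 4-bit quantization: half a byte per parameter

print(f"float16 weights:     ~{params * bytes_fp16 / 1e9:.1f} GB")  # ~14 GB
print(f"4-bit quantized:     ~{params * bytes_4bit / 1e9:.1f} GB")  # ~3.5 GB
```

The 4-bit estimate explains why quantization brings the smaller devices listed under Custom Deployment Options into reach.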
vLLM
For high-throughput serving, you can use the vllm-reka plugin. This plugin extends standard vLLM to support Reka's custom architectures and optimized tokenizer.
Installation
Please follow our vllm-reka installation instructions to install the plugin along with vLLM.
Serving the Model
You can start the OpenAI-compatible API server by running the script serve.sh in vllm-reka with $MODEL_PATH set to RekaAI/reka-edge-2603.
```shell
bash serve.sh
```
The script enables BitsAndBytes quantization by default to reduce memory usage. To disable quantization, remove the --quantization flag from serve.sh.
Querying the Server
Once the server is running, you can send requests using the OpenAI API format:
```python
import openai

client = openai.OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="EMPTY",
    timeout=3600,
)

# Video query
response = client.chat.completions.create(
    model="RekaAI/reka-edge-2603",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "video_url", "video_url": {"url": "https://example.com/video.mp4"}},
                {"type": "text", "text": "Describe the video"},
            ],
        }
    ],
    stop=["\n\n<sep>"],
)
print(response.choices[0].message.content)

# Image query
response = client.chat.completions.create(
    model="RekaAI/reka-edge-2603",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": "https://example.com/image.png"}},
                {"type": "text", "text": "What is in this image?"},
            ],
        }
    ],
    stop=["\n\n<sep>"],
)
print(response.choices[0].message.content)

# Object detection query
response = client.chat.completions.create(
    model="RekaAI/reka-edge-2603",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": "https://example.com/image.png"}},
                {"type": "text", "text": "Detect: green banana"},
            ],
        }
    ],
    stop=["\n\n<sep>"],
)
print(response.choices[0].message.content)

# Text-only query
response = client.chat.completions.create(
    model="RekaAI/reka-edge-2603",
    messages=[
        {
            "role": "user",
            "content": "What is the capital of France?",
        }
    ],
    stop=["\n\n<sep>"],
)
print(response.choices[0].message.content)
```
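All four requests share the same envelope and stop sequence; only the `content` parts change. A hypothetical helper (not part of vllm-reka) that builds the keyword arguments for `client.chat.completions.create`:

```python
def build_request(prompt, image_url=None, video_url=None,
                  model="RekaAI/reka-edge-2603"):
    """Build kwargs for client.chat.completions.create(**kwargs)."""
    content = []
    if video_url is not None:
        content.append({"type": "video_url", "video_url": {"url": video_url}})
    if image_url is not None:
        content.append({"type": "image_url", "image_url": {"url": image_url}})
    content.append({"type": "text", "text": prompt})
    return {
        "model": model,
        "messages": [{"role": "user", "content": content}],
        "stop": ["\n\n<sep>"],
    }

# e.g.:
# response = client.chat.completions.create(**build_request(
#     "Detect: green banana", image_url="https://example.com/image.png"))
```

Text-only queries also work through this helper: a content list containing a single text part is equivalent to passing the prompt as a plain string.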
Notes
- `trust_remote_code=True` is required because the model uses custom architecture code (`Yasa2ForConditionalGeneration`) that is bundled in this repository and loaded via the `auto_map` config.