Accurate & efficient vision models, ops and systems
AI & ML interests
Computer Vision, AI, Machine Learning
Papers
IMG: Calibrating Diffusion Models via Implicit Multimodal Guidance
OLA-VLM: Elevating Visual Perception in Multimodal LLMs with Auxiliary Embedding Distillation
Generative AI for visual creativity
Elevating Visual Perception in Multimodal LLMs with Visual Embedding Distillation
VisPer-LM
Visualize image depth, segmentation, and generation
shi-labs/OLA-VLM-CLIP-ViT-Llama3-8b
Image-Text-to-Text • 8B • Updated • 6
shi-labs/OLA-VLM-CLIP-ConvNeXT-Phi3-4k-mini
Image-Text-to-Text • 5B • Updated • 6 • 1
shi-labs/OLA-VLM-CLIP-ConvNeXT-Llama3-8b
Image-Text-to-Text • 9B • Updated • 4 • 1
Large multimodal models
shi-labs/slowfast-video-mllm-qwen2-7b-convnext-576-frame64-s1t4
Video-Text-to-Text • 9B • Updated • 47
shi-labs/slowfast-video-mllm-qwen2-7b-convnext-576-frame96-s1t6
Video-Text-to-Text • 9B • Updated • 6
shi-labs/slowfast-video-mllm-qwen2-7b-convnext-576-frame128-s2t4
9B • Updated • 2