Models
Datasets
Spaces
Buckets new
Docs
Enterprise
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2403.05135

Papers - Benchmarks - Image and Text

ELLA: Equip Diffusion Models with LLM for Enhanced Semantic Alignment

Paper • 2403.05135 • Published Mar 8, 2024 • 45
BLINK: Multimodal Large Language Models Can See but Not Perceive

Paper • 2404.12390 • Published Apr 18, 2024 • 26

Papers - Training Research

Measuring the Effects of Data Parallelism on Neural Network Training

Paper • 1811.03600 • Published Nov 8, 2018 • 2
Adafactor: Adaptive Learning Rates with Sublinear Memory Cost

Paper • 1804.04235 • Published Apr 11, 2018 • 2
EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks

Paper • 1905.11946 • Published May 28, 2019 • 4
Yi: Open Foundation Models by 01.AI

Paper • 2403.04652 • Published Mar 7, 2024 • 65

Beyond Language Models: Byte Models are Digital World Simulators

Paper • 2402.19155 • Published Feb 29, 2024 • 53
Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models

Paper • 2402.19427 • Published Feb 29, 2024 • 57
VisionLLaMA: A Unified LLaMA Interface for Vision Tasks

Paper • 2403.00522 • Published Mar 1, 2024 • 46
Resonance RoPE: Improving Context Length Generalization of Large Language Models

Paper • 2403.00071 • Published Feb 29, 2024 • 24

Papers - Decoders

Lossless Acceleration for Seq2seq Generation with Aggressive Decoding

Paper • 2205.10350 • Published May 20, 2022 • 2
Blockwise Parallel Decoding for Deep Autoregressive Models

Paper • 1811.03115 • Published Nov 7, 2018 • 2
Fast Transformer Decoding: One Write-Head is All You Need

Paper • 1911.02150 • Published Nov 6, 2019 • 9
Sequence-Level Knowledge Distillation

Paper • 1606.07947 • Published Jun 25, 2016 • 2

Diffusion model

Your Student is Better Than Expected: Adaptive Teacher-Student Collaboration for Text-Conditional Diffusion Models

Paper • 2312.10835 • Published Dec 17, 2023 • 7
LIME: Localized Image Editing via Attention Regularization in Diffusion Models

Paper • 2312.09256 • Published Dec 14, 2023 • 10
PromptBench: A Unified Library for Evaluation of Large Language Models

Paper • 2312.07910 • Published Dec 13, 2023 • 16
Prompt Expansion for Adaptive Text-to-Image Generation

Paper • 2312.16720 • Published Dec 27, 2023 • 5

RealCustom: Narrowing Real Text Word for Real-Time Open-Domain Text-to-Image Customization

Paper • 2403.00483 • Published Mar 1, 2024 • 16
OOTDiffusion: Outfitting Fusion based Latent Diffusion for Controllable Virtual Try-on

Paper • 2403.01779 • Published Mar 4, 2024 • 30
Scalable High-Resolution Pixel-Space Image Synthesis with Hourglass Diffusion Transformers

Paper • 2401.11605 • Published Jan 21, 2024 • 23
FiT: Flexible Vision Transformer for Diffusion Model

Paper • 2402.12376 • Published Feb 19, 2024 • 48

All the papers I found interesting

DistriFusion: Distributed Parallel Inference for High-Resolution Diffusion Models

Paper • 2402.19481 • Published Feb 29, 2024 • 22
FiT: Flexible Vision Transformer for Diffusion Model

Paper • 2402.12376 • Published Feb 19, 2024 • 48
When Scaling Meets LLM Finetuning: The Effect of Data, Model and Finetuning Method

Paper • 2402.17193 • Published Feb 27, 2024 • 26
ELLA: Equip Diffusion Models with LLM for Enhanced Semantic Alignment

Paper • 2403.05135 • Published Mar 8, 2024 • 45

Papers related to current works and research

Word Alignment by Fine-tuning Embeddings on Parallel Corpora

Paper • 2101.08231 • Published Jan 20, 2021 • 1
Not Low-Resource Anymore: Aligner Ensembling, Batch Filtering, and New Datasets for Bengali-English Machine Translation

Paper • 2009.09359 • Published Sep 20, 2020 • 1
Unsupervised Multilingual Alignment using Wasserstein Barycenter

Paper • 2002.00743 • Published Jan 28, 2020
Sinhala-English Word Embedding Alignment: Introducing Datasets and Benchmark for a Low Resource Language

Paper • 2311.10436 • Published Nov 17, 2023

EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters

Paper • 2402.04252 • Published Feb 6, 2024 • 30
Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models

Paper • 2402.03749 • Published Feb 6, 2024 • 15
ScreenAI: A Vision-Language Model for UI and Infographics Understanding

Paper • 2402.04615 • Published Feb 7, 2024 • 44
EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss

Paper • 2402.05008 • Published Feb 7, 2024 • 23

Multimodal Papers

From Audio to Photoreal Embodiment: Synthesizing Humans in Conversations

Paper • 2401.01885 • Published Jan 3, 2024 • 28
Media2Face: Co-speech Facial Animation Generation With Multi-Modality Guidance

Paper • 2401.15687 • Published Jan 28, 2024 • 24
Unified-IO 2: Scaling Autoregressive Multimodal Models with Vision, Language, Audio, and Action

Paper • 2312.17172 • Published Dec 28, 2023 • 30
MouSi: Poly-Visual-Expert Vision-Language Models

Paper • 2401.17221 • Published Jan 30, 2024 • 9

Papers - Benchmarks - Image and Text

ELLA: Equip Diffusion Models with LLM for Enhanced Semantic Alignment

Paper • 2403.05135 • Published Mar 8, 2024 • 45
BLINK: Multimodal Large Language Models Can See but Not Perceive

Paper • 2404.12390 • Published Apr 18, 2024 • 26

RealCustom: Narrowing Real Text Word for Real-Time Open-Domain Text-to-Image Customization

Paper • 2403.00483 • Published Mar 1, 2024 • 16
OOTDiffusion: Outfitting Fusion based Latent Diffusion for Controllable Virtual Try-on

Paper • 2403.01779 • Published Mar 4, 2024 • 30
Scalable High-Resolution Pixel-Space Image Synthesis with Hourglass Diffusion Transformers

Paper • 2401.11605 • Published Jan 21, 2024 • 23
FiT: Flexible Vision Transformer for Diffusion Model

Paper • 2402.12376 • Published Feb 19, 2024 • 48

Papers - Training Research

Measuring the Effects of Data Parallelism on Neural Network Training

Paper • 1811.03600 • Published Nov 8, 2018 • 2
Adafactor: Adaptive Learning Rates with Sublinear Memory Cost

Paper • 1804.04235 • Published Apr 11, 2018 • 2
EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks

Paper • 1905.11946 • Published May 28, 2019 • 4
Yi: Open Foundation Models by 01.AI

Paper • 2403.04652 • Published Mar 7, 2024 • 65

All the papers I found interesting

DistriFusion: Distributed Parallel Inference for High-Resolution Diffusion Models

Paper • 2402.19481 • Published Feb 29, 2024 • 22
FiT: Flexible Vision Transformer for Diffusion Model

Paper • 2402.12376 • Published Feb 19, 2024 • 48
When Scaling Meets LLM Finetuning: The Effect of Data, Model and Finetuning Method

Paper • 2402.17193 • Published Feb 27, 2024 • 26
ELLA: Equip Diffusion Models with LLM for Enhanced Semantic Alignment

Paper • 2403.05135 • Published Mar 8, 2024 • 45

Beyond Language Models: Byte Models are Digital World Simulators

Paper • 2402.19155 • Published Feb 29, 2024 • 53
Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models

Paper • 2402.19427 • Published Feb 29, 2024 • 57
VisionLLaMA: A Unified LLaMA Interface for Vision Tasks

Paper • 2403.00522 • Published Mar 1, 2024 • 46
Resonance RoPE: Improving Context Length Generalization of Large Language Models

Paper • 2403.00071 • Published Feb 29, 2024 • 24

Papers related to current works and research

Word Alignment by Fine-tuning Embeddings on Parallel Corpora

Paper • 2101.08231 • Published Jan 20, 2021 • 1
Not Low-Resource Anymore: Aligner Ensembling, Batch Filtering, and New Datasets for Bengali-English Machine Translation

Paper • 2009.09359 • Published Sep 20, 2020 • 1
Unsupervised Multilingual Alignment using Wasserstein Barycenter

Paper • 2002.00743 • Published Jan 28, 2020
Sinhala-English Word Embedding Alignment: Introducing Datasets and Benchmark for a Low Resource Language

Paper • 2311.10436 • Published Nov 17, 2023

Papers - Decoders

Lossless Acceleration for Seq2seq Generation with Aggressive Decoding

Paper • 2205.10350 • Published May 20, 2022 • 2
Blockwise Parallel Decoding for Deep Autoregressive Models

Paper • 1811.03115 • Published Nov 7, 2018 • 2
Fast Transformer Decoding: One Write-Head is All You Need

Paper • 1911.02150 • Published Nov 6, 2019 • 9
Sequence-Level Knowledge Distillation

Paper • 1606.07947 • Published Jun 25, 2016 • 2

EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters

Paper • 2402.04252 • Published Feb 6, 2024 • 30
Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models

Paper • 2402.03749 • Published Feb 6, 2024 • 15
ScreenAI: A Vision-Language Model for UI and Infographics Understanding

Paper • 2402.04615 • Published Feb 7, 2024 • 44
EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss

Paper • 2402.05008 • Published Feb 7, 2024 • 23

Diffusion model

Your Student is Better Than Expected: Adaptive Teacher-Student Collaboration for Text-Conditional Diffusion Models

Paper • 2312.10835 • Published Dec 17, 2023 • 7
LIME: Localized Image Editing via Attention Regularization in Diffusion Models

Paper • 2312.09256 • Published Dec 14, 2023 • 10
PromptBench: A Unified Library for Evaluation of Large Language Models

Paper • 2312.07910 • Published Dec 13, 2023 • 16
Prompt Expansion for Adaptive Text-to-Image Generation

Paper • 2312.16720 • Published Dec 27, 2023 • 5

Multimodal Papers

From Audio to Photoreal Embodiment: Synthesizing Humans in Conversations

Paper • 2401.01885 • Published Jan 3, 2024 • 28
Media2Face: Co-speech Facial Animation Generation With Multi-Modality Guidance

Paper • 2401.15687 • Published Jan 28, 2024 • 24
Unified-IO 2: Scaling Autoregressive Multimodal Models with Vision, Language, Audio, and Action

Paper • 2312.17172 • Published Dec 28, 2023 • 30
MouSi: Poly-Visual-Expert Vision-Language Models

Paper • 2401.17221 • Published Jan 30, 2024 • 9

Previous
1
2
3
Next

Company

TOS Privacy About Careers

Website

Models Datasets Spaces Pricing Docs