Nazzaroth2's Collections: manga_translation
EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss
Paper
• 2402.05008
• Published
• 23
PALO: A Polyglot Large Multimodal Model for 5B People
Paper
• 2402.14818
• Published
• 23
MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training
Paper
• 2403.09611
• Published
• 129
InternLM-XComposer2-4KHD: A Pioneering Large Vision-Language Model Handling Resolutions from 336 Pixels to 4K HD
Paper
• 2404.06512
• Published
• 30
DART: Denoising Autoregressive Transformer for Scalable Text-to-Image Generation
Paper
• 2410.08159
• Published
• 26
MangaNinja: Line Art Colorization with Precise Reference Following
Paper
• 2501.08332
• Published
• 62
Diffusion Adversarial Post-Training for One-Step Video Generation
Paper
• 2501.08316
• Published
• 36
Multimodal LLMs Can Reason about Aesthetics in Zero-Shot
Paper
• 2501.09012
• Published
• 10
Can We Generate Images with CoT? Let's Verify and Reinforce Image Generation Step by Step
Paper
• 2501.13926
• Published
• 43
The Differences Between Direct Alignment Algorithms are a Blur
Paper
• 2502.01237
• Published
• 113
Teaching Language Models to Critique via Reinforcement Learning
Paper
• 2502.03492
• Published
• 24
Large Language Diffusion Models
Paper
• 2502.09992
• Published
• 126
Scaling Text-Rich Image Understanding via Code-Guided Synthetic Multimodal Data Generation
Paper
• 2502.14846
• Published
• 14
RelaCtrl: Relevance-Guided Efficient Control for Diffusion Transformers
Paper
• 2502.14377
• Published
• 12
KV-Edit: Training-Free Image Editing for Precise Background Preservation
Paper
• 2502.17363
• Published
• 37
Kanana: Compute-efficient Bilingual Language Models
Paper
• 2502.18934
• Published
• 65
UniTok: A Unified Tokenizer for Visual Generation and Understanding
Paper
• 2502.20321
• Published
• 30
R1-T1: Fully Incentivizing Translation Capability in LLMs via Reasoning Learning
Paper
• 2502.19735
• Published
• 9
Visual-RFT: Visual Reinforcement Fine-Tuning
Paper
• 2503.01785
• Published
• 86
On the Acquisition of Shared Grammatical Representations in Bilingual Language Models
Paper
• 2503.03962
• Published
• 4
R1-Zero's "Aha Moment" in Visual Reasoning on a 2B Non-SFT Model
Paper
• 2503.05132
• Published
• 57
MM-Eureka: Exploring Visual Aha Moment with Rule-based Large-scale Reinforcement Learning
Paper
• 2503.07365
• Published
• 61
EasyControl: Adding Efficient and Flexible Control for Diffusion Transformer
Paper
• 2503.07027
• Published
• 29
Seg-Zero: Reasoning-Chain Guided Segmentation via Cognitive Reinforcement
Paper
• 2503.06520
• Published
• 11
SegAgent: Exploring Pixel Understanding Capabilities in MLLMs by Imitating Human Annotator Trajectories
Paper
• 2503.08625
• Published
• 27
Gemini Embedding: Generalizable Embeddings from Gemini
Paper
• 2503.07891
• Published
• 45
Edit Transfer: Learning Image Editing via Vision In-Context Relations
Paper
• 2503.13327
• Published
• 29
CLS-RL: Image Classification with Rule-Based Reinforcement Learning
Paper
• 2503.16188
• Published
• 13
JARVIS-VLA: Post-Training Large-Scale Vision Language Models to Play Visual Games with Keyboards and Mouse
Paper
• 2503.16365
• Published
• 41