manga_translation - a Nazzaroth2 Collection


updated Mar 22, 2025

EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss
Paper • 2402.05008 • Published Feb 7, 2024

PALO: A Polyglot Large Multimodal Model for 5B People
Paper • 2402.14818 • Published Feb 22, 2024

MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training
Paper • 2403.09611 • Published Mar 14, 2024

InternLM-XComposer2-4KHD: A Pioneering Large Vision-Language Model Handling Resolutions from 336 Pixels to 4K HD
Paper • 2404.06512 • Published Apr 9, 2024

DART: Denoising Autoregressive Transformer for Scalable Text-to-Image Generation
Paper • 2410.08159 • Published Oct 10, 2024

MangaNinja: Line Art Colorization with Precise Reference Following
Paper • 2501.08332 • Published Jan 14, 2025

Diffusion Adversarial Post-Training for One-Step Video Generation
Paper • 2501.08316 • Published Jan 14, 2025

Multimodal LLMs Can Reason about Aesthetics in Zero-Shot
Paper • 2501.09012 • Published Jan 15, 2025

Can We Generate Images with CoT? Let's Verify and Reinforce Image Generation Step by Step
Paper • 2501.13926 • Published Jan 23, 2025

The Differences Between Direct Alignment Algorithms are a Blur
Paper • 2502.01237 • Published Feb 3, 2025

Teaching Language Models to Critique via Reinforcement Learning
Paper • 2502.03492 • Published Feb 5, 2025

Large Language Diffusion Models
Paper • 2502.09992 • Published Feb 14, 2025

Scaling Text-Rich Image Understanding via Code-Guided Synthetic Multimodal Data Generation
Paper • 2502.14846 • Published Feb 20, 2025

RelaCtrl: Relevance-Guided Efficient Control for Diffusion Transformers
Paper • 2502.14377 • Published Feb 20, 2025

KV-Edit: Training-Free Image Editing for Precise Background Preservation
Paper • 2502.17363 • Published Feb 24, 2025

Kanana: Compute-efficient Bilingual Language Models
Paper • 2502.18934 • Published Feb 26, 2025

UniTok: A Unified Tokenizer for Visual Generation and Understanding
Paper • 2502.20321 • Published Feb 27, 2025

R1-T1: Fully Incentivizing Translation Capability in LLMs via Reasoning Learning
Paper • 2502.19735 • Published Feb 27, 2025

Visual-RFT: Visual Reinforcement Fine-Tuning
Paper • 2503.01785 • Published Mar 3, 2025

On the Acquisition of Shared Grammatical Representations in Bilingual Language Models
Paper • 2503.03962 • Published Mar 5, 2025

R1-Zero's "Aha Moment" in Visual Reasoning on a 2B Non-SFT Model
Paper • 2503.05132 • Published Mar 7, 2025

MM-Eureka: Exploring Visual Aha Moment with Rule-based Large-scale Reinforcement Learning
Paper • 2503.07365 • Published Mar 10, 2025

EasyControl: Adding Efficient and Flexible Control for Diffusion Transformer
Paper • 2503.07027 • Published Mar 10, 2025

Seg-Zero: Reasoning-Chain Guided Segmentation via Cognitive Reinforcement
Paper • 2503.06520 • Published Mar 9, 2025

SegAgent: Exploring Pixel Understanding Capabilities in MLLMs by Imitating Human Annotator Trajectories
Paper • 2503.08625 • Published Mar 11, 2025

Gemini Embedding: Generalizable Embeddings from Gemini
Paper • 2503.07891 • Published Mar 10, 2025

Edit Transfer: Learning Image Editing via Vision In-Context Relations
Paper • 2503.13327 • Published Mar 17, 2025

CLS-RL: Image Classification with Rule-Based Reinforcement Learning
Paper • 2503.16188 • Published Mar 20, 2025

JARVIS-VLA: Post-Training Large-Scale Vision Language Models to Play Visual Games with Keyboards and Mouse
Paper • 2503.16365 • Published Mar 20, 2025