Papers - a Diwa08 Collection

Diwa08 's Collections

Papers

updated 3 days ago

MMGR: Multi-Modal Generative Reasoning

Paper • 2512.14691 • Published Dec 16, 2025 • 121
KlingAvatar 2.0 Technical Report

Paper • 2512.13313 • Published Dec 15, 2025 • 44
SemanticGen: Video Generation in Semantic Space

Paper • 2512.20619 • Published Dec 23, 2025 • 94
DataFlow: An LLM-Driven Framework for Unified Data Preparation and Workflow Automation in the Era of Data-Centric AI

Paper • 2512.16676 • Published Dec 18, 2025 • 222
The Prism Hypothesis: Harmonizing Semantic and Pixel Representations via Unified Autoencoding

Paper • 2512.19693 • Published Dec 22, 2025 • 67
LongVideoAgent: Multi-Agent Reasoning with Long Videos

Paper • 2512.20618 • Published Dec 23, 2025 • 56
Bottom-up Policy Optimization: Your Language Model Policy Secretly Contains Internal Policies

Paper • 2512.19673 • Published Dec 22, 2025 • 66
DiffCoT: Diffusion-styled Chain-of-Thought Reasoning in LLMs

Paper • 2601.03559 • Published Jan 7 • 14
Think-Then-Generate: Reasoning-Aware Text-to-Image Diffusion with LLM Encoders

Paper • 2601.10332 • Published Jan 15 • 31
Self-Refining Video Sampling

Paper • 2601.18577 • Published Jan 26 • 25
PixelGen: Pixel Diffusion Beats Latent Diffusion with Perceptual Loss

Paper • 2602.02493 • Published Feb 2 • 46
LatentMem: Customizing Latent Memory for Multi-Agent Systems

Paper • 2602.03036 • Published Feb 3 • 14
Reinforcement World Model Learning for LLM-based Agents

Paper • 2602.05842 • Published Feb 5 • 27
DualPath: Breaking the Storage Bandwidth Bottleneck in Agentic LLM Inference

Paper • 2602.21548 • Published Feb 25 • 49
Learning When to Act or Refuse: Guarding Agentic Reasoning Models for Safe Multi-Step Tool Use

Paper • 2603.03205 • Published Mar 3 • 13
Timer-S1: A Billion-Scale Time Series Foundation Model with Serial Scaling

Paper • 2603.04791 • Published Mar 5 • 18
AgentVista: Evaluating Multimodal Agents in Ultra-Challenging Realistic Visual Scenarios

Paper • 2602.23166 • Published Feb 26 • 45
Memex(RL): Scaling Long-Horizon LLM Agents via Indexed Experience Memory

Paper • 2603.04257 • Published Mar 4 • 19
Dynamic Chunking Diffusion Transformer

Paper • 2603.06351 • Published Mar 6 • 15
InternVL-U: Democratizing Unified Multimodal Models for Understanding, Reasoning, Generation and Editing

Paper • 2603.09877 • Published about 1 month ago • 47
EnterpriseOps-Gym: Environments and Evaluations for Stateful Agentic Planning and Tool Use in Enterprise Settings

Paper • 2603.13594 • Published 27 days ago • 147
LoST: Level of Semantics Tokenization for 3D Shapes

Paper • 2603.17995 • Published 22 days ago • 31
GPA: Learning GUI Process Automation from Demonstrations

Paper • 2604.01676 • Published 8 days ago • 16
UniDriveVLA: Unifying Understanding, Perception, and Action Planning for Autonomous Driving

Paper • 2604.02190 • Published 8 days ago • 25
Unify-Agent: A Unified Multimodal Agent for World-Grounded Image Synthesis

Paper • 2603.29620 • Published 9 days ago • 46