neuraloverflow's Collections: To read
BitNet: Scaling 1-bit Transformers for Large Language Models
Paper
• 2310.11453
• Published • 107
Self-RAG: Learning to Retrieve, Generate, and Critique through
Self-Reflection
Paper
• 2310.11511
• Published • 79
In-Context Learning Creates Task Vectors
Paper
• 2310.15916
• Published • 43
Matryoshka Diffusion Models
Paper
• 2310.15111
• Published • 45
Contrastive Preference Learning: Learning from Human Feedback without RL
Paper
• 2310.13639
• Published • 25
Safe RLHF: Safe Reinforcement Learning from Human Feedback
Paper
• 2310.12773
• Published • 28
An Image is Worth Multiple Words: Learning Object Level Concepts using
Multi-Concept Prompt Learning
Paper
• 2310.12274
• Published • 13
Set-of-Mark Prompting Unleashes Extraordinary Visual Grounding in GPT-4V
Paper
• 2310.11441
• Published • 29
In-Context Pretraining: Language Modeling Beyond Document Boundaries
Paper
• 2310.10638
• Published • 30
Kandinsky: an Improved Text-to-Image Synthesis with Image Prior and
Latent Diffusion
Paper
• 2310.03502
• Published • 79
How FaR Are Large Language Models From Agents with Theory-of-Mind?
Paper
• 2310.03051
• Published • 35
Large Language Models Cannot Self-Correct Reasoning Yet
Paper
• 2310.01798
• Published • 36
Enable Language Models to Implicitly Learn Self-Improvement From Data
Paper
• 2310.00898
• Published • 24
PixArt-α: Fast Training of Diffusion Transformer for
Photorealistic Text-to-Image Synthesis
Paper
• 2310.00426
• Published • 61
Conditional Diffusion Distillation
Paper
• 2310.01407
• Published • 19
Vision Transformers Need Registers
Paper
• 2309.16588
• Published • 86
Latent Consistency Models: Synthesizing High-Resolution Images with
Few-Step Inference
Paper
• 2310.04378
• Published • 22
CodeFusion: A Pre-trained Diffusion Model for Code Generation
Paper
• 2310.17680
• Published • 74
Personas as a Way to Model Truthfulness in Language Models
Paper
• 2310.18168
• Published • 5
A Picture is Worth a Thousand Words: Principled Recaptioning Improves
Image Generation
Paper
• 2310.16656
• Published • 54
Distil-Whisper: Robust Knowledge Distillation via Large-Scale Pseudo
Labelling
Paper
• 2311.00430
• Published • 56
LLaVA-Interactive: An All-in-One Demo for Image Chat, Segmentation,
Generation and Editing
Paper
• 2311.00571
• Published • 42
Controllable Music Production with Diffusion Models and Guidance
Gradients
Paper
• 2311.00613
• Published • 25
De-Diffusion Makes Text a Strong Cross-Modal Interface
Paper
• 2311.00618
• Published • 22
The Generative AI Paradox: "What It Can Create, It May Not Understand"
Paper
• 2311.00059
• Published • 19
Grounding Visual Illusions in Language: Do Vision-Language Models
Perceive Illusions Like Humans?
Paper
• 2311.00047
• Published • 10
CapsFusion: Rethinking Image-Text Data at Scale
Paper
• 2310.20550
• Published • 27
Beyond U: Making Diffusion Models Faster & Lighter
Paper
• 2310.20092
• Published • 12
LoRAShear: Efficient Large Language Model Structured Pruning and
Knowledge Recovery
Paper
• 2310.18356
• Published • 24
Unleashing the Power of Pre-trained Language Models for Offline
Reinforcement Learning
Paper
• 2310.20587
• Published • 18
TinyStories: How Small Can Language Models Be and Still Speak Coherent
English?
Paper
• 2305.07759
• Published • 45
Textbooks Are All You Need
Paper
• 2306.11644
• Published • 154
QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models
Paper
• 2310.16795
• Published • 27
FLAP: Fast Language-Audio Pre-training
Paper
• 2311.01615
• Published • 16
LCM-LoRA: A Universal Stable-Diffusion Acceleration Module
Paper
• 2311.05556
• Published • 86
Levels of AGI for Operationalizing Progress on the Path to AGI
Paper
• 2311.02462
• Published • 36
The Impact of Large Language Models on Scientific Discovery: a
Preliminary Study using GPT-4
Paper
• 2311.07361
• Published • 14
Fast Chain-of-Thought: A Glance of Future from Parallel Decoding Leads
to Answers Faster
Paper
• 2311.08263
• Published • 16
Technical Report: Large Language Models can Strategically Deceive their
Users when Put Under Pressure
Paper
• 2311.07590
• Published • 17
Music ControlNet: Multiple Time-varying Controls for Music Generation
Paper
• 2311.07069
• Published • 44
Prompt Engineering a Prompt Engineer
Paper
• 2311.05661
• Published • 22
PolyMaX: General Dense Prediction with Mask Transformer
Paper
• 2311.05770
• Published • 8
UFOGen: You Forward Once Large Scale Text-to-Image Generation via
Diffusion GANs
Paper
• 2311.09257
• Published • 47
The Chosen One: Consistent Characters in Text-to-Image Diffusion Models
Paper
• 2311.10093
• Published • 58
mPLUG-Owl2: Revolutionizing Multi-modal Large Language Model with
Modality Collaboration
Paper
• 2311.04257
• Published • 22
Rethinking Attention: Exploring Shallow Feed-Forward Neural Networks as
an Alternative to Attention Layers in Transformers
Paper
• 2311.10642
• Published • 25
Orca 2: Teaching Small Language Models How to Reason
Paper
• 2311.11045
• Published • 77
Exponentially Faster Language Modelling
Paper
• 2311.10770
• Published • 119
MultiLoRA: Democratizing LoRA for Better Multi-Task Learning
Paper
• 2311.11501
• Published • 37
System 2 Attention (is something you might need too)
Paper
• 2311.11829
• Published • 43
GAIA: a benchmark for General AI Assistants
Paper
• 2311.12983
• Published • 247
Using Human Feedback to Fine-tune Diffusion Models without Any Reward
Model
Paper
• 2311.13231
• Published • 28
Alpha-CLIP: A CLIP Model Focusing on Wherever You Want
Paper
• 2312.03818
• Published • 34
Magicoder: Source Code Is All You Need
Paper
• 2312.02120
• Published • 82
FaceStudio: Put Your Face Everywhere in Seconds
Paper
• 2312.02663
• Published • 32
Schrodinger Bridges Beat Diffusion Models on Text-to-Speech Synthesis
Paper
• 2312.03491
• Published • 34
Chain of Code: Reasoning with a Language Model-Augmented Code Emulator
Paper
• 2312.04474
• Published • 34
DeepCache: Accelerating Diffusion Models for Free
Paper
• 2312.00858
• Published • 23
Your ViT is Secretly an Image Segmentation Model
Paper
• 2503.19108
• Published • 25
Dita: Scaling Diffusion Transformer for Generalist
Vision-Language-Action Policy
Paper
• 2503.19757
• Published • 51
Paper2Code: Automating Code Generation from Scientific Papers in Machine
Learning
Paper
• 2504.17192
• Published • 124
TTRL: Test-Time Reinforcement Learning
Paper
• 2504.16084
• Published • 122
Absolute Zero: Reinforced Self-play Reasoning with Zero Data
Paper
• 2505.03335
• Published • 191
The Devil Behind Moltbook: Anthropic Safety is Always Vanishing in Self-Evolving AI Societies
Paper
• 2602.09877
• Published • 197
Agent World Model: Infinity Synthetic Environments for Agentic Reinforcement Learning
Paper
• 2602.10090
• Published • 52
Code2World: A GUI World Model via Renderable Code Generation
Paper
• 2602.09856
• Published • 202
Weak-Driven Learning: How Weak Agents make Strong Agents Stronger
Paper
• 2602.08222
• Published • 287
PaperBanana: Automating Academic Illustration for AI Scientists
Paper
• 2601.23265
• Published • 222
SWE-Universe: Scale Real-World Verifiable Environments to Millions
Paper
• 2602.02361
• Published • 60
PixelGen: Pixel Diffusion Beats Latent Diffusion with Perceptual Loss
Paper
• 2602.02493
• Published • 46
Rethinking Generative Recommender Tokenizer: Recsys-Native Encoding and Semantic Quantization Beyond LLMs
Paper
• 2602.02338
• Published • 42
Context Forcing: Consistent Autoregressive Video Generation with Long Context
Paper
• 2602.06028
• Published • 36
RLAnything: Forge Environment, Policy, and Reward Model in Completely Dynamic RL System
Paper
• 2602.02488
• Published • 36
Reinforcement World Model Learning for LLM-based Agents
Paper
• 2602.05842
• Published • 27
Beyond Pixels: Visual Metaphor Transfer via Schema-Driven Agentic Reasoning
Paper
• 2602.01335
• Published • 16
AOrchestra: Automating Sub-Agent Creation for Agentic Orchestration
Paper
• 2602.03786
• Published • 90
UniReason 1.0: A Unified Reasoning Framework for World Knowledge Aligned Image Generation and Editing
Paper
• 2602.02437
• Published • 80
SkillRL: Evolving Agents via Recursive Skill-Augmented Reinforcement Learning
Paper
• 2602.08234
• Published • 72
InternAgent-1.5: A Unified Agentic Framework for Long-Horizon Autonomous Scientific Discovery
Paper
• 2602.08990
• Published • 78
AudioSAE: Towards Understanding of Audio-Processing Models with Sparse AutoEncoders
Paper
• 2602.05027
• Published • 63
Improving Data and Reward Design for Scientific Reasoning in Large Language Models
Paper
• 2602.08321
• Published • 43
MSign: An Optimizer Preventing Training Instability in Large Language Models via Stable Rank Restoration
Paper
• 2602.01734
• Published • 32
Self-Improving World Modelling with Latent Actions
Paper
• 2602.06130
• Published • 32
When Actions Teach You to Think: Reasoning-Action Synergy via Reinforcement Learning in Conversational Agents
Paper
• 2512.11277
• Published
Judging What We Cannot Solve: A Consequence-Based Approach for Oracle-Free Evaluation of Research-Level Math
Paper
• 2602.06291
• Published • 23
When to Memorize and When to Stop: Gated Recurrent Memory for Long-Context Reasoning
Paper
• 2602.10560
• Published • 31
iGRPO: Self-Feedback-Driven LLM Reasoning
Paper
• 2602.09000
• Published • 18
UI-Venus-1.5 Technical Report
Paper
• 2602.09082
• Published • 157
Chain of Mindset: Reasoning with Adaptive Cognitive Modes
Paper
• 2602.10063
• Published • 75
Less is Enough: Synthesizing Diverse Data in Feature Space of LLMs
Paper
• 2602.10388
• Published • 244
AgentSkiller: Scaling Generalist Agent Intelligence through Semantically Integrated Cross-Domain Data Synthesis
Paper
• 2602.09372
• Published • 7
AgentArk: Distilling Multi-Agent Intelligence into a Single LLM Agent
Paper
• 2602.03955
• Published • 8
InftyThink: Breaking the Length Limits of Long-Context Reasoning in
Large Language Models
Paper
• 2503.06692
• Published • 2
InftyThink+: Effective and Efficient Infinite-Horizon Reasoning via Reinforcement Learning
Paper
• 2602.06960
• Published • 14
Zooming without Zooming: Region-to-Image Distillation for Fine-Grained Multimodal Perception
Paper
• 2602.11858
• Published • 62
Intelligent AI Delegation
Paper
• 2602.11865
• Published • 16
Architecting Agentic Communities using Design Patterns
Paper
• 2601.03624
• Published
Internet of Agentic AI: Incentive-Compatible Distributed Teaming and Workflow
Paper
• 2602.03145
• Published
BrowseComp-V^3: A Visual, Vertical, and Verifiable Benchmark for Multimodal Browsing Agents
Paper
• 2602.12876
• Published • 11
WebWorld: A Large-Scale World Model for Web Agent Training
Paper
• 2602.14721
• Published • 11
HeartMuLa: A Family of Open Sourced Music Foundation Models
Paper
• 2601.10547
• Published • 48
Recursive Language Models
Paper
• 2512.24601
• Published • 94
SimpleMem: Efficient Lifelong Memory for LLM Agents
Paper
• 2601.02553
• Published • 37
Open ASR Leaderboard: Towards Reproducible and Transparent Multilingual
and Long-Form Speech Recognition Evaluation
Paper
• 2510.06961
• Published • 11
Qwen3-ASR Technical Report
Paper
• 2601.21337
• Published • 36
Flavors of Moonshine: Tiny Specialized ASR Models for Edge Devices
Paper
• 2509.02523
• Published • 21
Index-ASR Technical Report
Paper
• 2601.00890
• Published
Fast KV Compaction via Attention Matching
Paper
• 2602.16284
• Published • 1
ArXiv-to-Model: A Practical Study of Scientific LM Training
Paper
• 2602.17288
• Published • 9
AutoWebWorld: Synthesizing Infinite Verifiable Web Environments via Finite State Machines
Paper
• 2602.14296
• Published • 51
Mobile-Agent-v3.5: Multi-platform Fundamental GUI Agents
Paper
• 2602.16855
• Published • 50
SkillOrchestra: Learning to Route Agents via Skill Transfer
Paper
• 2602.19672
• Published • 57
A Very Big Video Reasoning Suite
Paper
• 2602.20159
• Published • 516
Multi-Vector Index Compression in Any Modality
Paper
• 2602.21202
• Published • 22
CiteAudit: You Cited It, But Did You Read It? A Benchmark for Verifying Scientific References in the LLM Era
Paper
• 2602.23452
• Published • 17
Exploratory Memory-Augmented LLM Agent via Hybrid On- and Off-Policy Optimization
Paper
• 2602.23008
• Published • 37
Imagination Helps Visual Reasoning, But Not Yet in Latent Space
Paper
• 2602.22766
• Published • 43
MediX-R1: Open Ended Medical Reinforcement Learning
Paper
• 2602.23363
• Published • 22
Search More, Think Less: Rethinking Long-Horizon Agentic Search for Efficiency and Generalization
Paper
• 2602.22675
• Published • 23
AgentDropoutV2: Optimizing Information Flow in Multi-Agent Systems via Test-Time Rectify-or-Reject Pruning
Paper
• 2602.23258
• Published • 28
DeepPlanning: Benchmarking Long-Horizon Agentic Planning with Verifiable Constraints
Paper
• 2601.18137
• Published • 35
Agentic Reasoning for Large Language Models
Paper
• 2601.12538
• Published • 202
Heterogeneous Agent Collaborative Reinforcement Learning
Paper
• 2603.02604
• Published • 191
Does Your Reasoning Model Implicitly Know When to Stop Thinking?
Paper
• 2602.08354
• Published • 262
DREAM: Deep Research Evaluation with Agentic Metrics
Paper
• 2602.18940
• Published • 14
Experiential Reinforcement Learning
Paper
• 2602.13949
• Published • 71
MedXIAOHE: A Comprehensive Recipe for Building Medical MLLMs
Paper
• 2602.12705
• Published • 66
TermiGen: High-Fidelity Environment and Robust Trajectory Synthesis for Terminal Agents
Paper
• 2602.07274
• Published • 208
Vision-DeepResearch Benchmark: Rethinking Visual and Textual Search for Multimodal Large Language Models
Paper
• 2602.02185
• Published • 117
Golden Goose: A Simple Trick to Synthesize Unlimited RLVR Tasks from Unverifiable Internet Text
Paper
• 2601.22975
• Published • 110
MemSkill: Learning and Evolving Memory Skills for Self-Evolving Agents
Paper
• 2602.02474
• Published • 62
FS-Researcher: Test-Time Scaling for Long-Horizon Research Tasks with File-System-Based Agents
Paper
• 2602.01566
• Published • 52
Wiki Live Challenge: Challenging Deep Research Agents with Expert-Level Wikipedia Articles
Paper
• 2602.01590
• Published • 33
Self-Hinting Language Models Enhance Reinforcement Learning
Paper
• 2602.03143
• Published • 31
LLM-in-Sandbox Elicits General Agentic Intelligence
Paper
• 2601.16206
• Published • 86
MemSifter: Offloading LLM Memory Retrieval via Outcome-Driven Proxy Reasoning
Paper
• 2603.03379
• Published • 32
MOOSE-Star: Unlocking Tractable Training for Scientific Discovery by Breaking the Complexity Barrier
Paper
• 2603.03756
• Published • 89
SkillNet: Create, Evaluate, and Connect AI Skills
Paper
• 2603.04448
• Published • 91
Progressive Residual Warmup for Language Model Pretraining
Paper
• 2603.05369
• Published • 36
Memory in the Age of AI Agents
Paper
• 2512.13564
• Published • 157
Rubrics as Rewards: Reinforcement Learning Beyond Verifiable Domains
Paper
• 2507.17746
• Published • 5
Agentic Rubrics as Contextual Verifiers for SWE Agents
Paper
• 2601.04171
• Published • 13
DeepResearch Bench II: Diagnosing Deep Research Agents via Rubrics from Expert Report
Paper
• 2601.08536
• Published • 3
RubricBench: Aligning Model-Generated Rubrics with Human Standards
Paper
• 2603.01562
• Published • 64
ResearchRubrics: A Benchmark of Prompts and Rubrics For Evaluating Deep Research Agents
Paper
• 2511.07685
• Published • 10
Reinforcing Chain-of-Thought Reasoning with Self-Evolving Rubrics
Paper
• 2602.10885
• Published • 1
How Far Can Unsupervised RLVR Scale LLM Training?
Paper
• 2603.08660
• Published • 57
Reasoning Models Struggle to Control their Chains of Thought
Paper
• 2603.05706
• Published • 36
General Agentic Memory Via Deep Research
Paper
• 2511.18423
• Published • 170
TADA: A Generative Framework for Speech Modeling via Text-Acoustic Dual Alignment
Paper
• 2602.23068
• Published • 7
Fish Audio S2 Technical Report
Paper
• 2603.08823
• Published • 37
One Model, Many Budgets: Elastic Latent Interfaces for Diffusion Transformers
Paper
• 2603.12245
• Published • 18
Test-Driven AI Agent Definition (TDAD): Compiling Tool-Using Agents from Behavioral Specifications
Paper
• 2603.08806
• Published • 7
ReflexiCoder: Teaching Large Language Models to Self-Reflect on Generated Code and Self-Correct It via Reinforcement Learning
Paper
• 2603.05863
• Published • 5
Thinking to Recall: How Reasoning Unlocks Parametric Knowledge in LLMs
Paper
• 2603.09906
• Published • 75
IndexCache: Accelerating Sparse Attention via Cross-Layer Index Reuse
Paper
• 2603.12201
• Published • 52
Strategic Navigation or Stochastic Search? How Agents and Humans Reason Over Document Collections
Paper
• 2603.12180
• Published • 64
Neural Thickets: Diverse Task Experts Are Dense Around Pretrained Weights
Paper
• 2603.12228
• Published • 12
Evolution Strategies at Scale: LLM Fine-Tuning Beyond Reinforcement
Learning
Paper
• 2509.24372
• Published • 12
Paired Open-Ended Trailblazer (POET): Endlessly Generating Increasingly
Complex and Diverse Learning Environments and Their Solutions
Paper
• 1901.01753
• Published • 2
Learning to Continually Learn via Meta-learning Agentic Memory Designs
Paper
• 2602.07755
• Published • 7
Darwin Godel Machine: Open-Ended Evolution of Self-Improving Agents
Paper
• 2505.22954
• Published • 15
Bootstrapping Exploration with Group-Level Natural Language Feedback in Reinforcement Learning
Paper
• 2603.04597
• Published • 210
OpenClaw-RL: Train Any Agent Simply by Talking
Paper
• 2603.10165
• Published • 148
In-Context Reinforcement Learning for Tool Use in Large Language Models
Paper
• 2603.08068
• Published • 42
CREATE: Testing LLMs for Associative Creativity
Paper
• 2603.09970
• Published • 14
LMEB: Long-horizon Memory Embedding Benchmark
Paper
• 2603.12572
• Published • 73
Cheers: Decoupling Patch Details from Semantic Representations Enables Unified Multimodal Comprehension and Generation
Paper
• 2603.12793
• Published • 38
DeepSearch: Overcome the Bottleneck of Reinforcement Learning with
Verifiable Rewards via Monte Carlo Tree Search
Paper
• 2509.25454
• Published • 148
FadeMem: Biologically-Inspired Forgetting for Efficient Agent Memory
Paper
• 2601.18642
• Published • 1
AI Can Learn Scientific Taste
Paper
• 2603.14473
• Published • 415
OpenSeeker: Democratizing Frontier Search Agents by Fully Open-Sourcing Training Data
Paper
• 2603.15594
• Published • 148
EnterpriseOps-Gym: Environments and Evaluations for Stateful Agentic Planning and Tool Use in Enterprise Settings
Paper
• 2603.13594
• Published • 145
Learning to Discover at Test Time
Paper
• 2601.16175
• Published • 44
Why AI systems don't learn and what to do about it: Lessons on autonomous learning from cognitive science
Paper
• 2603.15381
• Published • 1
GradMem: Learning to Write Context into Memory with Test-Time Gradient Descent
Paper
• 2603.13875
• Published • 34
MetaClaw: Just Talk -- An Agent That Meta-Learns and Evolves in the Wild
Paper
• 2603.17187
• Published • 136
Paper
• 2603.19461
• Published • 46
REVERE: Reflective Evolving Research Engineer for Scientific Workflows
Paper
• 2603.20667
• Published • 17
OpenResearcher: A Fully Open Pipeline for Long-Horizon Deep Research Trajectory Synthesis
Paper
• 2603.20278
• Published • 91
Deep Tabular Research via Continual Experience-Driven Execution
Paper
• 2603.09151
• Published • 14
CarePilot: A Multi-Agent Framework for Long-Horizon Computer Task Automation in Healthcare
Paper
• 2603.24157
• Published • 10
Memento-Skills: Let Agents Design Agents
Paper
• 2603.18743
• Published • 56
Paper
• 2603.25551
• Published • 56
UI-Voyager: A Self-Evolving GUI Agent Learning via Failed Experience
Paper
• 2603.24533
• Published • 46
T-MAP: Red-Teaming LLM Agents with Trajectory-aware Evolutionary Search
Paper
• 2603.22341
• Published • 36
The Latent Space: Foundation, Evolution, Mechanism, Ability, and Outlook
Paper
• 2604.02029
• Published • 114
Terminal Agents Suffice for Enterprise Automation
Paper
• 2604.00073
• Published • 82
Towards a Medical AI Scientist
Paper
• 2603.28589
• Published • 85
Kernel-Smith: A Unified Recipe for Evolutionary Kernel Optimization
Paper
• 2603.28342
• Published • 24
CUA-Suite: Massive Human-annotated Video Demonstrations for Computer-Use Agents
Paper
• 2603.24440
• Published • 94
Composer 2 Technical Report
Paper
• 2603.24477
• Published • 15
MedOpenClaw: Auditable Medical Imaging Agents Reasoning over Uncurated Full Studies
Paper
• 2603.24649
• Published • 31
Sommelier: Scalable Open Multi-turn Audio Pre-processing for Full-duplex Speech Language Models
Paper
• 2603.25750
• Published • 35
INSID3: Training-Free In-Context Segmentation with DINOv3
Paper
• 2603.28480
• Published • 5
When Models Judge Themselves: Unsupervised Self-Evolution for Multimodal Reasoning
Paper
• 2603.21289
• Published • 34
Lie to Me: How Faithful Is Chain-of-Thought Reasoning in Reasoning Models?
Paper
• 2603.22582
• Published • 7
Intern-S1-Pro: Scientific Multimodal Foundation Model at Trillion Scale
Paper
• 2603.25040
• Published • 125
Marco DeepResearch: Unlocking Efficient Deep Research Agents via Verification-Centric Design
Paper
• 2603.28376
• Published • 20
Story2Proposal: A Scaffold for Structured Scientific Paper Writing
Paper
• 2603.27065
• Published • 21