Self-Exploring Language Models: Active Preference Elicitation for Online Alignment Paper • 2405.19332 • Published May 29, 2024
Reason for Future, Act for Now: A Principled Framework for Autonomous LLM Agents with Provable Sample Efficiency Paper • 2309.17382 • Published Sep 29, 2023
RORL: Robust Offline Reinforcement Learning via Conservative Smoothing Paper • 2206.02829 • Published Jun 6, 2022
Contrastive UCB: Provably Efficient Contrastive Self-Supervised Learning in Online Reinforcement Learning Paper • 2207.14800 • Published Jul 29, 2022
Local Optimization Achieves Global Optimality in Multi-Agent Reinforcement Learning Paper • 2305.04819 • Published May 8, 2023