InfoSynth: Information-Guided Benchmark Synthesis for LLMs Paper • 2601.00575 • Published 4 days ago • 1
Feedback Friction: LLMs Struggle to Fully Incorporate External Feedback Paper • 2506.11930 • Published Jun 13, 2025 • 53
Progent: Programmable Privilege Control for LLM Agents Paper • 2504.11703 • Published Apr 16, 2025 • 6
Are You Getting What You Pay For? Auditing Model Substitution in LLM APIs Paper • 2504.04715 • Published Apr 7, 2025 • 13
Benchmarking Language Model Creativity: A Case Study on Code Generation Paper • 2407.09007 • Published Jul 12, 2024 • 4
RATIONALYST: Pre-training Process-Supervision for Improving Reasoning Paper • 2410.01044 • Published Oct 1, 2024 • 35
To CoT or not to CoT? Chain-of-thought helps mainly on math and symbolic reasoning Paper • 2409.12183 • Published Sep 18, 2024 • 39
Compressing Sentence Representation for Semantic Retrieval via Homomorphic Projective Distillation Paper • 2203.07687 • Published Mar 15, 2022
Protecting Language Generation Models via Invisible Watermarking Paper • 2302.03162 • Published Feb 6, 2023
Weak-to-Strong Jailbreaking on Large Language Models Paper • 2401.17256 • Published Jan 30, 2024 • 16