Judging with Confidence: Calibrating Autoraters to Preference Distributions • Paper • 2510.00263 • Published Sep 30, 2025 • 14
Does More Inference-Time Compute Really Help Robustness? • Paper • 2507.15974 • Published Jul 21, 2025 • 7
Effectively Controlling Reasoning Models through Thinking Intervention • Paper • 2503.24370 • Published Mar 31, 2025 • 19
Instructional Segment Embedding: Improving LLM Safety with Instruction Hierarchy • Paper • 2410.09102 • Published Oct 9, 2024 • 2
SFTMix: Elevating Language Model Instruction Tuning with Mixup Recipe • Paper • 2410.05248 • Published Oct 7, 2024 • 9