Scaling Open-Ended Reasoning to Predict the Future Paper • 2512.25070 • Published 1 day ago • 13
Teaching Pretrained Language Models to Think Deeper with Retrofitted Recurrence Paper • 2511.07384 • Published Nov 10, 2025 • 16
smcleish/Recurrent-TinyLlama-3T-train-recurrence-16 Text Generation • 0.8B • Updated Nov 11, 2025 • 10 • 1
smcleish/Recurrent-TinyLlama-3T-train-recurrence-32 Text Generation • 0.8B • Updated Nov 11, 2025 • 11 • 1
smcleish/Recurrent-OLMo-2-0425-train-recurrence-4 Text Generation • 1B • Updated Nov 11, 2025 • 9 • 1
smcleish/Recurrent-OLMo-2-0425-train-recurrence-32 Text Generation • 1B • Updated Nov 11, 2025 • 8 • 2
Efficient Parallel Samplers for Recurrent-Depth Models and Their Connection to Diffusion Language Models Paper • 2510.14961 • Published Oct 16, 2025 • 7
Training Dynamics Impact Post-Training Quantization Robustness Paper • 2510.06213 • Published Oct 7, 2025 • 3
Strategic Dishonesty Can Undermine AI Safety Evaluations of Frontier LLMs Paper • 2509.18058 • Published Sep 22, 2025 • 12
FAST: Factorizable Attention for Speeding up Transformers Paper • 2402.07901 • Published Feb 12, 2024 • 3
DynaGuard: A Dynamic Guardrail Model With User-Defined Policies Paper • 2509.02563 • Published Sep 2, 2025 • 20