r PRO

oceansweep

AI & ML interests

None yet

Recent Activity

liked a model about 6 hours ago

stepfun-ai/Step-3.5-Flash

liked a model 4 days ago

Qwen/Qwen3-ASR-0.6B

liked a model 4 days ago

Qwen/Qwen3-ForcedAligner-0.6B

Organizations

None yet

upvoted a paper 5 days ago

AVMeme Exam: A Multimodal Multilingual Multicultural Benchmark for LLMs' Contextual and Cultural Knowledge and Thinking

Paper • 2601.17645 • Published 8 days ago • 22

upvoted 4 papers 10 days ago

Terminal-Bench: Benchmarking Agents on Hard, Realistic Tasks in Command Line Interfaces

Paper • 2601.11868 • Published 16 days ago • 32

Qwen3-TTS Technical Report

Paper • 2601.15621 • Published 11 days ago • 56

Learning to Discover at Test Time

Paper • 2601.16175 • Published 11 days ago • 41

LLM-in-Sandbox Elicits General Agentic Intelligence

Paper • 2601.16206 • Published 11 days ago • 82

upvoted a collection 10 days ago

Qwen3-TTS

7 items • Updated 11 days ago • 270

upvoted a paper 12 days ago

ABC-Bench: Benchmarking Agentic Backend Coding in Real-World Development

Paper • 2601.11077 • Published 17 days ago • 64

upvoted a paper 16 days ago

Deriving Character Logic from Storyline as Codified Decision Trees

Paper • 2601.10080 • Published 18 days ago • 6

upvoted a paper 19 days ago

Lost in the Noise: How Reasoning Models Fail with Contextual Distractors

Paper • 2601.07226 • Published 21 days ago • 32

upvoted a paper 24 days ago

Benchmark^2: Systematic Evaluation of LLM Benchmarks

Paper • 2601.03986 • Published 26 days ago • 34

upvoted 2 papers 25 days ago

OpenRT: An Open-Source Red Teaming Framework for Multimodal LLMs

Paper • 2601.01592 • Published 29 days ago • 12

RedBench: A Universal Dataset for Comprehensive Red Teaming of Large Language Models

Paper • 2601.03699 • Published 26 days ago • 6

upvoted 4 papers about 1 month ago

End-to-End Test-Time Training for Long Context

Paper • 2512.23675 • Published Dec 29, 2025 • 20

UltraShape 1.0: High-Fidelity 3D Shape Generation via Scalable Geometric Refinement

Paper • 2512.21185 • Published Dec 24, 2025 • 30

mHC: Manifold-Constrained Hyper-Connections

Paper • 2512.24880 • Published Dec 31, 2025 • 292

Are We on the Right Way to Assessing LLM-as-a-Judge?

Paper • 2512.16041 • Published Dec 17, 2025 • 34

upvoted 4 papers about 2 months ago

QwenLong-L1.5: Post-Training Recipe for Long-Context Reasoning and Memory Management

Paper • 2512.12967 • Published Dec 15, 2025 • 108

Memory in the Age of AI Agents

Paper • 2512.13564 • Published Dec 15, 2025 • 147

In-Context Representation Hijacking

Paper • 2512.03771 • Published Dec 3, 2025 • 4

Qwen3-VL Technical Report

Paper • 2511.21631 • Published Nov 26, 2025 • 152