- ViViT: A Video Vision Transformer
  Paper • 2103.15691 • Published • 4
- DINO-Foresight: Looking into the Future with DINO
  Paper • 2412.11673 • Published • 1
- Scaling Behavior Cloning Improves Causal Reasoning: An Open Model for Real-Time Video Game Playing
  Paper • 2601.04575 • Published • 9
- Learning Long-Context Diffusion Policies via Past-Token Prediction
  Paper • 2505.09561 • Published

Collections
Collections including paper arxiv:2410.24164
- DeepSeek-R1 Thoughtology: Let's <think> about LLM Reasoning
  Paper • 2504.07128 • Published • 87
- Byte Latent Transformer: Patches Scale Better Than Tokens
  Paper • 2412.09871 • Published • 108
- BitNet b1.58 2B4T Technical Report
  Paper • 2504.12285 • Published • 83
- FAST: Efficient Action Tokenization for Vision-Language-Action Models
  Paper • 2501.09747 • Published • 29

- π_0: A Vision-Language-Action Flow Model for General Robot Control
  Paper • 2410.24164 • Published • 30
- Magma: A Foundation Model for Multimodal AI Agents
  Paper • 2502.13130 • Published • 58
- Open X-Embodiment: Robotic Learning Datasets and RT-X Models
  Paper • 2310.08864 • Published • 2
- SoFar: Language-Grounded Orientation Bridges Spatial Reasoning and Object Manipulation
  Paper • 2502.13143 • Published • 31