oguzhanercan's Collections: Architectural Proposals
Byte Latent Transformer: Patches Scale Better Than Tokens
Paper
• 2412.09871
• Published
• 108
Causal Diffusion Transformers for Generative Modeling
Paper
• 2412.12095
• Published
• 23
Tensor Product Attention Is All You Need
Paper
• 2501.06425
• Published
• 90
TransMLA: Multi-head Latent Attention Is All You Need
Paper
• 2502.07864
• Published
• 57
Transformers without Normalization
Paper
• 2503.10622
• Published
• 170
LSNet: See Large, Focus Small
Paper
• 2503.23135
• Published
• 11
DDT: Decoupled Diffusion Transformer
Paper
• 2504.05741
• Published
• 77
Latent Diffusion Autoencoders: Toward Efficient and Meaningful Unsupervised Representation Learning in Medical Imaging
Paper
• 2504.08635
• Published
• 4
D^2iT: Dynamic Diffusion Transformer for Accurate Image Generation
Paper
• 2504.09454
• Published
• 11
Efficient Generative Model Training via Embedded Representation Warmup
Paper
• 2504.10188
• Published
• 12
Softpick: No Attention Sink, No Massive Activations with Rectified Softmax
Paper
• 2504.20966
• Published
• 31
Group Downsampling with Equivariant Anti-aliasing
Paper
• 2504.17258
• Published
• 9
Paper
• 2505.14513
• Published
• 29
LaTtE-Flow: Layerwise Timestep-Expert Flow-based Transformer
Paper
• 2506.06952
• Published
• 9
Marrying Autoregressive Transformer and Diffusion with Multi-Reference Autoregression
Paper
• 2506.09482
• Published
• 45
From Bytes to Ideas: Language Modeling with Autoregressive U-Nets
Paper
• 2506.14761
• Published
• 17
Energy-Based Transformers are Scalable Learners and Thinkers
Paper
• 2507.02092
• Published
• 69
Dynamic Chunking for End-to-End Hierarchical Sequence Modeling
Paper
• 2507.07955
• Published
• 27
Region-based Cluster Discrimination for Visual Representation Learning
Paper
• 2507.20025
• Published
• 19
PixNerd: Pixel Neural Field Diffusion
Paper
• 2507.23268
• Published
• 52
Local Scale Equivariance with Latent Deep Equilibrium Canonicalizer
Paper
• 2508.14187
• Published
• 4
Artificial Hippocampus Networks for Efficient Long-Context Modeling
Paper
• 2510.07318
• Published
• 31
Paper
• 2511.11238
• Published
• 38
mHC: Manifold-Constrained Hyper-Connections
Paper
• 2512.24880
• Published
• 309
Scaling Embeddings Outperforms Scaling Experts in Language Models
Paper
• 2601.21204
• Published
• 99
Scaling Text-to-Image Diffusion Transformers with Representation Autoencoders
Paper
• 2601.16208
• Published
• 52
Nested Learning: The Illusion of Deep Learning Architectures
Paper
• 2512.24695
• Published
• 44