Collections including paper arxiv:2106.09685

- Learning to Reason in 13 Parameters
  Paper • 2602.04118 • Published • 6
- LoRA-XS: Low-Rank Adaptation with Extremely Small Number of Parameters
  Paper • 2405.17604 • Published • 3
- mHC-lite: You Don't Need 20 Sinkhorn-Knopp Iterations
  Paper • 2601.05732 • Published • 1
- mHC: Manifold-Constrained Hyper-Connections
  Paper • 2512.24880 • Published • 308

- Attention Is All You Need
  Paper • 1706.03762 • Published • 112
- Language Models are Few-Shot Learners
  Paper • 2005.14165 • Published • 19
- LLaMA: Open and Efficient Foundation Language Models
  Paper • 2302.13971 • Published • 20
- Llama 2: Open Foundation and Fine-Tuned Chat Models
  Paper • 2307.09288 • Published • 250

- High-Resolution Image Synthesis with Latent Diffusion Models
  Paper • 2112.10752 • Published • 15
- Adding Conditional Control to Text-to-Image Diffusion Models
  Paper • 2302.05543 • Published • 58
- Proximal Policy Optimization Algorithms
  Paper • 1707.06347 • Published • 11
- Direct Preference Optimization: Your Language Model is Secretly a Reward Model
  Paper • 2305.18290 • Published • 64

- Intrinsic Dimensionality Explains the Effectiveness of Language Model Fine-Tuning
  Paper • 2012.13255 • Published • 5
- AutoPEFT: Automatic Configuration Search for Parameter-Efficient Fine-Tuning
  Paper • 2301.12132 • Published • 2
- A General Framework for User-Guided Bayesian Optimization
  Paper • 2311.14645 • Published
- LoRA: Low-Rank Adaptation of Large Language Models
  Paper • 2106.09685 • Published • 58

- Z-Image: An Efficient Image Generation Foundation Model with Single-Stream Diffusion Transformer
  Paper • 2511.22699 • Published • 238
- A Survey on Diffusion Language Models
  Paper • 2508.10875 • Published • 34
- Scalable Diffusion Models with Transformers
  Paper • 2212.09748 • Published • 18
- Scaling Rectified Flow Transformers for High-Resolution Image Synthesis
  Paper • 2403.03206 • Published • 71

- Language Models are Few-Shot Learners
  Paper • 2005.14165 • Published • 19
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
  Paper • 1810.04805 • Published • 26
- Attention Is All You Need
  Paper • 1706.03762 • Published • 112
- Lookahead Anchoring: Preserving Character Identity in Audio-Driven Human Animation
  Paper • 2510.23581 • Published • 42

- Neural Machine Translation by Jointly Learning to Align and Translate
  Paper • 1409.0473 • Published • 7
- Attention Is All You Need
  Paper • 1706.03762 • Published • 112
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
  Paper • 1810.04805 • Published • 26
- Hierarchical Reasoning Model
  Paper • 2506.21734 • Published • 48

- Attention Is All You Need
  Paper • 1706.03762 • Published • 112
- LoRA: Low-Rank Adaptation of Large Language Models
  Paper • 2106.09685 • Published • 58
- Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity
  Paper • 2101.03961 • Published • 13
- Proximal Policy Optimization Algorithms
  Paper • 1707.06347 • Published • 11
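
Every collection on this page includes LoRA (arxiv:2106.09685), so a minimal sketch of the low-rank update it proposes may help orient readers. This is an illustrative example under assumed names (the class LoRALinear and the hyperparameters r and alpha are choices made for the demo), not the paper's reference implementation.

```python
# Minimal LoRA sketch (arXiv:2106.09685): keep the pretrained weight W frozen
# and learn a low-rank delta B @ A, scaled by alpha / r. Class and parameter
# names here are illustrative assumptions, not official code.
import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    def __init__(self, in_features, out_features, r=8, alpha=16.0):
        super().__init__()
        # Frozen "pretrained" weight (random here, for the demo only).
        self.weight = nn.Parameter(torch.randn(out_features, in_features),
                                   requires_grad=False)
        # Trainable low-rank factors: A is a small Gaussian, B starts at zero,
        # so the adapter is a no-op before training, as described in the paper.
        self.lora_A = nn.Parameter(torch.randn(r, in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(out_features, r))
        self.scaling = alpha / r

    def forward(self, x):
        # y = x W^T + (alpha / r) * x A^T B^T
        return x @ self.weight.T + self.scaling * (x @ self.lora_A.T) @ self.lora_B.T


if __name__ == "__main__":
    layer = LoRALinear(64, 32, r=4)
    x = torch.randn(2, 64)
    print(layer(x).shape)  # torch.Size([2, 32])
    # Only the factors A and B are trainable: 4*64 + 32*4 = 384 parameters,
    # versus 32*64 = 2048 in the frozen weight.
    print(sum(p.numel() for p in layer.parameters() if p.requires_grad))
```

Several of the papers listed above (LoRA-XS, "Learning to Reason in 13 Parameters") push this same idea toward even smaller trainable budgets.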