Revisiting the Shape Convention of Transformer Language Models
Paper • 2602.06471 • Published
Rethinking the shape convention of an MLP