ZeRO: Memory Optimizations Toward Training Trillion Parameter Models • Paper • arXiv 1910.02054 • Published Oct 4, 2019