InstaFlow: One Step is Enough for High-Quality Diffusion-Based Text-to-Image Generation
Paper: arXiv:2309.06380
InstaFlow-0.9B is a one-step text-to-image generative model fine-tuned from 2-Rectified Flow.
It is trained with text-conditioned reflow and distillation as described in our paper.
Rectified Flow has interesting theoretical properties; for details, see this ICLR paper and this arXiv paper.
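The toy NumPy sketch below illustrates Rectified Flow and reflow. It is not the InstaFlow code. The rectified flow objective regresses a velocity field onto the straight-line direction x1 - x0 along the interpolation x_t = t * x1 + (1 - t) * x0. A reflow step re-pairs each noise sample x0 with the output of the current model's ODE, producing a new coupling to train on. Text conditioning is omitted, and all function names are hypothetical.

```python
import numpy as np

def interpolate(x0, x1, t):
    """Point on the straight line from noise x0 (t=0) to data x1 (t=1); t has shape (batch,)."""
    t = t.reshape(-1, *([1] * (x0.ndim - 1)))
    return t * x1 + (1.0 - t) * x0

def rectified_flow_loss(velocity_fn, x0, x1, t):
    """Mean squared error between the predicted velocity and the straight-line target x1 - x0."""
    pred = velocity_fn(interpolate(x0, x1, t), t)
    return float(np.mean((pred - (x1 - x0)) ** 2))

def euler_sample(velocity_fn, x0, num_steps):
    """Integrate dx/dt = v(x, t) from t=0 to t=1 with num_steps fixed Euler steps."""
    x = np.array(x0, dtype=float)
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = np.full(x.shape[0], i * dt)
        x = x + dt * velocity_fn(x, t)
    return x

def reflow_pairs(velocity_fn, x0, num_steps=100):
    """Reflow (toy, unconditional): couple each noise sample with the current model's ODE output.
    A new velocity field trained on these pairs is meant to have straighter trajectories."""
    return x0, euler_sample(velocity_fn, x0, num_steps)
```

A model whose trajectories are perfectly straight can be sampled exactly with a single Euler step. This is why straightening the flow with reflow helps before the final one-step distillation.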
Please refer to the official GitHub repository.
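Training pipeline: data generation and text-conditioned reflow produce 2-Rectified Flow, which is then distilled into a one-step model.
The final model is InstaFlow-0.9B.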
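The distillation stage can be illustrated with a toy example. The teacher is a curved linear velocity field (a rotation), and a hypothetical linear one-step student is fitted by least squares to the teacher's many-step ODE outputs. This sketch rests on several assumptions. The real student is a neural network. Text conditioning is omitted. Plain squared error serves as the matching loss, which may differ from the distance used in the paper.

```python
import numpy as np

def euler_sample(velocity_fn, x0, num_steps):
    """Integrate dx/dt = v(x, t) from t=0 to t=1 with fixed Euler steps."""
    x = np.array(x0, dtype=float)
    dt = 1.0 / num_steps
    for i in range(num_steps):
        x = x + dt * velocity_fn(x, i * dt)
    return x

def fit_linear_student(x0, teacher_out):
    """Hypothetical one-step student x1 ~ [x0, 1] @ W, fitted by least squares."""
    design = np.hstack([x0, np.ones((x0.shape[0], 1))])
    W, *_ = np.linalg.lstsq(design, teacher_out, rcond=None)
    return lambda x: np.hstack([x, np.ones((x.shape[0], 1))]) @ W

def distillation_loss(student, x0, teacher_out):
    """Squared error between the student's one-step output and the teacher's ODE output."""
    return float(np.mean((student(x0) - teacher_out) ** 2))

# Teacher: a rotation field. Its trajectories are arcs, so a single Euler step is inaccurate.
A = np.array([[0.0, -1.0], [1.0, 0.0]])
teacher_v = lambda x, t: x @ A.T

rng = np.random.default_rng(0)
x_train = rng.standard_normal((256, 2))
student = fit_linear_student(x_train, euler_sample(teacher_v, x_train, 200))

x_test = rng.standard_normal((64, 2))
target = euler_sample(teacher_v, x_test, 200)
student_err = distillation_loss(student, x_test, target)
euler1_err = float(np.mean((euler_sample(teacher_v, x_test, 1) - target) ** 2))
```

In one evaluation, the student reproduces the teacher's 200-step output on held-out noise. A single Euler step of the curved teacher field misses that output by a wide margin.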
Total training cost: 199.2 A100 GPU days (data generation + reflow + distillation) to obtain InstaFlow-0.9B.
The following metrics for InstaFlow-0.9B are measured on MS COCO 2017 with 5,000 images and a 1-step Euler solver:
FID-5k = 23.4, CLIP score = 0.304
Measured on MS COCO 2014 with 30,000 images and a 1-step Euler solver:
FID-30k = 13.1
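For readers unfamiliar with the metric, FID fits a Gaussian to image features of the generated images and another to those of the reference images. It then compares the two with d^2 = ||mu_a - mu_b||^2 + Tr(S_a + S_b - 2 (S_a S_b)^(1/2)). The NumPy sketch below computes this distance from precomputed feature matrices. It assumes feature extraction (normally an Inception-v3 network) happens elsewhere, and its function name is hypothetical. It is not the evaluation code behind the numbers above.

```python
import numpy as np

def frechet_distance(feats_a, feats_b):
    """Frechet distance between Gaussians fitted to two (n_samples, dim) feature matrices."""
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.cov(feats_a, rowvar=False)
    cov_b = np.cov(feats_b, rowvar=False)
    # Tr((cov_a cov_b)^(1/2)) equals the sum of square roots of the eigenvalues of
    # cov_a @ cov_b. These are real and non-negative for PSD covariances, so round-off
    # is handled by taking the real part and clipping at zero.
    eigvals = np.linalg.eigvals(cov_a @ cov_b)
    tr_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    mean_term = np.sum((mu_a - mu_b) ** 2)
    return float(mean_term + np.trace(cov_a) + np.trace(cov_b) - 2.0 * tr_sqrt)
```

The estimate is biased with respect to sample count. FID-5k and FID-30k values (which here also use different COCO splits) should therefore not be compared directly.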
@article{liu2023insta,
title={InstaFlow: One Step is Enough for High-Quality Diffusion-Based Text-to-Image Generation},
author={Liu, Xingchao and Zhang, Xiwen and Ma, Jianzhu and Peng, Jian and Liu, Qiang},
journal={arXiv preprint arXiv:2309.06380},
year={2023}
}