Stable Diffusion has a smaller text encoder than Dalle 2 and other models (Imagen, Parti, Craiyon) so that it can fit into consumer GPUs. I believe StabilityAI will train models based on a larger text encoder, the text encoder is frozen and does not require training, so scaling the text encoder is quite free.
For now this is the biggest bottleneck with Stable Diffusion, the generator is really good and the image quality alone is incredible (managing to outperform Dalle 2 most of the time).
reply