Stable Diffusion is an image generation model that's been released to the public at large. If you have a decent GPU, you can run the model yourself. (Even without a decent GPU you can technically still do it, though it's much slower.)
What sort of setup do you need to be able to fine-tune Stable Diffusion models? Are there good tutorials out there for fine-tuning with cloud or non-cloud GPUs?
Many old consumer gaming GPUs will run an implementation of Stable Diffusion. But this page seems to be about getting access to H100s and A100s, such as you might want for running or training decent-sized LLMs.
It might require dedicated hardware. That only really becomes possible once you've proven the idea, but ASICs for cryptomining, TensorFlow (TPUs), etc. are quite real. There's no reason dedicated hardware for training Stable Diffusion couldn't happen.
A lot of HN has been having fun with Stable Diffusion. Do we really need one GPU with 10GB of RAM? How do you distribute or "shard" a model you're training? Could we get this running on the Raspberry Pi clusters we all have? Hook it up to OpenFaaS too.
You can run Stable Diffusion on an MBP and produce images in under a minute. It's training these models that takes the crazy GPU power; running them is quite reasonable.
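Not the parent, but for anyone who wants to try this, here's a minimal sketch using Hugging Face's diffusers library on the Mac's MPS backend (the checkpoint ID and prompt are just placeholders for whatever you're using locally):

    # Rough sketch: Stable Diffusion via diffusers on an Apple Silicon Mac.
    # Checkpoint ID and prompt are illustrative, not a recommendation.
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
    pipe = pipe.to("mps")            # run on the Metal GPU instead of the CPU
    pipe.enable_attention_slicing()  # lowers peak memory, helps on Macs

    image = pipe("a photo of an astronaut riding a horse").images[0]
    image.save("astronaut.png")

Whether it actually lands under a minute per image depends on which Mac you have and how many inference steps you run.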
I do the training with a beefy GPU from vast.ai (an RTX 3090 with 24GB VRAM) and generate the images with a GTX 1080 with 4GB VRAM, so from my testing there's no need for 6 or even 10GB of VRAM.
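For anyone on a similarly small card, the usual way to squeeze generation into a few GB of VRAM with the diffusers library is half-precision weights plus attention slicing. A sketch (checkpoint ID and prompt are just placeholders):

    # Rough sketch of low-VRAM generation: load the weights in fp16 and
    # slice the attention computation so peak memory stays low.
    import torch
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5",
        torch_dtype=torch.float16,   # roughly halves memory for the weights
    ).to("cuda")
    pipe.enable_attention_slicing()  # trades a little speed for less VRAM

    image = pipe("a watercolor painting of a fox").images[0]
    image.save("fox.png")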
Stable Diffusion has a smaller text encoder than DALL-E 2 and other models (Imagen, Parti, Craiyon) so that it can fit into consumer GPUs. I believe Stability AI will train models based on a larger text encoder; since the text encoder is frozen and does not require training, scaling it up is essentially free.
For now the text encoder is the biggest bottleneck with Stable Diffusion; the generator is really good and the image quality alone is incredible (managing to outperform DALL-E 2 most of the time).
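To make the "frozen text encoder" point concrete: in a typical fine-tuning loop only the UNet gets gradients, while the text encoder (and VAE) are left untouched, which is why swapping in a bigger text encoder adds no training cost. A rough sketch (checkpoint ID and optimizer settings are just illustrative):

    # Sketch: freeze the text encoder and VAE, train only the UNet.
    import torch
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")

    pipe.text_encoder.requires_grad_(False)  # frozen: no gradients, no optimizer state
    pipe.vae.requires_grad_(False)           # also frozen in most fine-tuning recipes
    pipe.unet.requires_grad_(True)           # only the denoising UNet is trained

    optimizer = torch.optim.AdamW(pipe.unet.parameters(), lr=1e-5)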
I agree, I've definitely seen way more information about running image synthesis models like Stable Diffusion locally than I have about LLMs. It's counterintuitive to me that Stable Diffusion takes less RAM than an LLM, especially considering it still needs the word vectors. Goes to show I know nothing.
I guess it comes down to the requirement of a very high-end GPU (or multiple GPUs), which makes it impractical for most people vs. just running it in Colab or something.