
Diffusion models still have plenty of room to grow (vision/video models are orders of magnitude more expensive to train and larger to run), and we're only beginning to experiment with agent (AI -> AI agent) workflow communication and automation.



They're indeed not diffusion models, though they are trained on animation data and specifically designed for it (the raster papers, at least). I'm very hopeful with respect to diffusion, though I'm looking at it and it's far from straightforward.

One problem with diffusion and video is that diffusion training is data-hungry and video data is big. Most of the approaches you see have some way of tackling this at their core.

But also, AI today is like '80s PCs in some sense: both clearly the way of the future and clumsy/goofy, especially when juxtaposed with the triumphalism you tend to hear all around.


Things are indeed moving really fast now, and that's really exciting, but I'd be cautious about extrapolating that pace too far into the future. Nearly all of the recent image-generating AIs are based on a single breakthrough technique called image diffusion [1] (sketched below). Once we start hitting the limits of diffusion models, progress will likely plateau a bit, at least until the next major breakthrough.

[1]: https://www.assemblyai.com/blog/diffusion-models-for-machine...
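For the curious, the core idea is easy to sketch: corrupt an image with Gaussian noise in closed form, then train a network to undo the corruption. Below is a minimal, illustrative PyTorch sketch of the DDPM-style forward (noising) process; the schedule values, shapes, and names are placeholders, not any particular model's exact settings.

    import torch

    # DDPM-style forward process: data is gradually destroyed by Gaussian
    # noise, and a denoiser network is trained to reverse that.
    T = 1000
    betas = torch.linspace(1e-4, 0.02, T)              # linear noise schedule
    alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)

    def q_sample(x0, t, noise):
        # Jump straight to step t: x_t = sqrt(a_bar_t)*x0 + sqrt(1-a_bar_t)*eps
        a_bar = alphas_cumprod[t]
        return a_bar.sqrt() * x0 + (1.0 - a_bar).sqrt() * noise

    x0 = torch.randn(1, 3, 64, 64)                     # stand-in for an image
    noise = torch.randn_like(x0)
    x_t = q_sample(x0, torch.tensor(500), noise)       # heavily noised sample
    # A network is trained to predict `noise` from (x_t, t); generation then
    # runs the learned denoiser in reverse, step by step, from pure noise.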


We are not actually there yet. First, you still need some technical understanding and a somewhat decent setup to run these models yourself without the guardrails. So the average greasy dude who wants to share HD porn based on your daughter's LinkedIn profile pic on NSFW subreddits still has too many hoops to jump through. Right now you can also still spot AI images pretty easily if you know what to look for, especially with older Stable Diffusion models. But all of this could change very soon.

Love the focus on these AI workflows. Any plans to support other AI models like DALL-E / Stable Diffusion?

Yes, Stable Diffusion is not great at handling multiple ideas. New image models are coming out soon, though.

To your 3rd point, most diffusion models already use a transformer-based architecture (U-Net with self-attention and cross-attention, Vision Transformer, Diffusion Transformer, etc.); see the sketch below.
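To illustrate, here is a rough PyTorch sketch of the kind of transformer block embedded inside a diffusion U-Net, Stable Diffusion style: self-attention over image tokens plus cross-attention to the text embedding. All dimensions and names here are illustrative, not any specific model's exact layout.

    import torch
    import torch.nn as nn

    class CrossAttnBlock(nn.Module):
        def __init__(self, dim=320, heads=8, text_dim=768):
            super().__init__()
            self.norm1 = nn.LayerNorm(dim)
            self.self_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
            self.norm2 = nn.LayerNorm(dim)
            self.cross_attn = nn.MultiheadAttention(dim, heads, kdim=text_dim,
                                                    vdim=text_dim, batch_first=True)
            self.norm3 = nn.LayerNorm(dim)
            self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(),
                                     nn.Linear(4 * dim, dim))

        def forward(self, x, text):  # x: image tokens, text: prompt tokens
            h = self.norm1(x)
            x = x + self.self_attn(h, h, h)[0]         # tokens attend to each other
            h = self.norm2(x)
            x = x + self.cross_attn(h, text, text)[0]  # tokens attend to the prompt
            return x + self.mlp(self.norm3(x))

    block = CrossAttnBlock()
    out = block(torch.randn(1, 64 * 64, 320), torch.randn(1, 77, 768))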

Some neat results from the last six months or so:

- Significantly improved diffusion models (DALL-E 2, Midjourney, Stable Diffusion, etc.)

- Diffusion models for video (see https://video-diffusion.github.io/, this paper is from April but I expect to see a lot more published research in this area soon)

- OpenAI Minecraft w/VPT (first model with non-zero success rate at mining diamonds in <20min)

- AlphaCode (from February, reasonably high success rate on solving competitive programming problems)

- Improved realism and scale for NeRFs (see https://dellaert.github.io/NeRF22/ for some cool examples from this year’s CVPR)

- Better sample efficiency for RL models (see https://arxiv.org/abs/2208.07860 for a recent real-world example)


You can also offload this skill set to an AI like Stable Diffusion or DALL-E 2 these days.

The thing is, most models for Stable Diffusion aren't created by companies but rather by end users. There are literally hundreds of models for Stable Diffusion that you can download, from landscapes, to animals, to (of course) porn. A few of them were created by Stability AI or Hugging Face, but most are trained by end users. It isn't hard at all to train a model with existing tools (see the sketch below) -- you don't have to be an AI expert to do it.
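For a flavor of what those tools do under the hood, here is a rough sketch of a single fine-tuning step in the style of the Hugging Face diffusers training examples. The model ID and the dummy tensors are illustrative; a real script would encode images with the VAE and captions with the CLIP text encoder, and loop this with an optimizer.

    import torch
    import torch.nn.functional as F
    from diffusers import UNet2DConditionModel, DDPMScheduler

    unet = UNet2DConditionModel.from_pretrained(
        "runwayml/stable-diffusion-v1-5", subfolder="unet")
    scheduler = DDPMScheduler.from_pretrained(
        "runwayml/stable-diffusion-v1-5", subfolder="scheduler")

    latents = torch.randn(1, 4, 64, 64)    # stand-in for a VAE-encoded image
    text_emb = torch.randn(1, 77, 768)     # stand-in for a CLIP-encoded caption
    noise = torch.randn_like(latents)
    t = torch.randint(0, scheduler.config.num_train_timesteps, (1,))

    noisy = scheduler.add_noise(latents, noise, t)       # corrupt the latents
    pred = unet(noisy, t, encoder_hidden_states=text_emb).sample
    loss = F.mse_loss(pred, noise)         # learn to predict the added noise
    loss.backward()                        # then step an optimizer as usual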

Can they now? I had the impression that Stable Diffusion is not moving forward that fast because the original team that built it is not the one at stability.ai.

The biggest problem for diffusion models was performance (as you need to iterate even at inference), but I'm not up to date with the newest architectures; maybe it's already solved :P
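For context, the cost comes from running the whole network once per denoising step. A minimal sketch with the diffusers API (the model ID and shapes are illustrative; a real pipeline would also decode the final latents with the VAE):

    import torch
    from diffusers import UNet2DConditionModel, DDIMScheduler

    unet = UNet2DConditionModel.from_pretrained(
        "runwayml/stable-diffusion-v1-5", subfolder="unet")
    scheduler = DDIMScheduler.from_pretrained(
        "runwayml/stable-diffusion-v1-5", subfolder="scheduler")
    scheduler.set_timesteps(20)            # 20 steps instead of the 1000 used in training

    x = torch.randn(1, 4, 64, 64)          # start from pure noise
    cond = torch.randn(1, 77, 768)         # stand-in text conditioning
    with torch.no_grad():
        for t in scheduler.timesteps:      # one full UNet forward pass per step
            eps = unet(x, t, encoder_hidden_states=cond).sample
            x = scheduler.step(eps, t, x).prev_sample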

There are public Stable Diffusion models trained on/to imitate Midjourney, but IMO they're still lacking.

I guess the next step is to make Stable Diffusion better at local-context-driven and multi-image prediction.

Today the AI still misses local context, as the models are mostly trained on open image sets annotated in English by experts. But imagine what happens to the AI's capabilities if you have a single image annotated by multiple people in different languages.


ChatGPT/Stable Diffusion are like the Newcomen engine of our time. It took 100 years for the world to see what the Newcomen engine promised, but I think we'll see it in 30. AI is going to totally reshape content creation - what it means to be a creator is going to be completely different for the next generation.

The Stable Diffusion models are very small, though. You could probably fine-tune one with relatively low investment, e.g. 4x 3090s for under $20k.

I don’t disagree about the downside of DALL-E being locked away and expensive. It’s been exciting to see the Cambrian explosion of improvements to Stable Diffusion since its initial release. This is how AI research should be done, and it’s sad that “Open AI” is not actually open.

That being said, for business use cases, where I want to give it a simple prompt and have a high chance of getting a good usable result, it’s not clear to me that Stable Diffusion is there yet. Many of the most exciting SD community results seem to be in anime and porn, which can be a bit hard to follow. I guess the use cases that I’m excited about are things like logo generators, blog post image generators, product image thumbnail generators for e-commerce, industrial design, etc.

But please prove me wrong! I’m excited for SD to be the state of the art; it’s definitely better in the long term that it’s so accessible. I’m sure a good guide or blog post about what’s new in Stable Diffusion outside of anime generation would be an interesting read.


Stable Diffusion [0] is an extremely expensive SOTA text-to-image diffusion model developed by a private nonprofit and trained on a massive "Internet tarball"; its weights are about to be shared open-source (the code and dataset are already open source). Not trying to invalidate your argument, just presenting a pleasant counterexample. I don't think I quite agree with your opinion that AI will remain undemocratized.

[0]: https://github.com/CompVis/stable-diffusion


There are a lot of Stable Diffusion models, though. I would say the opposite: it's hard for Midjourney to compete when you have 20 different fine-tuned Stable Diffusion models to choose from depending on what you want, along with ControlNet, AUTOMATIC1111 & ComfyUI.

Expand, please. I think the initial hype is gone and we see that most implementations are not that much of a game changer just yet. Stable Diffusion can create images, but it's still incredibly cumbersome to find the right seed and prompt to get the image one desires. ControlNet, inpainting, and LoRAs are helpful but have to be implemented in a useful workflow (see the sketch below for one example).
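For example, a minimal ControlNet workflow with diffusers looks roughly like this (the model IDs and input edge map are illustrative); conditioning on an edge map pins down the composition, so you depend less on seed hunting:

    import torch
    from diffusers import StableDiffusionControlNetPipeline, ControlNetModel
    from diffusers.utils import load_image

    controlnet = ControlNetModel.from_pretrained(
        "lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16)
    pipe = StableDiffusionControlNetPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", controlnet=controlnet,
        torch_dtype=torch.float16).to("cuda")

    canny = load_image("edges.png")        # a precomputed Canny edge map
    image = pipe("a watercolor house at sunset", image=canny,
                 num_inference_steps=20).images[0]
    image.save("out.png")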
