Training purely on Disney's images would probably be difficult, considering the huge number of images you need to train a model from scratch. But here is a "fine-tuned Stable Diffusion model trained on screenshots from a popular animation studio". Seems to have worked out quite well, seeing as that model was last updated last year.
There you go. Try to generate an image of Mickey Mouse with his mouth closed using Stable Diffusion and you will see a verbatim image of him copied straight from Disney.
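For anyone who wants to try this locally, here is a minimal sketch using the Hugging Face diffusers library; the model id is a placeholder for whichever fine-tuned checkpoint you downloaded.

```python
# Minimal sketch with diffusers; the model id below is a placeholder for
# whatever fine-tuned Stable Diffusion checkpoint you downloaded.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "someuser/animation-studio-style",  # hypothetical fine-tuned model id
    torch_dtype=torch.float16,
).to("cuda")

image = pipe("mickey mouse with his mouth closed, animation still").images[0]
image.save("mickey.png")
```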
This opens up ideas. One thing people have tried to do with Stable Diffusion is create animations. Of course, they all come out pretty janky and gross; you can't get the animation smooth.
But what if a model was trained not on single images, but on animated sequential frames, in sets, laid out on a single visual plane? So a panel might show a short sequence of a Disney princess expressing a particular emotion as 16 individual frames collected into a single image. One might then be able to generate a clean animated sequence of a previously unimagined Disney princess expressing any emotion the model has been trained on. Of course, with big enough models one could (if they can get it working) produce text-prompted animations across a wide variety of subjects and styles.
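To make that data layout concrete, here is a rough sketch of packing 16 sequential frames into one 4x4 grid image with Pillow (the frame file names are hypothetical):

```python
# Pack 16 sequential animation frames into a single 4x4 grid image, so one
# training sample encodes a short motion clip. Assumes same-sized PNGs named
# frame_00.png .. frame_15.png (hypothetical file names).
from PIL import Image

frames = [Image.open(f"frame_{i:02d}.png") for i in range(16)]
w, h = frames[0].size
grid = Image.new("RGB", (4 * w, 4 * h))

for idx, frame in enumerate(frames):
    row, col = divmod(idx, 4)
    grid.paste(frame, (col * w, row * h))

grid.save("sequence_grid.png")  # one training image = 16 ordered frames
```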
I know the most about Stable Diffusion. Stable Diffusion-based models are typically open source, you can download them for free, and they easily fit on a cheap USB stick.
They're not actually supposed to have anyone else's pictures in them as such [1].
As far as I can tell, the training is more along the lines of "this is what a dog looks like", "this is what a cat looks like", "this is what a stick looks like" ... etc.
At this point in time, unaltered output is automatically public domain, because it is not made by a human.
I've had a blast playing with Stable Diffusion and I see all the potential it will bring us. I released a service for training your own model: just upload 20-30 images and you can have a model of someone or some object doing anything. You can train one model a month for free in a slower queue, or you can train many models on a fast queue, with other features, for a fee. Here is an example of using it for product placement: https://app.88stacks.com/image/O6kReClOvrz7 and here is an example of using it for people: https://app.88stacks.com/image/nOpdvCwx6kb7
and an example for using it for styles: https://app.88stacks.com/image/zyjw6CmEgk2d
The UI is rough, but I would love feedback on how to improve it for you. https://88stacks.com
They're indeed not diffusion models, though they are trained on animation data and specifically designed for it (the raster papers, at least). I'm very hopeful w.r.t. diffusion, though I'm looking at it and it's far from straightforward.
One problem with diffusion and video is that diffusion training is data hungry and video data is big. A lot of approaches you see have some way to tackle this at their core.
But also, AI today is like 80s PCs in some sense: both clearly the way of the future and clumsy/goofy, especially when juxtaposed with the triumphalism you tend to hear all around.
I tried Ghibli Diffusion to get some decent cartoon-styled results with SD, but it's really bad. The results were all hallucinations from the training data: no interesting new combinations, no novel prompts that would work. I stopped after a few dozen attempts.
I'd still like to get some good cartoony results with SD, but it's clearly not possible yet.
Stable Diffusion 1.x isn't original work either; it uses OpenAI's CLIP as its text encoder.
But training your own is pretty doable if you have the budget and enough image/text pairs. Most people don't have the budget, but at least Midjourney and Google have their own models.
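You can see the CLIP dependency directly if you poke at a 1.x pipeline; a small sketch (the 1.5 repo id used here was the standard one at the time):

```python
# Sketch: Stable Diffusion 1.x ships OpenAI's CLIP ViT-L/14 as its text encoder.
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
print(type(pipe.text_encoder))  # transformers CLIPTextModel
print(type(pipe.tokenizer))     # transformers CLIPTokenizer
```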
If you want to add new stuff to Stable Diffusion, you don't have to retrain the main model from scratch.
You can train it only on a few hundred images at a time and then add the resulting model as an extension or merge it into the main model.
People train their models on a single celebrity or a single artist that way.
Other types of AI could be trained in a similar way.
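For the "merge it into the main model" part, a rough sketch of the naive weighted-sum merge that many checkpoint-merger tools perform (the file names and blend ratio are made up):

```python
# Weighted-sum merge of two Stable Diffusion checkpoints (the approach used by
# typical "checkpoint merger" tools). File names and alpha are examples only.
import torch

base = torch.load("sd-v1-5.ckpt", map_location="cpu")["state_dict"]
finetune = torch.load("my-finetune.ckpt", map_location="cpu")["state_dict"]

alpha = 0.3  # how much of the fine-tune to blend in
merged = {}
for key, base_w in base.items():
    if key in finetune and finetune[key].shape == base_w.shape:
        merged[key] = (1 - alpha) * base_w + alpha * finetune[key]
    else:
        merged[key] = base_w  # fall back to the base weight on mismatch

torch.save({"state_dict": merged}, "merged.ckpt")
```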
Stable Diffusion's weights are about 4 GB, which works out to roughly two bytes per training image. That's not very derivative; it's actual generalization.
"Mickey" does work as a prompt, but if they took that word out of the text encoder he'd still be there in the latent space, and it's not hard to find a way to construct him out of a few circles and a pair of red shorts.
I wonder if you could run it through Stable Diffusion as a next project idea? Train a model on your images using DreamBooth and then clean up the video to get HD results?
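One way that could look, as a hedged sketch with diffusers' img2img pipeline: train a DreamBooth model on stills, then push each video frame through at low strength so it stays close to the original. The paths, the "sks" token, and the strength value are all placeholders.

```python
# Per-frame cleanup with a DreamBooth-trained checkpoint via img2img.
# Paths, prompt token, and strength are placeholders, not a tested recipe.
import glob, os
import torch
from PIL import Image
from diffusers import StableDiffusionImg2ImgPipeline

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "path/to/dreambooth-model",  # hypothetical DreamBooth output directory
    torch_dtype=torch.float16,
).to("cuda")

os.makedirs("cleaned", exist_ok=True)
for i, path in enumerate(sorted(glob.glob("frames/*.png"))):
    frame = Image.open(path).convert("RGB").resize((512, 512))
    out = pipe(
        prompt="sks character, clean HD animation still",  # hypothetical token
        image=frame,
        strength=0.3,  # low strength keeps the output close to the input frame
        guidance_scale=7.5,
    ).images[0]
    out.save(f"cleaned/frame_{i:04d}.png")
```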
It's not clear what you mean by "based on". For example, the Anything model is trained on the Danbooru anime image site. Those images aren't in the standard Stable Diffusion model. The issue with that model is the legality of including those images, which the standard Stable Diffusion model does not include.
The thing is, most models for Stable Diffusion aren't created by companies but rather by end users. There are literally hundreds of models for Stable Diffusion that you can download, from landscapes to animals to (of course) porn. A few of them were created by Stability or Hugging Face, but most are trained by end users. It isn't hard at all to train a model with existing tools; you don't have to be an AI expert to do it.
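Loading one of those community checkpoints is a one-liner these days; a sketch, assuming a recent diffusers version with single-file support and a .safetensors file you downloaded yourself:

```python
# Load a community checkpoint distributed as a single .safetensors file
# (e.g. downloaded from civitai). The file name is a placeholder.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_single_file(
    "modernDisney.safetensors",  # hypothetical downloaded checkpoint
    torch_dtype=torch.float16,
).to("cuda")

pipe("a golden retriever, flat cartoon style").images[0].save("dog.png")
```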
Obviously perf and efficiency would be terrible, but could you distribute the training of a diffusion model across volunteers' computers? I would think that, for many fangroups, you'd have a lot of folks who would like to customize a diffusion model for their favorite characters and/or art.
Check out this video by Corridor Crew: they're able to use Stable Diffusion to consistently transfer the style of an animated film (Spider-Verse) onto real-world shots.
https://civitai.com/models/24/modern-disney