But it means the models were trained on images that are under copyright. In fact, many of these models were trained exclusively on such images without any permission. For example, Midjourney is clearly trained on everything on artstation.com, where almost all images are for commercial purposes or carry commercial licenses.
If their legal assumption is it's not a copyright violation to train a model on some image, then it's logical that their ToS doesn't mention it, as they need the user's permission only for the scenarios where the law says that they do.
> Most (if not all) other images it used will be cc, unlicensed (most images on the internet) or copyrighted but unidentifiable
You keep saying "most". I get that you believe copyright infringement is orthogonal to your point, but it's really not.
If you can argue your point by saying "all, 100%" free and clear - public domain or otherwise permitted by the owners - then I can consider your point.
Otherwise, no, the model does not produce copyleft or public domain work. It's copyright infringement and IP theft.
"Unlicensed" does not mean it is not under copyright, by the way. CC usually means there are restrictions or requirements for use. Attribution, for instance.
Even if the model were trained only on public domain work, I'm still not completely convinced. Perhaps. I do find Jason M. Allen's argument more compelling than Tamara Pester's.
> As the images are generated by an AI, they are non-copyrightable and are therefore public domain.
I find this claim on the "about" page quite interesting. Some of those images might be so close to the training data that copyright protection for fictional characters becomes relevant, even if the image is not identical. This is visible in this thread, as people recognize characters from popular culture (video games or movies), because the training data seems to also contain fanart.
Probably not? Models are trained on restrictively licensed things all the time, such as images that are still in copyright. This is generally believed to be fair use, though I think this has not been tested in court?
I definitely understand the model does not contain the images.
You gave a very valid description of what is probably the most critical copyright issue around it at the moment.
I'd say my point is more about the fact that the content will have a different legal standing depending on whether something produced by the model is considered a traded good or a private creative work made with some tools.
Copyright and likeness rights are only one area this impacts, but they are a solid example. If the content is considered traded, then using someone's likeness is not allowed. If it is a private work created through the use of a tool, then it is fine as long as it isn't traded or publicly displayed.
My argument is that, since the model is heavily influenced by its training and the user did not train the model, a trade is happening. The untrained model is the tool, but a trained model is more like a commodity, and everything it produces is also a traded item.
No one is contesting the fact that images whose copyright is owned by Getty were used to train the model.
The contested issue is whether training a model requires permission from the copyright holder, because for most ways of using a copyrighted work - all uses except those where copyright law explicitly asserts that copyright holders have exclusive rights - no permission is needed.
Yes, I can see that now, but that seems too broad. Hypothetically, if a model is trained on only a thousand images, of which 10 are copyrighted, does that mean all the images it generates violate copyright law...?
It's not simply a given that using copyright material to train a model is copyright violation.
In my view, it isn't. No one image contributes a significant amount, and the process the machine performs is analogous to what a human does when learning.
It’s not obvious to me that using a copyrighted image to train a model is copyright infringement. It’s certainly not copyright infringement when used to train a human who may end up creating works that are influenced by (but not copies of) the original works.
Now, if the original copyrighted work can be extracted or reproduced from the model, that’s obviously copyright infringement.
> but you cannot publish those without their permission.
I wonder if someone could argue that using their photo as training data makes the model itself a derivative work? IIRC, those require royalties or a license too.
We don't have to even get to the point of extending copyright to artistic style. If using copyrighted art to train a model is found to not be fair use, the technology will no longer be above board. Companies in countries with these rules won't be able to ignore them.
What could happen if this interpretation of fair use comes to pass? Model trainers may have to license images from companies like Getty. In a sense, that should be cheaper than licensing each image individually. In some countries, music organizations handle licensing for large sets of songs, so there is no need to negotiate with each label or artist individually. That sort of arrangement could come to pass. Perhaps image upload sites will offer a simple option to set the image's license to allow or disallow model training. Or perhaps everyone will simply use below-the-board models surreptitiously.
You mean like how the model itself is a derivative work of tons of copyrighted content? If the original model can sidestep the issue of being trained on copyrighted content, then it should be fair game to train a new model off of a copyrighted model.
No, it just means that if they sue you, they're pre-committing to not try and foreclose on your own generated outputs by claiming they're derivatives that they would then own.
Of course, this is a water sandwich. If model outputs are derivatives of the model, it'd be difficult to argue that the model itself isn't a derivative of all the training data, most of which isn't licensed. So if anything, this covers Stability's ass, not yours. There's also the related question of whether AI models (not their outputs, just the models themselves) have any copyright at all. The logic behind the non-copyrightability of AI art would also apply to the AI training process, so the only way you could get copyright would be through a particularly creative way of organizing and compiling the training dataset.
Remember: while the "AI art isn't art" argument reeks to high heaven of artistic snobbery, it's not entirely wrong. There isn't a lot of creative control in the process. Furthermore, we don't give copyright to monkeys[2], so why should we give it to AI models?
"Noncommercial" isn't actually a thing in copyright law. Copyrighted works are inherently commercial artifacts[0], so if you just say "noncommercial use is fine", you've said nothing - and you've invited the legal equivalent of nasal goblins[1] into the courtroom. Creative Commons gets around this by defining their own concept of NonCommercial use. So what did Stability's lawyers cook up?
> “Non-Commercial Uses” means exercising any of the rights granted herein for the purpose of research or non-commercial purposes. Non-Commercial Uses does not include any production use of the Software Products or any Derivative Works.
Uh... yeah. That's replacing a meaningless phrase with a tautology. Fun. The only concrete grant of rights is research use, and they categorically reject "any production use", which is awfully close to all uses. Even using this to generate funny fanart mashups for your own personal enjoyment could be construed as a 'production use'. Stability could actually sue you for that (however unlikely that would be).
[0] In the eyes of the law. I actually hate this opinion, but it's the opinion the law takes.
[1] Under ISO 9899, it is entirely legal for C programs with undefined behavior to make goblins fly out of your nose.