> The challenge with these models is that they’ve clearly been trained on (exposed to) copyrighted material, and can also demonstrably reproduce elements of copyrighted works on demand. If they were humans, a court could deem the outputs copyright infringement, perhaps invoking the subconscious copying doctrine (https://www.americanbar.org/groups/intellectual_property_law...).
Every single human has been exposed to copyrighted material, and probably can reproduce fragments of copyrighted material on demand. Nobody ever writes a book or paints a picture without reading a lot of books and looking at a lot of paintings first. For a "subconscious copying" suit to apply, you need to demonstrate "probative similarity" - that is, similarity to copyrighted material that is unlikely to be coincidental.
In other words - it's not clear to me that the situation with AI is any different than with a human, or that it presents new legal challenges. Just because it looks new doesn't mean it actually is new.
> If they were humans, a court could deem the outputs copyright infringement
I'm not sure I understand how this is self-evident. The closest equivalent I can see would be a human who looks at many pieces of art to understand:
- What is art and what is just scribbles or splatter?
- What is good and what isn't?
- What different styles are possible?
Then the human goes and creates their own piece.
It turns out, the legal solution is to evaluate each piece individually rather than the process. And, within that, the court has settled on "if it looks like a duck and it quacks like a duck..." which is where the subconscious copying presumably comes in.
I don't know where courts will go. The new challenge is AI can generate "potentially infringing" work at a much higher rate than humans, but that's really about it. I'd be surprised if it gets treated materially different than human-created works.
> I think it's almost a guarantee that courts will start finding exact AI reproductions of copyrighted work to be infringement.
That was never not true. The difference is that AI can't violate copyright, only humans can. The legal not-so-gray area is whether "spat out by an AI after prompting" is a performance of the work and, if so, which human is responsible for the copying.
> The new challenge is AI can generate "potentially infringing" work at a much higher rate than humans, but that's really about it
The other challenges are:
(i) the model isn't a human that can defend themself by explaining their creative process, it's a literal mathematical transformation of the inputs including the copyrighted work. (And I'm not sure "actually the human brain is just computation" defences offered by lawyers are ever likely to prevail in court, because if they do that opens much bigger cans of worms in virtually every legal field...)
(ii) the representatives of OpenAI Inc who do have to explain themselves are going to have to talk about their approach to licenses for use of the material (which in this case appears to have been to disregard them altogether). That could be a serious issue for them even if the court agrees with the general principle that diffusion models or GANs are not plagiarism.
And possibly also (iii) the AI has ridiculous failure modes like reproducing the Getty watermark, which makes the model look far more closely derived from its source data than it actually is
> That somehow international legislation will converge on the strictest possible interpretation of intellectual property, and those models will become illegal by the mere fact they were trained on copyrighted material.
That's the only possible interpretation, really. AI models algorithmically remix input intellectual property en masse, without any significant amount of human creativity, the only thing copyright law protects. As such, the models themselves are wholly derived works, essentially a compressed and compact representation of the artistic features of the original works.
Legally, an AI model is equivalent to a huge tar.gz of copyrighted thumbnails: very limited fair use applies, only in some countries, and only in certain use contexts that generally don't harm the original author or out-compete them in the marketplace - the polar opposite of what AI models are.
I see two separate issues, the one you describe which is maybe slightly more clear cut: if a person uses an AI trained on copyrighted works as a tool to create and publish their own works, they are responsible if those resulting works infringe.
The other question, which I think is more topical to this lawsuit, is whether the company that trains and publishes the model itself is infringing, given they're making available something that is able to reproduce near-verbatim copyrighted works, even if they themselves have not directly asked the model to reproduce them.
I certainly don't have the answers, but I also don't think that simplistic arguments that the cat is already out of the bag or that AIs are analogous to humans learning from books are especially helpful, so I think it's valid and useful for these kinds of questions to be given careful legal consideration.
> Can someone explain again how an ML system scanning and training on a copyrighted work is different from a highly skilled artist doing the same?
Because (fortunately) human thoughts can't be subject to copyright law yet. So when we talk about copying and making derivative works, if you have this
artistic works -> neural network weights
The end result may or may not be copyrightable (that's for the courts to decide), but this transformation is a mechanical process performed on copies of the works rather than a human thought, so it doesn't automatically get the same pass.
> It's not as though the AI had a lover who left them and drew inspiration from that experience to become a more effective AI.
Well then your ex-lover clearly deserves co-songwriting credits. As well as their parents. And anybody who has influenced them personally, or even anyone who made the food that they've eaten. There's gotta be a point at which the original is just too far removed from the end result for it to be infringing, otherwise you could just keep going.
Also, a model that classifies things isn't the same as those things themselves, or images of those things, and I would certainly hope it would be considered transformative enough to not be infringing (I am not a lawyer though). I could give it an image of a dog, and it will tell me what it thinks it is. But there doesn't seem to be any way for me to say "show me a dog" and get back any sort of image, infringing or not.
Some thought experiments: Let's say you have a copyrighted photo, and I design an API that allows anyone to upload a photo and get a true/false of whether or not it's the same file as your photo. Is this copyright infringement if I never release the original photo?
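To make the thought experiment concrete, here's a minimal sketch of such a service. The function name and the digest-based design are my own assumptions, not anything from an actual product: the service stores only a hash of the protected photo, so it can answer "is this the same file?" without ever releasing the original bytes.

```python
import hashlib

# Hypothetical check from the thought experiment: is an uploaded photo
# byte-for-byte identical to a reference photo that is never released?
def is_same_file(uploaded_bytes: bytes, reference_digest: str) -> bool:
    # Compare SHA-256 digests instead of the raw files, so the service
    # never has to expose (or even retain) the reference photo itself.
    return hashlib.sha256(uploaded_bytes).hexdigest() == reference_digest

# Stand-in for the copyrighted photo (just arbitrary bytes here).
reference = b"\x89PNG...pretend this is the protected photo"
digest = hashlib.sha256(reference).hexdigest()

print(is_same_file(reference, digest))           # same file
print(is_same_file(b"different photo", digest))  # different file
```

Note that such a service reveals only one bit of information per query and distributes no part of the work, which is what makes the infringement question interesting.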
> the whole point is to compete with the underlying works
AI doesn't just compete with "the underlying works" though; it competes with all works, even works that weren't in its training set. So the type of "competition" you're describing is incidental, and not connected to the AI's use of any particular copyrighted work it was trained on.
I don't know whether that makes a difference under existing law, but I'd argue that it should.
I'd also argue copyright/fair use shouldn't even come into play, since AI models do not (typically) contain copies of the works they're trained on in the first place (except in cases of overfitting), but that's a completely separate issue.
> Therefore, the product of a generative AI model cannot be copyrighted
Is that last bit from the Copyright Office, or is it the author's interpretation? Because I could just as easily imagine the battle being over who or what was the actual creator of the content (i.e. is it a derivative work), rather than whether the thing the machine created is eligible for copyright.
To remove the AI from the equation for a second, imagine that I took four images of living artists' work, placed them in a 2x2 grid, and called that a new artwork. There are two separate questions to consider: (1) have I infringed upon the original authors' copyrights, and (2) is the new thing I have created eligible for copyright.
The stance that there is "no copyright protection for works created by non-humans" only addresses the second question, not the first question of how it interacts with existing copyrights.
> artists and writers are getting their work stolen left and right
Please show me a few examples where a work was copied by an AI and the copy is so good it violates copyright law.
All I have seen is someone used a pirated dataset to train AIs. Suing over that is like suing Seagate because someone tested a prototype hard disk by storing pirated books on it.
> how do you know if the image that just generated is substantially similar to an existing copyright work?
This is already a problem with biological neural nets (i.e. humans). I remember as a teenager writing a simple song on the piano, and playing it for my mom; she said, "You didn't write that -- that's Gilligan's Island!" And indeed it was. If I had made a record and sold it, whoever owned the rights to the Gilligan's Island theme song could have sued me for it, and they would (rightly) have won.
There's already loads of case law about this; the same thing would apply to AI.
> what is stopping someone from generating millions of images and copy righting all the "unique" ones? Such that no one can create anything without accidental collisions.
Right now what's stopping it is that only humans can make copyrightable material; whatever is spat out from a computer is effectively public domain, not copyrighted.
> Therefore, using a training dataset does not constitute copyright violation.
It's not for you to decide that. Different jurisdictions will have their own process for deciding that and none of them are based on the opinions of random commentators on internet message boards.
Also please bear in mind my comment was a reply to a specific statement (repeated below) and not talking about AI in general:
> You publish in public, you automatically grant licenses for the public to consume and transform it.
^ this statement is not correct for the reasons I posted. AI discussions might add colour to the debate but it doesn't alter the incorrectness of the above statement.
> If the AI outputted an exact copy (or a close enough copy that a layman would agree it's a copy), then that particular instance of the AI's output is in violation of copyright. The AI model itself doesn't violate any copyright.
That assumption needs testing in courts.
As I've posted elsewhere, there have been plenty of cases where copyright holders have successfully sued other creators based on new works that have borne a resemblance to existing works. It happens all the time. I remember reading a story about how a newly successful author was being handed ideas from fans during a book signing, only for one of her representatives to intercept them each time. When she later asked why, the representative said "it's because if any of your future books follow a similar idea, that fan could sue. But if we can prove you haven't read the idea then the fan has no claim". (to paraphrase)
Experts don't all agree on where the line is with similar works created by humans, let alone the implications of copyrighted content being used as training data for computers. And this is true for every jurisdiction I've researched. So to have random people on HN talk as confidently as they do about this being all perfectly legal is rather preposterous. You don't even fully grasp the intricacies of copyright law in your own jurisdiction, let alone the wider world. In fact this is such a blurred line that I wouldn't be surprised if some cases would have different rulings in different courts within the same jurisdiction. It's definitely not as clear cut as you allude to.
> I don't think any person who actually has worked on anything creative in their life would compare a personal style to a model that can output in nearly any style at extreme speeds. And even if you're inspired by a specific author, invariably what happens is it becomes mix of yourself + those influences, not a damn near-copy.
I don't think anyone who has ever read a novel in their life would say that an AI can write literature at all, in any style.
> not a damn near-copy.
The obvious solution is to just treat it as if a human did it. If you did not know the authorship of the output and thought it was a human, would you still consider it copyright infringement? If yes, fair enough. If no, then I think it is clearly not a "damn near-copy".
> What’s really happening is that AI models have much better memory than humans and are more precise in their output.
And yet, presumably we agree that a simple file server that serves up exact copies of copyrighted work does constitute copyright infringement. What's the difference? You could also say "what's really happening is that the file server has much better memory than humans." Duh!
It sounds like you're saying that, because an AI model is a very convoluted and sometimes inaccurate way to implement a computer system that sometimes serves up exact copies of copyrighted works, it's not copyright infringement when that computer system does serve up an exact copy of a copyrighted work. I'm not quite understanding the argument.
> If an artist recreates a copyrighted work or creates a derivative too close to the original, then that new work is potentially copyright infringement.
I see no reason the same standard cannot be applied to ML generated content. If the evaluation is being performed on the end result, then that is all that matters. The same judges that decide these things for human generated content can continue to do so for ML generated ones.
Even the people submitting and responding to the copyright claims will still be human (with briefs generated by ML…).
What will be more interesting is when the judges themselves get replaced with an “objective” AI to quantify similarity for copyright purposes. If that ever happens, it’ll trigger an arms race to hit the razors edge without going over.
Yeah...no. AI is doing nothing but reusing material. It generates the image/text/code from its training set most likely to be found following/around/correlating with the prompt. It literally has nothing outside its training set to reproduce. And when it reproduces the Getty watermark, that's a pretty obvious example of reusing copyrighted material.
>>It's learning from it in the same way humans do.
Not even close. These "AI" architectures may be sufficiently effective to produce useful output, but they are nothing like human intelligence. Not only is their architecture vastly different, making no attempt to reproduce/reimplement the neuron/synapse/neurotransmitter and sensory/brainstem/midbrain/cerebrum micro- and macro-architectures underlying human learning, but their output, in both its successes and its errors, resembles nothing about human learning. (source: just off-the-top-of-my-head recollections from a neuroscience minor in college)
> I've posted this elsewhere, but it doesn't matter what our brains do and how similar that is to how AI operates. Humans have rights that machines do not. For example I can watch a movie and not be sued for infringement because I made a copy of the movie in my head.
No AI model works that way either though. Think Stability AI: it doesn't have a copy of every image it was trained on, but has "distilled" the patterns out of those images. It no longer contains a copy of any specific image, nor is there a general ability to extract training data from it.
In which case, it does not have a copy of the movie in its head either - but it does, for example, recognize the Disney look.
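A rough back-of-the-envelope calculation illustrates why verbatim storage is implausible. The figures below are approximations of publicly reported numbers (model weights on the order of 2 GB, a training set on the order of 2 billion images), not exact values:

```python
# Back-of-the-envelope: how much weight capacity exists per training image?
weights_bytes = 2 * 1024**3      # ~2 GB of model weights (approximate)
training_images = 2_000_000_000  # ~2 billion training images (approximate)

bytes_per_image = weights_bytes / training_images
print(f"{bytes_per_image:.2f} bytes of weight capacity per training image")

# Roughly 1 byte per image: far too little to store any image verbatim.
# (Images duplicated many times in the training set can still be
# memorized, which is where overfitting-based reproductions come from.)
```

This is why the model can "recognize the Disney look" without containing any Disney frame: only aggregate statistical patterns fit in that budget.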
Right now, GitHub Copilot's argument is that training AI models on copyrighted material is legal. This is also the position almost every AI startup takes, and it is rooted in "fair use" taking the "transformative" qualities of a work into consideration. There is no doubt that AI-generated suggestions generally are "transformative"; the question is only whether they are transformative enough.
> Of course, a court needs to decide this. But I can’t see how allowing an AI model to view a picture constitutes making an illegal copy.
Memory involves making a copy, and copies anywhere except in the human brain are within the scope of copyright (but may fall into exceptions like Fair Use.)