
If you read a book and use the instructions to build a bicycle you are learning a new skill and this is obviously not exploitation of people's work.

When you read a book and copy this book partially or entirely to create a new book or create a derivative work using this book without citation it's called plagiarism and copyright infringement. It is not only exploitation, it is against the law.

If you feed an entire library to an AI to generate new books without source citation and copyright agreements it is not only exploitation, it is against the law. We can call this automated plagiarism and copyright infringement, and automated or not, it is against the law. The exception is if you use public domain books. That wouldn't be illegal, but it would be highly unethical, considering there are powerful companies with deep pockets bending public domain laws to keep their assets from becoming publicly available (I'm looking at you, Disney), but that is another story.




Is this not the same situation if someone were to plagiarise a public domain (i.e. no copyright) work now?

Plagiarism and copyright infringement have some overlap but they're distinct. You can be guilty of plagiarism by taking something in the public domain and presenting it as your own work.

Plagiarism is not a legal thing. Copyright infringement is, but that’s a different beast.

If I read something, "learn" it, and reproduce it word for word (or with trivial edits) even without referencing the original work at all, it is still copyright infringement.

Actually, plagiarism and copyright infringement are different things. For example, it is possible to plagiarize something that is not copyrighted, and many forms of copyright infringement wouldn't fit the definition of plagiarism.

Copying texts is potentially copyright infringement, though.

Plagiarism isn't a crime (care to point out the anti-plagiarism law?). I don't really care what the ACM says about it. Copyright, on the other hand, is backed by law and is therefore a legal matter.

Oh please. First of all, how do you steal an idea? We’re talking about pictures. Supposing that you buy into the theory that you can, copyright was created to further the arts and sciences; it’s in the US constitution. The point isn’t to control your work — it’s to live in a richer society. And it’s not even clear that training a model counts as infringement. Being able to recite a quote from a book is different than reproducing the entire book. Artists won’t acknowledge that the same applies to their art.

If you believe that training models on art is stealing, then I’m a master ninja, since I’m the creator of books3. And even Stephen King today came out and said that he’s fine with it:

> Would I forbid the teaching (if that is the word) of my stories to computers? Not even if I could. I might as well be King Canute, forbidding the tide to come in. Or a Luddite trying to stop industrial progress by hammering a steam loom to pieces.

https://www.theatlantic.com/books/archive/2023/08/stephen-ki...

If he’s not worried, why are you?

I take a dim view of people trying to frame researchers as criminals. We’re not. We want to further science. That’s all.

You call me a grifter, but I’ve made roughly a hundred bucks from books3, and that’s because someone found my Patreon buried under a pile of links and subscribed to it many months ago. Most of my researcher colleagues seem to have a similar distaste for wanting to make money. The work is the goal.


Your argument is that it's not infringing because they copied everything at once?

I get that there's case law on copying in memory on the input side not being infringing, but I can't for the life of me understand how they get away with not paying for it. At least libraries buy the books before loaning them out. OpenAI and Midjourney presumably pirated the works or otherwise ignored the licenses of published works and just say "if we found it on the internet it's fair game".


Because we're talking about plagiarism, and because this is so often a point of confusion in internet debates, I want to just point out that plagiarism and copyright are almost unrelated things.

* plagiarism is not generally against the law, although it is a violation of school policies and can get you punished by your school. It's claiming someone else's work as your own. It may or may not involve a copyright violation. It can be plagiarism without one: the author may have given you permission, the item may not be in copyright, or the use may count as "fair use". It's still plagiarism.

* copyright is about the law: violating a copyright is against the law and can get you civil or criminal penalties. It involves copying work someone else legally owns without permission. It may or may not involve claiming the work as your own; for the most part, whether you attribute something properly or claim it as your own is not relevant to whether it is a copyright violation. (I suppose in some edge cases it could be relevant to whether you have a "fair use" defense, but mostly it's not significant in whether something is a copyright violation.)


Please don’t use plagiarism as an argument for keeping copyright. They are not related. Plagiarism is already illegal or disallowed for obvious reasons, but copyright is a different matter.

That’s precisely the funny thing with copyright:

If you create a work where you can clearly tell what the source was for your inspiration because you stole from another source, it’s a violation of copyright. But if you create a work and you cannot tell what the source of your inspiration is, because you stole from so many different sources, not only is it not a violation of copyright, but it’s actually the creation of a new copyrighted work in its own right.

ML is short-circuiting this legal framework, because stealing from thousands of different authors, in a way that makes it impossible to tell the sources, can now be done with the press of a button.


The definition of plagiarism isn't connected to laws, so yes, it is. Intentionally taking someone else's work and passing it off as your own is bad juju through and through.

And even if I kept my code under copyright with an attribution license, not all forms of plagiarism would necessarily violate the copyright. For example, ideas and facts generally can't be copyrighted, but one could imagine cases where they can be plagiarized.

So even if I did use a license that requires attribution, that only works insofar as plagiarism is a violation of copyright. And as we all know, there is a ton of grey area there. But regardless of the grey area, the plagiarism itself is still unethical.

Some interesting bits are here: https://www.plagiarism.org/blog/2017/10/27/is-plagiarism-ill...


Gotcha.

I wasn't talking about someone creating and selling copies of someone else's work, fortunately.

So my point stands and you are completely in agreement with me that people are allowed to learn from other people's works. If someone wants to learn from someone else's work, that is completely legal no matter the licensing terms.

Instead, it is only distributing copies that is not allowed.


A work created this way is derivative. If it were produced by a human mind, it could be called plagiarism or tribute, but when it is produced by an automatic tool run for profit, it’s a clear violation if the work was not licensed to allow derivative works.

And of course there is human intent, what are you even talking about? This is law. Law is sort of centered around human actions, and intent plays a big role. In this case, operators fully intended to scrape copyrighted works, feed them to this tool and operate it for profit (because money smells good).


Yeah this is basically equivalent to monetizing yourself reading a book someone else wrote.

We can argue that copyright shouldn't exist but it seems pretty non-controversial that this is a violation of copyright.


I think the OP is about copyright infringement and plagiarism, which is a form of fraud. I do not justify it but it is not theft.

I don't think the world has much to benefit from plagiarism, but I think the benefits of copying others' work and building upon them freely far outweigh the downsides.


A human copying your copyrighted material breaks the law. Doesn't mean they can't do it, but at least it's illegal.

A few snippets of a book also aren't a product, and yet they can absolutely be infringing.
