[Rumors that start to become lawsuits]
Some of the rumored training sources are:
- LibGen (4M+ books)
- Sci-Hub (80M+ papers)
- All of GitHub
This is the funniest, but in the end saddest, aspect. If ChatGPT was indeed trained on pirated content and can be(come) such a powerful tool, then copyright laws should have been abolished yesterday. If ChatGPT was not trained on all these resources out there, then think how much more powerful a tool it would be if it were; in that case copyright laws are actively stifling advancement and should have been abolished yesterday.
I now routinely introduce this technology as "copyright laundering" and the hype put out by start-up boards and VCs as a ploy to disguise this fact. The "AI threat" is smoke-and-mirrors to dress up what's happening.
I derive a huge amount of value from ChatGPT because I can copy/paste without any IP impact. I could always have done this: from GitHub, from ebooks, from many sources.
Now I can benefit from the labour of many for free -- their copyrights laundered through a thin statistical trick.
As with crypto (and pyramid schemes, etc.), the big "philosophical pitch" becomes a disguise for a brutal material reality.
Midjourney, ChatGPT, etc. are doing automatically what would be illegal by hand.
An argument could also be made that if you pulled all the copyright infringements out of the training set for ChatGPT, it would not be half as intelligent.
One avenue that ChatGPT has, and I'm not sure if it is being utilized at all yet, would be the ability to feed it the unimaginably huge body of information locked behind copyrighted textbooks, books, academic papers and other paywalled information.
Imagine the knowledge that could be accessed by feeding all that information into an AI engine like ChatGPT. Presumably, it would not break copyright rules any more than a regular human reading a bunch of papers behind a paywall and regurgitating the learned information.
There was a rumor about ChatGPT using LibGen for training data; if true, I find it hard to believe Google's legal team would touch that with a long stick.
The irony here is that ChatGPT can operate as a fairly good copywriter. But since it is detectable, there are now human copywriting services like this one to humanize the output.
I suspect ChatGPT is using a form of clean-room design to keep copyrighted material out of the training set of deployed models.
One model is trained on copyrighted works in a jurisdiction where this is allowed and outputs "transformative" summaries of book chapters. This serves as training data for the deployed model.
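As a rough illustration of that two-stage idea, here is a minimal Python sketch. It is purely speculative: `summarize_chapter` is a hypothetical stand-in for whatever model would be trained on copyrighted works in the permissive jurisdiction, and the JSONL corpus format is an assumption; nothing here is confirmed about how OpenAI actually builds its training data.

```python
import json
from pathlib import Path


def summarize_chapter(chapter_text: str) -> str:
    """Hypothetical stand-in for the jurisdiction-A model: takes raw
    (copyrighted) text and returns a 'transformative' summary."""
    raise NotImplementedError("placeholder for the first-stage model")


def build_cleanroom_corpus(book_dir: str, out_path: str) -> None:
    """Write summary-only training data for the deployed model, which
    never sees the original copyrighted chapters."""
    with open(out_path, "w", encoding="utf-8") as out:
        for chapter_file in sorted(Path(book_dir).glob("*.txt")):
            summary = summarize_chapter(chapter_file.read_text(encoding="utf-8"))
            # Only the summary crosses the "clean room" boundary.
            out.write(json.dumps({"text": summary}) + "\n")


# Example usage (paths are illustrative):
# build_cleanroom_corpus("books/", "cleanroom_corpus.jsonl")
```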
Your sentiment is exactly what I intended, albeit I was terse and a little facetious. ChatGPT is like introducing a bunch of new skilled labor, it’s just for the first time this skilled labor isn’t human. The fact that this skilled labor learned from copyrighted material is like saying human labor learned from copyrighted material.
Inevitable outcome. Since ChatGPT launched, nobody has a clue as to what is legal and what is illegal with these chat-based LLMs.
Is the content that LLMs produce enough to rise to the level of copyright infringement? Is the fact that a company trained their LLM on your data, with the knowledge it would be used for outputs (=profit), enough that all of their outputs should be considered, to at least a minuscule degree, influenced by your work? How would ChatGPT's "training" differ from, say, another journalist who reads the NYT, and subconsciously uses that to help provide better services?
None of us can answer these questions definitively. That the courts would end up hearing these sorts of arguments was a foregone conclusion. I think a lot of the large LLMs (certainly OpenAI competitors) are going to breathe a sigh of relief that this is happening sooner rather than later, so they know where the legal lines are to be drawn.
I still don't understand how they can keep a straight face claiming that training on all human-written material (copyrighted or not) that can be found on the Internet is perfectly fine, but training on ChatGPT output is not (or in other words, that human writers cannot have a choice on whether their output is used, but bot owners can).
This is known to maybe half the people in the tech world.
> programmed purely for predictive text
Way fewer people know this.
Most people think of it as some magical AI or whatever. Even with huge disclaimers about it hallucinating, there are so many APIs into ChatGPT...I can only imagine this getting worse in terms of lawsuits.
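For anyone unsure what "purely predictive text" means in practice, here is a minimal sketch using GPT-2 via the Hugging Face `transformers` library. GPT-2 is only a stand-in (ChatGPT's own weights are not public), and this is not how ChatGPT is actually served; it just shows that the core operation is scoring and picking the next token, repeated.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# GPT-2 as a small, public stand-in for a chat LLM.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "Copyright law was created to"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits  # a score for every possible next token

# The whole trick: take (or sample) the most probable next token, then repeat.
next_token_id = logits[0, -1].argmax().item()
print(prompt + tokenizer.decode(next_token_id))
```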
> It's utility seems like it will steamroll any attempts to stop or slow it down.
What? I don't see any utility outside of education and even there it's pretty sketchy.
For business, legal compliance is not a joke and instantly shuts it down. The only businesses willing to use ChatGPT for generating code would be naive young startups who don't realize some assembly is still required and the instructions are missing no matter how much they query the bot. That's called expertise (which they don't yet have). It's not good enough to just write the code. Someone has to comprehend it so they can tweak it as needed. At some point the tweaks will become unwieldy and require actual software engineering that the bot doesn't know how to do (transform from one design pattern to another and know which to use). More power to them if they can cobble something together and then succeed at maintaining it. By the time they're through they'll have pulled off so many miracles that they won't need the bot anymore and become experts. That's quite the trial by fire, but hey everyone has to find their way!
Well it is a machine so of course it doesn't care:
ChatGPT is a bullshitter in the (Frankfurt) sense of having no concern for the truth at all
Detecting true plagiarism with it (or a derived entity offered as a service) would be as useful as the currently proposed watermarking. Turn the technology to some advantage, because the profit seekers and free riders certainly won't be deterred.
While I 100% agree, there is another angle to consider this from, in that ChatGPT replaces reading the NYT. ChatGPT competes with it in the delivery of information.
To add to your point though, a sufficiently advanced AI trained on licensed data could reproduce copyrighted content from a prompt alone. It's the next step, where someone does something with the output, that would cause infringement.