Whether copyright applies at all to model training is an entirely open question, and where rulings have come down, they've leaned toward these situations being fair use (e.g. the Google Books case, which was ruled transformative and not a direct replacement for the works in question).
The reality is, these models don't copy or distribute anything directly, which makes applying copyright a bit of a stretch. Many people feel this is a use that some sort of IP law should cover, which is why I think there's some chance courts or legislators will decide to screw the letter of existing law and just wedge new interpretations in. But it's not so simple: they'd have to thread the needle without making things like search illegal, and that's tricky. Beyond that, these models are already out there, they're useful, and if they're ruled infringing they'll just be distributed illegally anyway.
I don't envy the people who will have to decide these cases. I suspect what's better for the world overall is to leave the law as-is and clarify that fair use holds (nobody will stop publishing content or code just because AI is slurping it up, a few weirdos like the article author excepted), but there are going to be a lot of pissed-off people either way...
Would I be able to train an AI solely on Microsoft's leaked Windows code, use it to write a Windows clone with no copyright (since the output comes from an AI), and be safe from legal repercussions because it was trained on "fair use" code I just happened to find online?
If courts rule that that's okay, I might be okay with AI training being ruled fair use.
I think there's a reasonable distinction to make between "you can train AI models on any code that you are legally allowed to have and read" and "you can train AI models on any code that you are able to feed into it, regardless of whether you have permission to possess/read it".
Exactly. I'm betting that if you asked GPT to create a Windows clone, MS would certainly not let you distribute it. This will go like every other law/license: big corp can sue the little guy into the ground. When big corp uses your code, it'll be "that's just model-generated code, not yours". But in the other direction, if a little guy creates a Windows clone, it's "sorry, off to jail with you, matey".
Even if it's the opposite direction, big guy losing and small guy coming out ahead, it's still drama.
Just like Covid introduced epidemiological terms to the general public, this issue can introduce design choices around licensing, copyright and watermarking to more people.
I assume there is a group of researchers building tools to provide fine-grained historical views into AI output. And yes, for billions of parameters trained on billions of documents, linking every letter to a source document is a UX nightmare.
But what a cool problem. That's the interesting part. Yeah, something like TileBars[1] or Seesoft[1] seems like the right tool. But maybe keeping it all text with some graphical marker of authenticity is the better choice.
So many cool problems. But, that authenticity marker is the hard sell. Can reasoned discussions with others be enough to introduce that, or is drama required?