You really seem to be ignoring the core issue by focusing on SO, though. Everything on SO is fair game, but code on GitHub is under a variety of licenses. When Copilot regurgitates that code, no matter how complex and inscrutable the process that leads it to do so, it may cause the Copilot user to misuse the code, because it never gives them the opportunity to know where the code came from or what license it was released under.


Again, how does that differ from Stack Overflow? Do you go and check whether a given reply belongs to a licensed project?

Also, please consider that there is a toggle that allows you to block Copilot from using public code.


> Do you go and check whether a given reply belongs to a licensed project?

All SO questions, answers and comments are CC BY-SA. The terms of the site say that anyone submitting this content agrees that it's licensed that way, and when you visit the site you agree that you are provided with the content under that license. It's not necessary for you to check whether the submitter had the right to offer it under that license; that's their problem. The same goes for any content offered to you under a given license on any platform. I don't understand what your question has to do with the conversation.

The problem with Copilot, and I really can't believe this has to be restated over and over again, is that it takes code from projects with various licenses, and outputs it in your editor in various transformed-or-not-transformed ways (the fact that the transformation is extremely complex doesn't change anything), and gives you no way to know where the code came from, how it was licensed or how it has been transformed. So, despite the fact that if you use it enough you are virtually guaranteed to use code in contravention of its license, you cannot even know which projects you have stolen code from or which licenses' terms you are breaking.

> Also, please consider that there is a toggle that allows you to block Copilot from using public code.

Great. I'm sure its utility doesn't go down at all if you turn that toggle off...


> All SO questions, answers and comments are CC BY-SA. The terms of the site say that anyone submitting this content agrees that it's licensed that way, and when you visit the site you agree that you are provided with the content under that license.

Have you ever read GitHub's conditions to know whether they also have the right to use your code that way, no matter how you decide to license it? I feel that you are overly focused on the legal part here, which I'm sure was handled by Microsoft's lawyers. I'm more interested in the question of principle.

No matter what the terms of use at SO say, anyone can give you an answer that is a copy of some code they don't own. You may consider that immoral, but I don't, not at the scope SO is used for. In addition, the vast majority of cases at SO and Copilot are not about complex functions being stolen, it's about some dumb code you would have found in 2 minutes of googling. What I'm trying to argue here is that if we are all cool with SO and think it's useful, there is no fundamental difference here. We never cared too much about licenses for boilerplate code, and I think we shouldn't start now.


> Have you ever read GitHub's conditions to know whether they also have the right to use your code that way, no matter how you decide to license it? I feel that you are overly focused on the legal part here, which I'm sure was handled by Microsoft's lawyers. I'm more interested in the question of principle.

I have, and there is not. Neither could there be: in many cases the person uploading code to GitHub is not the copyright holder (they are just doing something permitted under the license), and for a large open source project there could be thousands of copyright holders. A random person mirroring some source code to GitHub is in no position to negotiate different license terms on behalf of the copyright holder(s).

> No matter what the terms of use at SO say, anyone can give you an answer that is a copy of some code they don't own. You may consider that immoral, but I don't, not at the scope SO is used for. In addition, the vast majority of cases at SO and Copilot are not about complex functions being stolen, it's about some dumb code you would have found in 2 minutes of googling. What I'm trying to argue here is that if we are all cool with SO and think it's useful, there is no fundamental difference here. We never cared too much about licenses for boilerplate code, and I think we shouldn't start now.

I don't understand why you think a person writing an answer on SO and a computer program outputting some permutation of its inputs into your editor are the same thing. The person writing an SO answer is intelligent and capable of conceptual understanding, the computer regurgitating code without regard to its license is not.


>> Have you ever read GitHub's conditions to know whether they also have the right to use your code that way, no matter how you decide to license it?

> I have, and there is not.

At least one IP lawyer strongly disagrees, suggesting anything you host on GitHub is fair game [1].

[1] https://fossa.com/blog/analyzing-legal-implications-github-c...

> The person writing an SO answer is intelligent and capable of conceptual understanding, the computer regurgitating code without regard to its license is not.

From a copyright perspective, that is irrelevant. In fact I would think Copilot has more incentives to not infringe than a random SO user, who is very unlikely to be sued. I already argued in another post that in my view, from any perspective, it is also irrelevant whether it's a person or AI doing the same work Copilot does.


> At least one IP lawyer strongly disagrees, suggesting anything you host on GitHub is fair game [1].

The question is whether Copilot's users can use the regurgitated code without following the license terms, not whether GitHub was allowed to train its model on it. I agree that training the model is likely fine, but using Copilot would seem to be a legal minefield.

A little thought makes it clear that an affirmative answer would be absurd. This would mean that using a simple tool (let's say `cat`) to make a copy of some code and subsequently ignoring its license terms is infringement, but if the software used to make the copy is more complex (or perhaps if it has the "AI" label stuck to it!) the same actions are not infringement.

