
It's even more likely that HN commenters prognosticating about the outcomes of litigation will be wrong. There is always a chance a forum commenter might "predict the future" correctly. But in most cases, the prediction will be wrong. This indemnity from Microsoft only highlights that uncertainty.

Microsoft has tried to get the copyright infringement claims based on Copilot dismissed, and so far it has failed.

https://www.courtlistener.com/docket/65669506/147/doe-1-v-gi...

Arguably it is only Microsoft and OpenAI who know the full extent of copyright infringement, if any, occurring through the use of "AI". If one of these cases, like Doe v. Microsoft, gets to discovery, then maybe that knowledge will no longer be a secret they get to keep. Perhaps the floodgates open.

Who knows. No one can predict the outcome.




> Microsoft is a big stakeholder and they might not want to get sued... Liability could explain a lot of it.

Microsoft is also responsible for Copilot (based around an “it’s unconditionally fair use if you use a mirror” legal theory), so this doesn’t track.


This may actually backfire on Butterick. I am sure Microsoft has a very competent legal department and they have given a lot of thought to the matter.

Most of these articles assume that Butterick will win, but I don’t think that’s a given. If the court rules that this is fair use, that could set a precedent for even broader applications of Copilot and remove some of the legal uncertainty around its use.


Microsoft's AI doesn't reproduce copyrighted source code. They are so confident about this that they are indemnifying customers, see: https://blogs.microsoft.com/on-the-issues/2023/09/07/copilot...

If Microsoft truly believes they are in the right, they should publicly (and legally) state that _Microsoft_ (and not the users of Copilot) takes on all responsibility for potential future copyright infringement cases against those users. If they don't do that, it's hard to take any of their claims seriously.

That hasn’t been tested in court.

But even if that were true, it’s a moot point because we are talking about the copyrighted content that the models were trained on. Hence the point the OP made that if Microsoft really wanted to reassure people then they’d promote models that were trained on Microsoft’s own code rather than handwave away these concerns with gestures of assuming theoretical liability.


At a minimum they should do that, but that's honestly not good enough. They could decide not to sue a user of Copilot if it produced MS code, regardless of whether Copilot produces infringing code in general. An indemnification for the customer in case third parties sue them over code produced by Copilot is honestly the only way I could take Microsoft's position seriously.

When you have millions of users and your product carries inherent danger, you can't assume the liability for all of them (hammers and nails, anyone?). The only reason Microsoft has agreed to be liable for their users' copyright exposure is that they know this case is a winner for OpenAI and that it does indeed qualify as fair use. They wouldn't do that to 'be nice,' because not even Microsoft could foot the bill for millions of users being sued. Their only alternative would be to not produce the product.

Idk if MS is on the safe side here. There's a straightforward legal theory for suing, and also parties such as the EFF and others with a war chest and the determination to clarify this. Does MS provide indemnification to Copilot customers/users if they are sued by others? My advice would be to steer clear of Copilot.

One of the theories circulating is that he engaged in willful copyright infringement, and may have had advance knowledge of this pending lawsuit. https://www.semafor.com/article/11/21/2023/openai-microsoft-...

> It's their responsibility to make sure they don't violate licenses in their own model.

Maybe morally that's true. But who do you think is easier to sue: a small startup using Copilot, or Microsoft?


I’m not a lawyer, but my understanding is that these are torts, so all you have to prove is that Microsoft is liable. I think that would be easy to prove given the way neural networks work, since they’re essentially just a way of performing a search.

Since it’s a tort, I don’t think you have to prove they should have known it would return copyrighted code; the fact that it does is enough to establish liability.


That line, and this:

> “Copilot withdraws nothing from the body of open source code available to the public,” Microsoft and GitHub claim in the filing.

Reminds me of the "Piracy is NOT theft" meme[0][1] from a few years ago.

[0]https://questioncopyright.org/cm/images/piracy-is-not-theft.... [1]https://archive.is/MbdxH


It’s up to copyright holders to enforce their copyright. They can choose at any time to start or stop or restart doing so, for any reason, as long as the term of the copyright has not expired.

I’ll say this: if someone made a prediction market, I’d put down $10 at 70% odds that Microsoft won’t try to enforce in this case. They’re trying to reach out to developers these days, and suing Fabrice over this interesting project would piss off a lot of developers. But I accept the 30% odds that they may just pursue enforcement anyway.


If Microsoft asserts and represents that their tool doesn’t generate copyright-infringing code, then surely Microsoft is the party which should be liable, rather than the poor unlucky programmer who was lied to by the billion-dollar corporation’s marketing agency?

How often did the bot make the same mistake? In how many of those cases does Microsoft now assume they have copyright and make their army of lawyers enforce it? Are you willing to go up against Microsoft in a court, rolling the dice?

What OpenAI claims here does not seem to be true. Here is an article where Ars Technica tried it:

https://arstechnica.com/tech-policy/2023/12/ny-times-sues-op...

And this is a screenshot of their session with Copilot:

https://cdn.arstechnica.net/wp-content/uploads/2023/12/Scree...


While I appreciate the earnest defense of FOSS and the, in all fairness, totally warranted suspicion of Microsoft, given its history, I found the attitude of this article to be very sour and, actually, in bad faith. Let me address the three questions which they posed to MS:

> 1. What case law, if any, did you rely on in Microsoft & GitHub's public claim, stated by GitHub's (then) CEO, that: “(1) training ML systems on public data is fair use, (2) the output belongs to the operator, just like with a compiler”? In the interest of transparency and respect to the FOSS community, please also provide the community with your full legal analysis on why you believe that these statements are true.

I mean, I'd be floored if any corporate lawyer let anyone at [large company] answer this kind of question outside of an actual lawsuit. They are essentially asking the opposing team's lawyers to do all this work for them, for free. This is followed by an "obvious[ly]" correct (I'm being ironic) interpretation of the refusal to answer: that MS is wrong but just won't admit it. But go back and re-read the question. The question was architected to produce this impression if it wasn't answered. That's a sign of a bad faith question, rather than a question with intent to learn the answer.

> 2. If it is, as you claim, permissible to train the model (and allow users to generate code based on that model) on any code whatsoever and not be bound by any licensing terms, why did you choose to only train Copilot's model on FOSS? For example, why are your Microsoft Windows and Office codebases not in your training set?

Other commenters have discussed this one already. There is a perfectly reasonable and legitimate explanation here: they do not want to do anything that remotely risks exposing trade secrets, and that's a separate concern from potentially accidentally violating a license. Suppose the model was trained on all these public repos + MS's private repos. Someone else can come along and train their own model on the public code; now they have two code generators whose outputs can be compared to reveal secret information about MS's training set. This time, the article guesses well at the answer: MS cares more about itself than others. Sure. Why would it be expected not to?
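
(For what it's worth, that "compare two code generators" idea is roughly a membership-inference attack. Here's a minimal sketch of what the comparison might look like, assuming two hypothetical HuggingFace-style causal LMs that share a tokenizer; the model names, snippet, and threshold are all made up for illustration, not real checkpoints or a proven attack:

    # Sketch: if one model saw only public code and another saw public + private
    # code, snippets that the second model scores far more confidently than the
    # first are candidates for having come from the private training data.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    def avg_nll(model, tokenizer, text):
        # Average per-token negative log-likelihood of `text` under `model`.
        inputs = tokenizer(text, return_tensors="pt")
        with torch.no_grad():
            out = model(**inputs, labels=inputs["input_ids"])
        return out.loss.item()  # cross-entropy, averaged over tokens

    # Placeholder model names -- purely illustrative.
    tok = AutoTokenizer.from_pretrained("public-only-model")
    m_public = AutoModelForCausalLM.from_pretrained("public-only-model")
    m_mixed = AutoModelForCausalLM.from_pretrained("public-plus-private-model")

    snippet = "def frobnicate(x):\n    return x * 42\n"  # candidate code to test
    gap = avg_nll(m_public, tok, snippet) - avg_nll(m_mixed, tok, snippet)
    if gap > 1.0:  # threshold is arbitrary; a real attack would calibrate it
        print("the mixed model finds this snippet suspiciously familiar")

The point is that an attacker never needs MS's weights or data, only query access to both models, which is exactly the exposure MS would presumably want to avoid.)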

> 3. Can you provide a list of licenses, including names of copyright holders and/or names of Git repositories, that were in the training set used for Copilot? If not, why are you withholding this information from the community?

I think this question is bad faith too. It starts by asking "can you". Then, if the answer is "no, we can't", it reinterprets the answer as "no, we won't" ("withholding" is an intentional act). It is disingenuous to imply that someone who cannot do something is, therefore, intentionally refusing to do so. In the analysis of the lack of response, the article (finally) admits that it is speculating wildly, backpedals on the implied claim that MS is refusing to provide this information, and instead takes a different approach: MS scientists can't answer because they are not good scientists. But wait, here's the kicker:

> ... so they don't actually know the answer to whose copyrights they infringed and when and how.

Busted! The authors have essentially demonstrated the question is in bad faith by suggesting that the answer to the question, "Whose data did you use?", is the same as the answer to the question, "Whose copyright did you violate?", which is a logical connection made possible only by the underlying presupposition that MS is totally incorrect in its assertion about fair use in question 1. The framing of all these questions suggests to me that the authors were already firmly convinced of their guesses as to the answers/non-answers _at the time of posing the questions_.

If they actually waited for a whole year expecting a response, that's on them. I'm with MS on the decision not to engage here, even if I share all these qualms about Copilot.


> legal speak for "any copyright infringement is your fault"

Proving intent is difficult. This basically means that if you have emails in which someone describes their work as copyright laundering, Microsoft can use that to get out of indemnifying you.

