
NOTE: there was a large discussion of this yesterday [1], but that was almost entirely about copyright. This submission is to a different link at Microsoft that makes it clear they are covering much more than copyright. It seemed then that it might be useful to have a separate submission to discuss the non-copyright aspects of this.

They say:

> Specifically, the Copilot Copyright Commitment will:

> • Cover third-party IP claims based on copyright, patent, trademark, trade secrets, or right of publicity, but not claims based on trademark use in trade or commerce, defamation, false light, or other causes of action that are not related to IP rights.

> • Cover the customer’s use and distribution of the output content generated by our Copilot services, but not the customer’s input data, modifications of the output content, or uses of output that the customer knows or should know will infringe the rights of others.

> • Require the customer to use the content filters and other safety systems built into the product and the customer must not attempt to generate infringing materials, including not providing input to a Copilot service that the customer does not have appropriate rights to use.

I’m somewhat at a loss to understand how they can do this. With copyright, filtering to keep the output from too closely matching too much of any particular training input goes a long way toward reducing the chances of copyright infringement. It might also be possible to train an AI to only include in its output things that are found in multiple training items from different sources, which would also greatly reduce the chances of emitting something that infringes. The key is that copyright infringement is a textual matter: does the output text too closely match too much text from some copyrighted work that the AI had access to when it was trained? Also, they only have to reduce copying enough to get whatever slips through into fair use territory (in those jurisdictions where fair use is a thing).
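To make the "textual matter" point concrete, here is a rough sketch of the kind of overlap filter I have in mind. I have no idea what Microsoft actually does; the function names, the whitespace tokenization, the 5-token window, and the 0.5 threshold are all my own assumptions for illustration.

```python
def ngrams(tokens, n):
    """Return the set of all contiguous n-token windows in a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def overlap_fraction(output, training_doc, n=5):
    """Fraction of the output's n-grams that also appear in one training document.

    Tokenization is a plain whitespace split (an assumption; a real system
    would presumably use a proper tokenizer).
    """
    out = ngrams(output.split(), n)
    if not out:
        # Output shorter than n tokens: nothing to compare.
        return 0.0
    return len(out & ngrams(training_doc.split(), n)) / len(out)


def too_close(output, corpus, n=5, threshold=0.5):
    """True if the output overlaps any single training document above the threshold.

    The threshold is a hypothetical tuning knob, not a legal standard.
    """
    return any(overlap_fraction(output, doc, n) > threshold for doc in corpus)
```

A check like this only ever compares text against text, which is why it has some hope of catching copying and no way of noticing that two completely different texts implement the same algorithm.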

With patents it is not a text problem. If I upload code to GitHub that implements a patented algorithm, Microsoft trains Copilot on that, and Copilot outputs code that implements that algorithm, then using that code will be patent infringement even if the output code has nothing whatsoever copied from my code. I don’t see how they will be able to filter that out or train to reduce its likelihood. And with patents there is no fair use exception.

[1] https://news.ycombinator.com/item?id=37420885


