So... say Microsoft retrained Copilot only on code explicitly marked as open source. As an activist or vandal you could start publishing proprietary code with fraudulent license files to pollute Copilot again.
Expanding on that, even if Microsoft sees the error of its ways and retrains Copilot on only permissively licensed source, or with explicit opt-in, it may still get trained on proprietary code that the old version of Copilot inserted into a permissively licensed project.
You would have to just hope that you can take down every instance of your code and keep it down, all while copilot keeps making more instances for the next version to train on and plagiarize.
Arguably copilot was built using open source code against the spirit of the licenses. After all Microsoft has done to undermine open source in the past, can we allow them to have the last laugh? What if we organize a boycott?
Could this be solved by MS brute-force shipping the licenses (with references to their original projects) of all the repos they used to train Copilot, along with Copilot itself?
It wouldn't cover cases where people illegally copy-pasted code into their projects under dubious or unclear licenses, but that's the same risk as using any open source project in general.
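To make that concrete, here is a rough sketch (in Python; the directory layout, file names, and output path are all made up for illustration) of what brute-force collecting every license file from the training repos into one attribution bundle might look like:

    import json
    from pathlib import Path

    # Hypothetical layout: one directory per repo used for training,
    # each containing whatever license file the project shipped.
    TRAINING_REPOS = Path("training-data/repos")
    LICENSE_NAMES = ("LICENSE", "LICENSE.txt", "LICENSE.md", "COPYING")

    manifest = []
    for repo in sorted(TRAINING_REPOS.iterdir()):
        for name in LICENSE_NAMES:
            license_file = repo / name
            if license_file.is_file():
                manifest.append({
                    "repo": repo.name,  # reference back to the original project
                    "license_file": name,
                    "text": license_file.read_text(errors="replace"),
                })
                break

    # Ship this alongside Copilot itself as a single attribution bundle.
    Path("copilot-attributions.json").write_text(json.dumps(manifest, indent=2))

Even then you'd only capture the top-level license of each repo, not the provenance of every snippet inside it, which is exactly the copy-paste gap mentioned above.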
I'm not really sure what I think about this. How responsible should Microsoft be for someone's badly licensed code on their platform? If they somehow had the ability to ban projects using stolen snippets of code, I don't think I'd dare to host my hobby projects there.
If you can't trust that the code in a project is compatible with the license of the project then the only option I see is that copilot cannot exist.
I love free software and whatnot, but I have a feeling this situation would've been quite different if Copilot had been made by the free software community and accidentally trained on some non-free code...
Considering that existing code already has vulnerabilities, some of which were used to train Copilot, I think it's possible but not efficient in terms of success rate.
But if they continue to ignore license terms, I can see someone creating repos with intentionally Copilot-incompatible licenses and watermarking them so they can prove the license terms were violated.
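A minimal sketch of that watermark idea (Python; the repo name, comment format, and token scheme are purely hypothetical): stamp every file with a unique canary string, so any verbatim regurgitation by the model can be traced back to the watermarked repo.

    import secrets
    from pathlib import Path

    # Hypothetical: a repo published under a deliberately Copilot-incompatible
    # license, with a unique canary token embedded in every source file.
    CANARY = f"license-canary-{secrets.token_hex(16)}"

    def watermark_repo(repo_root: str) -> None:
        for source_file in Path(repo_root).rglob("*.py"):
            original = source_file.read_text()
            # Prepend a comment that survives copy/paste; if a model ever
            # emits this exact token, it demonstrably reproduced this repo.
            source_file.write_text(f"# {CANARY}\n{original}")

    if __name__ == "__main__":
        watermark_repo("my-no-ai-licensed-repo")
        print("Keep this token as private proof of origin:", CANARY)

Of course a model is more likely to reproduce the surrounding code than the comment itself, so a stronger watermark would hide the token in identifiers or string constants, but the principle is the same.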
It is now proven that copilot returns code from codebases with non-permissive licenses [1].
I'm curious - what are the legal implications of this going forward? I've so many questions.
1. Will Microsoft ever face lawsuits for these license violations?
2. If so, who/how? Class-action?
3. Will Copilot be forced to be open-sourced in the future? Under which license? Some open source licenses are incompatible with others, but Copilot uses code from probably every OSS license ever conceived.
4. If Microsoft faces no justice, will we start seeing more OSS license violations? Will Google start using AGPL-licensed code?
And what if the code generated by Copilot came attached to a license that you had to obey? Suddenly your proprietary solution must be released as open source or rewritten, because Copilot is effectively laundering open source code.
Life's a lot easier when you can just copy whoever did the hard work without crediting/paying/etc for it.
Perhaps one should create an open-source alternative to GitHub Copilot, also trained on proprietary source code such as some leaked Windows source code, and everyone will be happy and appreciate that we can use fair use to train such AIs.
Copilot is a product -- at least indirectly -- of Microsoft, a company that, for about a decade, made very public pronouncements about how it disagreed with the GPL (or copyleft generally), found it problematic, and actively tried to discourage its use.
Today's MS isn't really the same, and they've clearly made their peace with Linux. But it remains true that the GPL is in some fundamental ways at odds with commercial exploitation of open source code. So any corporate entity is going to struggle with it, because at best it requires being very careful about distribution, or trying to negotiate a deal with the copyright holder. At worst it can lead to legal problems and IP leakage in your own product.
So, I'm not claiming any conspiracy, or deliberate intent to violate. But it is in the convenient interests of companies like MS/OpenAI/GitHub to treat open source work as effectively public domain rather than under copyright, and to push the limits there.
The risk to an employer is of course the accidental introduction of such copylefted material into their code-base through copilot or similar tools.
I suspect there are two sources of disconnect with the broader Hacker News community, which doesn't seem to see the issue here:
a) Many of the folks on this forum work in the full-stack/web space, where fundamentally novel, patented, or conceptually difficult algorithms and data structures are rare. For them Copilot is an absolute blessing, helping to reduce the tedium of boilerplate. However, in the embedded systems, operating systems, compiler, game engine, and database internals world, there are other aspects at work. In certain contexts, Copilot has been shown to reproduce complicated or difficult code taken from copyrighted or copylefted (or maybe even patented) sources without attribution. And apparently now with some explicit obfuscation.
To put it another way: it's unlikely that Copilot is going to violate licenses while helping you turn your value/model objects from one structure to another, or write a call into a SQL ORM. But it's quite possible that if I'm writing a DB join algorithm, some complicated math in a rendering engine, or a compiler optimization pass, it could "crib notes" from a source under a restrictive license... because those things are absolutely in its training set and the LLM doesn't "know" about the licensing behind them.
b) Either a misunderstanding of, a lack of knowledge of, or outright hostility to... copylefted or attribution licenses, which require special handling.
Microsoft can argue that Copilot emits a mixture of intellectual property (a pattern from here, a pattern from there), so they don't need to give attribution.
But if we disallow training, it's unambiguous.
Either you fed the program into your training system or you didn't. The No-AI 3-Clause License forbids use in training, no question about it. If you train your model on this text, you are obviously violating the license.
Systems like Microsoft Copilot are a new threat to intellectual property. The open source licenses need to change to adapt to the threat.
Otherwise Microsoft and Amazon will pillage open source software, converting all our work into anonymous common property that they can monetize.
Open source licenses aren't a free-for-all. Many have terms like GPL's copyleft/share-alike or the attribution requirements of many other licenses. If copilot was trained on such code, then it seems that it, and/or the code it generates, violates those licenses.
Keep in mind that CoPilot might itself be a well-poisoning tactic.
If use of GitHub presumes use of CoPilot, and of commingled code under incompatible or proprietary licenses, such that use of that code could then create contributory infringement claims against distributors, users, or developers, there's something of a problem across the Free Software world.
Though that does seem rather a bit of a major footgun for Github / Microsoft themselves.
The Free Software movement didn't create, or even want, a world in which copyrighted software was the norm. But it adapted to the circumstance by treating copyright as a serious matter and being diligent in its practices of use, appropriation, and licensing.
It's rather ironic that the source of the "Letter to Hobbyists"[1] is now advocating a devil-may-care attitude to copyright, software, and licensing in its own works and service offerings.
Microsoft already launders open source code by just hiring people in China and Romania to rewrite it. Copilot is their engineering culture, distilled. However, most big companies do this.
I'm already terrified by how many developers have been working on proprietary code bases with Copilot, letting an extension in their editor upload all their employer's proprietary code to Microsoft, who then shares it with OpenAI. Then they've taken code of unknown authorship that OpenAI and Microsoft sent back to them and added it into their own codebase.
And now those devs are going to have to go to their boss and explain all the ways they’ve opened their company up to liability?
This could be terribly fun.