GitHub Copilot isn't worth the risk (www.kolide.com)
179 points by terracatta | 2022-11-17 | 215 comments




People are way too attached to single-function examples. I'm struggling to find any example that actually rises to the requested "originality, creativity, and fixation" for copyright to apply.

Just because something looks similar or is even identical doesn't mean copyright applies.


You might want to take a look at some of the pieces of code examined in Google v. Oracle before you decide that small and obvious code cannot bear copyright.

That horrifying back and forth showed that lawyers can consider very small and obvious fragments of code to be absolutely copyrightable. And the fact that it went on for nearly a decade should tell you that none of this is simple.


Can you give an example? Are they copyrighting `for` loops or something?

https://guides.lib.umich.edu/c.php?g=791114&p=5747565

"Google also copied the nine-line rangeCheck function in its implementing code"

Comparison between the two, discussed back in 2012:

https://news.ycombinator.com/item?id=3940683


Google v. Oracle did not decide whether such code is copyrightable. The decision says that it doesn't matter because it would be fair use if it were copyrightable.

Whether or not a court would ultimately decide that an instance of Copilot code is copyright infringement isn't the main issue, in my opinion. Creating opportunities for other people to sue you will be much more damaging. Even lawsuits that you win will be very expensive, and it's not guaranteed you'll get lawyers' fees paid by the losing party.

Courts have repeatedly upheld that only humans can own copyright. Nothing an AI generates is considered a "work" in copyright law, so it can't be violating works or derivative works. The onus falls on the Copilot operator (one might say "the pilot") for purposes of copyright.

This feels like another drama in the style of SCO v. Linux. Lots of FUD, little to nothing for end-users to actually worry about.

Yup. There is no chance in hell of them coming after USERS of copilot.

If a company's code is audited (internally or externally) and GPL code is found, you can bet your ass the dev who committed that GPLed code will get a stern talking-to, and the company will have to re-write that code.

And that's just for GPL code. Code not under an OSS license could get way worse.


The structural completions are way more useful than the entire function completions, even in IntelliJ, where autocomplete is already extremely high quality.

The part that I find unsettling when using Copilot is the risk that credentials or secrets embedded in the code, or being edited in (.gitignore'd) config files, are being sent off to Microsoft for AI-munging and possible human review for improvements to the model.


You shouldn't have any credentials in your git repos anyway. GitHub will already scan your repos and alert you if it thinks there are any credentials in there.

You've never temporarily put a key into a file while testing? Or accidentally pasted one for a second then deleted it? Can you say the same for your entire team or company?

Since Copilot is constantly making new suggestions, a momentary entry is all it takes.


Copilot doesn't retrain on data generated by you in the moment, so I don't see why this is an issue unless you push the files - with the keys - to GitHub.

The model is evaluated on the server, using the content of your files.

Credentials should never be committed. By the time you're ready to commit code, you should be reading them from the environment or from a config outside of the codebase, or at least one that's .gitignore'd.
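
A minimal version of that pattern, as a sketch (the environment variable name and config filename here are hypothetical, purely for illustration):

    import os
    import json
    from pathlib import Path

    # Prefer the environment; fall back to a .gitignore'd local config file.
    API_KEY = os.environ.get("EXAMPLE_API_KEY")  # hypothetical variable name
    if API_KEY is None:
        # config.local.json is a hypothetical file listed in .gitignore
        API_KEY = json.loads(Path("config.local.json").read_text())["api_key"]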

Once that key is in your git history, it's in the history. You might be able to rewrite the history to remove it, but it's going to be a nightmare.


I'm not sure why you're referring to committed code. The model is being evaluated on the server, with content you haven't yet committed.

> The structural completions are way more useful than the entire function completions, even in IntelliJ, where autocomplete is already extremely high quality.

I needed to run a comparison over a window of a numpy array, and given the sheer size of my data, I needed it to be fast and efficient, which means vectorized operations with minimal Python interaction. Copilot figured out a solution that is orders of magnitude faster than what I could conjure up in 10 minutes, most of which I'd spent searching for similar solutions on SO.
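
For a sense of what a vectorized windowed comparison can look like (a sketch only - the data, window size, and scoring metric here are invented, not the parent's actual problem):

    import numpy as np
    from numpy.lib.stride_tricks import sliding_window_view

    rng = np.random.default_rng(0)
    data = rng.random(100_000)  # large 1-D array
    pattern = rng.random(32)    # window to compare against

    # Every length-32 window as a (N-31, 32) view: no copying, no Python loop.
    windows = sliding_window_view(data, len(pattern))

    # One vectorized comparison per window, e.g. mean absolute difference.
    scores = np.abs(windows - pattern).mean(axis=1)
    best = scores.argmin()      # index of the closest-matching window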


It's interesting to consider how you might prevent training using a license without being too restrictive.

Here is an example of a license that attempts to directly prohibit training. The problem is that you can imagine such software can't be used in any part of a system that might be used for training or inference (in the OS, for example). Somehow you need to additionally specify that the software is used directly... But how, what does that mean? This is left as an exercise for the reader and I hope someone can write something better:

  The No-AI 3-Clause License
This is the BSD 2-Clause License, unmodified except for the addition of a third clause. The intention of the third clause is to prohibit, e.g., use in the training of language models. The intention of the third clause is also to prohibit, e.g., use during language model inference. Such language models are used commercially to aggregate and interpolate intellectual property. This is performed with no acknowledgement of authorship or lineage, no attribution or citation. In effect, the intellectual property used to train such models becomes anonymous common property. The social rewards (e.g., credit, respect) that often motivate open source work are undermined.

  License Text:
https://bugfix-66.com/7a82559a13b39c7fa404320c14f47ce0c304fa...

What about fair use? (both in the copying made for training itself and the resulting output from the service)

We are witnessing a monstrous perversion of "fair use" and the greatest theft of intellectual property in human history.

Do you measure IP's value using the amount of work/effort that was put into creating it, or only the end result?

Currently US copyright law only cares about the end result. Effort has no meaning or bearing in any legal analysis of copyright matters.


Copyright infringement trials are tried in the infringer's jurisdiction.

This is such Luddite behavior.

How much hubris we have as a species to think that our professions will endure until the end of the stars. To think that the software we write will be eternal.

The thing that we do now is no different than spinning cotton.

I'd be shocked if the total duration of human-authored programming lasted more than a hundred years.

I'll also wager that in thirty years, "we'll" write more software in any given year than all of history up until that point.


I'm all on board if the Microsofts of the world are. But they choose to train their AI on OSS code and not their own codebase. So clearly they think similarly to the parent; they just want you to forget about that part when it suits them.

If we pass laws restricting the training on copyrighted information, the only organizations that will be able to train will be institutional.

Microsoft would benefit from restriction. Not us.


Would you pay for a product trained on, say, the MS Teams, SharePoint, or Skype codebases?

No, and no one else would either.


This is the BSD 2-Clause License:

    1. Redistributions of source code must retain the above copyright
       notice, this list of conditions and the following disclaimer.

    2. Redistributions in binary form must reproduce the above copyright
       notice, this list of conditions and the following disclaimer in
       the documentation and/or other materials provided with the
       distribution.

    THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
    "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
    LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
    A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
    HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
    SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
    LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
    DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
    THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
    (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
    OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
Presumably, as long as GitHub Copilot:

a) fails to respect these itself, or

b) fails to present them to the user who is going to use its output verbatim or produce derivative code from it, so that the user can respect these,

Then GitHub Copilot is either in violation of the license or a tool assisting in such a violation by stripping the license away†.

From TFA:

> David Heinemeier Hansson, creator of Ruby on Rails, argues that the backlash against Copilot runs contrary to the whole spirit of open source. Copilot is “exactly the kind of collaborative, innovative breakthrough that I’m thrilled to see any open source code that I put into the world used to enable,” he writes. “Isn’t this partly why we share our code to begin with? To enable others to remix, reuse, and regenerate with?”

I don't mean to disrespect DHH, but the "spirit of open source" isn't to wildly share code around as if it were public domain, because it is not. An author gets to choose within which framework their code gets to be used and modified††; otherwise one would have used public domain as a non-license (plus the WTFPL for those jurisdictions where one can't relinquish one's own creation into the public domain).

† depending on whether the "AI"/Microsoft can be held liable for the automated derivative, or the end user is.

†† cue GPL vs MIT/BSD


The spirit of this is good, but the implementation is garbage - you need a lawyer or a team of lawyers to do this right. You grandstand and soapbox in this weakly written paragraph, and it hurts the whole thing. You discuss social rewards, intentions, etc. It just reads like a Stallman-esque tirade.


Wow. That's aggressive.

You previously said:

"I work at the most important company in the "AI" industry, a company you hear about every day.

I write GPU kernels for transformers and convolutions. You probably use my BLAS kernels in your networks."

It's pretty easy for people to figure out it's NVIDIA. You probably work on the cuBLAS library.

All the personal attacks I've seen come from you, with your snarky comments.

All the people that responded to you did so in good faith, trying to engage in honest conversation.


Hmm, I am going to overlook the threats and strange response - I think what you have here, in this license you are trying to push, is a good thing. I was giving it feedback. Hire a lawyer, strip away the opinions, and you're cooking. I wish you luck with it.

Seems this article completely misses the benefits of Copilot. It's a massive step forward in productivity. For me, it's about suggesting proper syntax across the various libraries we use. It really does cut time by tens of percent.

I don't buy the argument that the risk of a yet-to-be-litigated case against a different company, who will certainly fight this hard, is greater than the productivity gain of using Copilot.

Additionally, the security argument feels ridiculous to me. We lift code examples from gists and Stack Overflow ALL THE TIME! But any good dev doesn't just paste it in and go; instead we review the code snippet to ensure it's secure. Same thing with Copilot: of course it's going to write buggy/insecure code, but instead of going to Stack Overflow for a snippet, it's suggested in my IDE and with my current context.


> I don't buy the argument that the risk... is greater than the productivity gain of using Copilot.

How does your company's general counsel feel?

This article is aimed at CTOs, not engineers.


I suspect prohibiting Copilot will just become another checkbox on compliance security questionnaires. The fact that Kolide can detect it and that Kolide can feed compliance suites like Vanta or SecureFrame means the infrastructure is already there. It's not only your lawyers that want these guarantees, it's often your customers.

We don't have GC (too small), so caveat my take with the fact that I'm writing from a smaller company's perspective.

May be different for a larger, value-preserving company who would face more scrutiny.

That being said, I still find it extremely unlikely that there would be legal ramifications from using a product being pushed by one of the largest software companies in the world. Why go for a user and not Microsoft themselves?



> Why go for a user and not Microsoft themselves?

1) the user likely doesn’t have the legal resources of Microsoft.

2) the user is the one committing the infringement.

If Microsoft stood behind this they could offer to indemnify users against lawsuits relating to CoPilot usage, but they don’t.


> That being said, I still find it extremely unlikely that there would be legal ramifications from using a product being pushed by one of the largest software companies in the world.

Microsoft is explicitly saying it's your responsibility to check that the Copilot output you add to your codebase is not infringing on anyone's license.

Also, it's actually a complex legal question whether Copilot itself is infringing anyone's copyright. But there is no doubt whatsoever that you don't have the right to distribute someone else's copyrighted code (without a license) just because it was produced by Copilot and not manually copied by you. And it is also very clear that Copilot can occasionally generate larger pieces of someone else's code.

Edit: fixed typos


> Microsoft is explicitly saying it's your responsibility to check that the Copilot output you add to your codebase is not infringing on anyone's license.

(Never used copilot)

Wow, this is kinda shocking IMO. It kind of negates the entire value proposition of the tool.

How am I supposed to find out whether a snippet is infringing? Should I paste it into google or something? Shouldn’t Copilot be the one to tell me if a snippet too-closely matches some existing code it learned from?

If MS is indeed saying this, I feel like it’s something they put in the agreement to cover their own asses. There’s no way they’d really expect everyone to do this sort of thing. Moreover I don’t feel that’s a very strong defense MS could use in court if somebody decides to go after MS for making the tool that makes infringement so easy. It sounds like one of those “wink wink” types of clauses that they know full well nobody will follow.


From the official FAQ [0]:

> Other than the filter, what other measures can I take to assess code suggested by GitHub Copilot?

> You should take the same precautions as you would with any code you write that uses material you did not independently originate. These include rigorous testing, IP scanning [emphasis mine], and checking for security vulnerabilities. You should make sure your IDE or editor does not automatically compile or run generated code before you review it.

I think lots of companies do run tools such as BlackDuck and others to scan their entire code base and ensure (or at least have some ass-covering) that there is no accidental copyright infringement.

[0] https://github.com/features/copilot#other-than-the-filter-wh...


How much of what you save by using Copilot will then be spent on BlackDuck licenses?

Capex vs opex, huge difference

While the cost to programmers' sanity of running things like BD is immeasurable in my estimation, if you are already doing it, doing it for Copilot code shouldn't add any extra cost, unless Copilot is actually constantly spewing copyrighted code.

> While the cost to programmers' sanity of running things like BD is immeasurable in my estimation

Can you clarify? In my experience, source scan is just another job in one's build pipeline. And I've only seen it fail when it does, in fact, detect a new component (or a license change in the existing component) - because at that point you have to do the legal dance for third-party notices etc. But the latter part something you have to do either way, tools or no tools.


Source scan is indeed not a problem. Scanning all the binary blobs is where things go wrong, in two respects.

For one, there are quite a few false positives, especially if you use commercial 3rd parties as well. For example, I had a UI component recognized as some obscure academic microkernel!? Investigating, we found that happened because that microkernel project was using the same commercial UI component somewhere (probably under some academic license), and their repo was just where BD had first seen this JS code.

For a second, much more common and annoying one (at least in BD at my company), you have to add explanations to each individual identified 3rd-party package that uses something like the GPL, to affirm that it is being used in a way that complies with the license. If you're distributing something like a Linux VM, that means hundreds of packages that are part of the distribution. This work has to be done manually, which means entering the same copy/pasted text in hundreds of places in the atrociously slow BD UI.


Does Microsoft let their developers use it? Say, when working on Windows? If not, I'd say the very vendor of the software considers it radioactive, so I'll keep treating it that way, too.

Is 'what would a Microsoft dev do' really the bar we want to live by, though?

No but it's not a bad litmus test in this situation.

In this case, yes, of course—I don't really get your objection. If their own legal counsel is advising them not to let their developers use their own product over legal concerns (and what else could be the reason?) that would be a pretty good argument against anyone else using it.

Nb. I don't know whether they do or do not, in fact, let their developers use it.


Say what you want about Microsoft - they've got some of the best lawyers in the world on this kind of stuff. If they're not doing it, they either don't trust the tech or don't trust the law.

> It really does cut time by tens of percent.

I used it for about a month. It gave me a few false positives that really burned me - it's not worth the risk. Maybe future versions will be better.


What're the examples of false positives?

Agreed it gets things wrong very frequently. But I've found it much easier to use its suggestion as another "input" to writing code.


I've gotten plenty of false positives, but the mistakes turn up in testing and are pretty easy to spot when reviewing the code. Anything more subtle is likely to have been missed when written by hand anyway.

What happened to burn you so badly?


This. If copilot suggests anything more than basic syntax or boilerplate I don’t use it. If it writes code I don’t understand or wouldn’t be able to write myself I won’t use it. Why? Because at the end of the day it’s my code. In what world is a good engineer submitting a PR for coworkers to look over that isn’t their code?

If this is a real issue the solution is not banning yet another tool. It’s education. Teaching engineers how to properly understand code attribution and licenses.


Do you think we'll be writing software 200 years from now?

50? 25?

I'll bet the people spinning cotton thought that would endure forever.

(Sorry if my tone comes across as fervent. I'm excited to be displaced by this, because what follows is the stuff of dreams.)


Whenever I watch Geordi and Data doing something in engineering, they’re often talking to the computer about constructing models and sims and such.

To me this is the ultimate form of declarative programming. Not that we will all be talking it out, but that we will explain in natural language what we're after.

It maximizes how much time we spend in the “problem understanding/solving” phase and minimizes the tedium of actually setting up the apparatus.


The invention of the cotton gin simply moved people from spinning cotton to picking cotton. And increased demand for slaves.

I'm not excited to be displaced personally, but I'm also not really worried about being displaced. If displacement is inevitable, I don't see how the average programmer is going to leverage this for the "stuff of dreams". Usually, tech advancements result in a greater consolidation of wealth into the hands of those that already own capital. Recent tech is no exception. Yes, there has been a lot of wealth created for regular people, but we're still working 40+ hour weeks, and earnings have not matched the increase in productivity.

What I am concerned about is that our field is becoming increasingly arcane magic for the younger generations, especially the masses that are being completely and utterly failed by the education system.


I apologize ahead of time for rambling, but I'm with you on this!

In my coworkers and many of the applicants we see, there's a trend of over-optimization. The common meme is the 'leet code' interview process.

I suppose the best way I can convey this is... there's a hyper-focus on the mechanics of doing things - on making people not afraid of the code, while leaving them unaware of the world around it.

A lot of thought gets abandoned in favor of process - even thought about the physical systems the code runs on. I recently learned the term 'mechanical sympathy'.

Sometimes it's important to ask if you need the code or system at all!

I know it's not fair to people but I groan any time I see a CS degree


I mean, yes? People will be doing math as long as there are people around to do it. It'll look different, sure. But there will always be problems, and math/programming is problem solving par excellence.

Between 2016 and 2021, I was of the opinion that I could not make any reasonable forecast of even vague large-scale social/technological/economic development past 2030, because the trends in technology go all funky around then.

Thanks to recent developments in AI (textual and visual), I no longer feel confident predicting any of those things past about the beginning of 2028.

It's not a singularity, it's an event horizon: https://kitsunesoftware.wordpress.com/2022/09/20/not-a-singu...


These assisted coding systems are tremendously exciting, but they are only the analogue of moving from a shovel to a powered excavator; they still need a trained individual who knows, to a fairly high technical level, what the final result needs to look like to be effective. So, yes, 25-50 years from now humans will still be the principal element in writing software.

I don't see a world where programming isn't the last thing to go. We pretty much have a general intelligence when a "programmer" is no longer needed. That doesn't mean programming will look anything like it does today in 200 years but will the profession, doing kinda the sameish thing, still exist? Absolutely!

It's interesting to think about. If programming can be automated away, then you can use that automation to automate away any job in the world that can be automated.

Yeah, in the future there will be only AIs developing apps and AIs using apps.

There won't be apps, actually, they'll do everything programmatically.

And all humans would have been killed by then in an AI doom.


Yes! Exactly.

The article suggests that he wants to know "who wrote the code" if a senior dev he trusts submits a PR. He doesn't want to be surprised that "the AI" wrote some of this code.

But it's ALL written by the senior dev. If he trusts that dev, that means that dev has thoroughly read and tested his code! That's the important bit. Remembering proper syntax/imports/nesting levels is the tiniest piece of writing good code. And Copilot can take that off our hands.


That's like saying that code copy/pasted from OSS projects on github was "written by the developer". Which is not true.

The speed of your developer and the correctness and test coverage of your code don't matter when it comes to license compliance.

And license compliance could cost your company 100x (if not more) the value of your best software developer - especially for the non-OSS licenses.


It was written by the developer. If I write down lyrics I remember, I still wrote them down. Whether I have the copyright to make money off them, or whether they're trademarked, are different things.

You could say they were not the first to write it, which would be more correct.


GitHub Copilot has been concretely demonstrated to emit significant chunks of OSS licensed code.

Significant enough that if the license is GPL (which some has been) it will "taint" the entire codebase and license it under GPL. Significant enough to be found by automated OSS audit tools, which would trigger a re-write and education for the developer who committed it.

EDIT:

> If I write down lyrics I remember I still wrote it.

Not from a copyright point of view. The rights to those lyrics belong to the songwriter. It's kinda like photographs. You don't automatically have the right to distribute a photograph of yourself that was taken by someone else.


> Significant enough that if the license is GPL (which some has been) it will "taint" the entire codebase and license it under GPL. Significant enough to be found by automated OSS audit tools, which would trigger a re-write and education for the developer who committed it.

That "significant enough [...] to taint the entire codebase" remains to be decided in court.


Several of the byte-for-byte copies pointed out by open source authors were longer than 20 lines, and contained verbatim comments.

I am not a lawyer, but that's been enough to get people in legal trouble in the US.


> That "significant enough [...] to taint the entire codebase" remains to be decided in court.

I doubt any employer would appreciate being this particular guinea pig because one of their employees wanted to avoid writing some boilerplate.


> That's like saying that code copy/pasted from OSS projects on github was "written by the developer".

I don't think that's what OP is saying. What I think OP is saying (and I agree) is that submitted code is trusted if you trust the source. If you take the person putting code in front of you and ask "Would this person copy someone else's code and submit it as their own" and the answer is "No they would not copy code" then every step that trusted-person took to get to that code is immaterial. Whether they used StackOverflow or Copilot or whatever AI assisted code generating tools do or don't get developed in the future. At the end of the day a good, trustworthy engineer isn't going to use licensed software by "accident"[1].

1. I put "accident" in quotes because it seems so crazy to me that someone would start writing a method "doThing" and then CoPilot spits out a licensed implementation of "doThing" and the engineer would look at it and go "This seems fine."


> every step that trusted-person took to get to that code is immaterial.

Which is, unfortunately, completely useless when it comes to copyright infringement. Trust in the individual will not change the output of an audit for copyrighted code, or the results from said audit.

The only thing that a "trusted" individual can contribute in a copyright infringement investigation is attesting that they did not know that the code they put in the codebase was copyrighted. And all that does is save the company from getting the higher "willful infringement" fines, if it should get that far.

Wilful Infringement Damages: https://www.ce9.uscourts.gov/jury-instructions/node/708



> If it writes code I don’t understand or wouldn’t be able to write myself

For me the bar is higher - it's not that I wouldn't understand it, it's that it's easier to miss mistakes when reviewing than when writing from scratch. In the same way you may have ignored a typo in this comment and understood what I meant regardless of the mistake. But that doesn't work for programming - a mistake is a mistake, and it likely matters in edge cases even if it's not immediately obvious.


In Intellij or Visual Studio, syntax suggestion/tab completion are already great. Those technologies - which involve none of the legal risks of Copilot- are a massive step forward in productivity. Copilot does help extend these benefits to other languages that I occasionally dabble in, like Lua and embedded C, though it's clearly better in languages which are better represented in its dataset.

I don't find the natural-language-comment-to-buggy-algorithm part of Copilot to be particularly useful. I know some people asked to be able to write a "DoWhatIMean()" method, but programmers really only wanted that to auto-expand to "protected virtual void DoWhatIMean() {}" without having to wait 30 seconds for a compile error to check whether it was protected void virtual or protected virtual void...


> In Intellij or Visual Studio, syntax suggestion/tab completion are already great. Those technologies - which involve none of the legal risks of Copilot- are a massive step forward in productivity. Copilot does help extend these benefits to other languages that I occasionally dabble in, like Lua and embedded C, though it's clearly better in languages which are better represented in its dataset.

Copilot is so much beyond regular autocomplete that it's playing a completely different game.

I've been using it today while writing a recursive descent parser for a new toy language. I built out the AST in a separate module, and implemented a few productions and tests.

For all subsequent tests, I'm able to name the test and ask Copilot to write it. It will write out a snippet in my custom language, the code to parse that snippet, and construct the AST that my parser should be producing, then assert that my output actually does match. It does this with about 80% accuracy. The result is that writing the tests to verify my parser takes easily 25% of the time it took when I did this by hand.

In general, this is where I have found Copilot really shines: tests are important but boring and repetitive and so often don't get written. Copilot has a good enough understanding of your code to accurately produce tests based on the method name. So rather than slogging through copy paste for all the edge cases, you can just give it one example and let it extrapolate from there.

It can even fill in gaps in your test coverage: give it a @Test/#[test] as input and it will frequently invent a test case that covers something that no test above it does.


Thing is, for something like an AST parser you want a property test, not a bunch of autogenerated boilerplate.

Generally, if something is boring and repetitive, it probably shouldn't be written; better code generation is rarely a good answer.


Property tests are nice for lots of things, but for an AST parser? You'd basically have to re-implement the parser in order to test the parser, wouldn't you?

I suppose you could test "if I convert the AST back to a string do I get the same result", but that's not actually your goal with an Abstract Syntax Tree. If nothing else the white space should be allowed to change.

What sort of property tests did you have in mind?


You are right that simplified reimplementations make good property tests, but in this case I'd go the other way around: generate an AST, render it a in test case-dependent way (adding whitespace as you said, but also parens etc), inject known faults for a fraction of test cases, and check that the parsed AST is equivalent to the original one or errored out if a fault was injected. Rendering a given AST is usually simpler than parsing it.
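
A minimal sketch of that round-trip property, using Python's own ast module as a stand-in for the toy language (the generator and renderer are toys, and the fault-injection step is omitted for brevity):

    import ast
    import random

    def random_expr(depth=0):
        # Generate a random arithmetic AST (a toy stand-in for a real language's AST).
        if depth > 3 or random.random() < 0.3:
            return ast.Constant(value=random.randint(0, 9), kind=None)
        op = random.choice([ast.Add(), ast.Sub(), ast.Mult()])
        return ast.BinOp(left=random_expr(depth + 1), op=op, right=random_expr(depth + 1))

    def render(node):
        # Render back to source, fully parenthesized so precedence can't bite.
        if isinstance(node, ast.Constant):
            return str(node.value)
        ops = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*"}
        return f"({render(node.left)} {ops[type(node.op)]} {render(node.right)})"

    # Property: parsing the rendered source yields an equivalent tree.
    for _ in range(1000):
        tree = random_expr()
        reparsed = ast.parse(render(tree), mode="eval").body
        assert ast.dump(tree) == ast.dump(reparsed), render(tree)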

> instead we review the code snippet to ensure it's secure

Doesn't matter. A developer's speed and test completeness and code quality matter not one whit when it comes to licensing. That 10x developer could mire the company in fines and code re-writes if they include copyrighted code, especially if it's not OSS.


I don't understand how it improves productivity _that_ much. Most of my time isn't actually spent on syntax but rather on reading Hacker News and making irrelevant comments.

Using it in practice, the sheer quantity of suggestions (often one for every line) is fatiguing, especially when 99% of the time they seem fine.

I posit that over long periods of time, across many engineers, it becomes increasingly likely that a severe bug or security issue will be introduced via an AI-provided suggestion.

This risk to me is inherently different from the risk accepted when engineers use bad code from Stack Overflow. Even Stack Overflow has social signals (upvotes, comments) that allow even an inexperienced engineer to quickly estimate quality. And the amount of code engineers take from Stack Overflow or blogs etc. is much smaller.

GitHub Copilot is constantly recommending things and does not give you any social signals less experienced engineers can use to discern quality or correctness. Even worse, these suggestions are written by an AI that does not have any self-preserving motivations.


Copilot's default behavior is stupid. You can turn off auto-suggest so that it only recommends something when you prompt it to, and that should really be the default behavior. This would encourage more thoughtful use of the tool, and solve the fatigue problem completely.

In IntelliJ, disabling auto-complete just requires clicking on the Copilot icon at the bottom and disabling it. Alt+\ will then trigger a prompt. I know there's a way to do this in VSCode as well, but I don't know how.


> I know there's a way to do this in VSCode as well, but I don't know how.

I dug into this a bit, since I want the same functionality, I found I needed an extension called settings-cycler (https://marketplace.visualstudio.com/items?itemName=hoovercj...) which lets one flip the 'github.copilot.inlineSuggest.enable' setting on and off with a keybind.

Not sure who's in charge of the Copilot extension for VS Code, but if you're out there reading this, the people definitely want this :) Otherwise of course, your tool rocks!


I switched it off and never remember to bother using it. It's obvious why it's enabled by default.

This is a very solid argument. How do we fix that?

THIS is the article I want to read!


I would argue that this kind of problem is going to become less of an issue over time, since they're also going to have to solve the issue of suggesting code samples from deprecated API versions - it's likely that they'll eventually figure out a similar way to promote more secure kinds of code in the suggestions, based on Stack Overflow or other ranking systems.

Yes, they will surely improve a lot and also train users to write better prompts and comments. With millions of users accepting suggestions, then fixing them, they get tons of free labeling. If they monitor execution errors, they get another useful signal. If they use an execution environment, they could use reinforcement learning, like AlphaGo, to generate more training data.

"I posit it becomes increasingly likely over large periods of time over many engineers that severe bug or security issue will be introduced via an AI provided suggestion."

I'll go one further with the "Co-pilot is stupid."

It's supposed to be artificial intelligence. Why in the eff is it suggesting code with a bug or security issue? Isn't the whole point that it can use that fancy AI to analyze the code and check for those kinds of things on top of suggesting code?

Half-baked.


Ah yes, humans are perfect and never make any mistakes. That's why only AIs write bugs.

> I posit that over long periods of time, across many engineers, it becomes increasingly likely that a severe bug or security issue will be introduced via an AI-provided suggestion.

AI can also do code review and documentation, helping us reduce the number of bugs. Overall it might actually help.


"...does not gives you any social signals lower experienced engineers can use to discern quality or correctness" is very astute.

I experienced this in practice. I was pairing with an inexperienced engineer who was using Copilot. He was blindly accepting every Copilot suggestion that came up.

When I expressed doubt in the generated code (incorrect logic + unnecessarily complex syntax), he didn't believe me and instead trusted that the AI was right.


As programmers we take pride in being DRY. Copilot is helping us not reinvent the same concept 1000 times. It also makes developers happier, reduces the need to context switch, increases speed and reduces frustration.

> GitHub Copilot is constantly recommending things

It's only a momentary problem; it will be fixed or worked around. And is it a bad thing to get as many suggestions as you can? I think it's OK as long as you can control its verbosity.

> does not give you any social signals

I don't see any reason it could not report the number of stars and votes the code has received. It's a problem of similarity search between the generated code and the training set: find the attribution, and you gain the ability to check votes and even the license. All doable.
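
To make "doable" concrete, here is a toy sketch of such a similarity check (token shingles plus Jaccard similarity; an illustration of the idea, not Copilot's actual mechanism):

    def shingles(code: str, n: int = 8) -> set:
        # Break code into overlapping n-token windows ("shingles").
        tokens = code.split()
        return {tuple(tokens[i:i + n]) for i in range(max(1, len(tokens) - n + 1))}

    def similarity(suggestion: str, training_file: str) -> float:
        # Jaccard similarity of shingle sets; values near 1.0 suggest verbatim copying.
        a, b = shingles(suggestion), shingles(training_file)
        return len(a & b) / len(a | b) if a | b else 0.0

    # A suggestion whose similarity to some training file exceeds a threshold
    # could be surfaced along with that file's repo, stars, and license.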

> an AI that does not have any self-preserving motivations

Why touch on that? People have bodies, and AIs like Copilot have only training sets. We can explore and do new things; AIs have to watch and learn but never make a move of their own.


>> As programmers we take pride in being DRY. Copilot is helping us not reinvent the same concept 1000 times.

That's what libraries are for.

Copilot is just copy / paste of the code it was trained on.

When the code it was trained on is later discovered to have CVEs, will it automatically patch the pasted code?

With a library, you can update to the patched version. Copilot has no such feature.


> Copilot is just copy / paste of the code it was trained on.

Every time I hear someone say this, I hear "I've never really tried Copilot, but I have an opinion because I saw something on Twitter."

Given the function name for a test and 1-2 examples of tests you've written, Copilot will write the complete test for you, including building complex data structures for the expected value. It correctly uses complex internal APIs that aren't even hosted on GitHub, much less publicly.

Given nothing but an `@Test` annotation, it will actually generate complete tests that cover cases you haven't yet covered.

There are all kinds of possible attacks on Copilot. If you had said it can copy/paste its training data I wouldn't have argued, but "it just copy/pastes the code it was trained on" is demonstrably false, and anyone who's really tried it will tell you the same thing.

EDIT: There's also this fun Copilot use I stumbled across, which I dare you to find in the training data:

    /**
    Given this text:
 
    Call me Ishmael. Some years ago - never mind how long precisely - having little or no money in my purse, and nothing particular to interest me on shore, I thought I would sail about a little and see the watery part of the world.

    Fill in a JSON structure with my name, how much money I had, and where I'm going:
    */

    {
        "name": "Ishmael",
        "money": 0,
        "destination": "the watery part of the world"
    }

>> If you had said it can copy/paste its training data I wouldn't have argued, but "it just copy/pastes the code it was trained on" is demonstrably false, and anyone who's really tried it will tell you the same thing.

So if "it could commit copyright infringement, but does not always do so" is good enough for your company's legal review team, then go for it.


Has anyone tried to see how similar their manually written code is to other code out there? I bet small snippets 1-2 lines long are easy to find. It would be funny to realise that we're more "regurgitative" than Copilot by mere happenstance.

Will the court believe that Copilot created an exact copy of Tim Davis's code "by mere happenstance"?

https://twitter.com/DocSparse/status/1581461734665367554


It can even read an invoice, you can ask it "what is the due date?" It's a system that solves due date and Ishmael questions out of the box. And everything in-between.

>> It can even read an invoice, you can ask it "what is the due date?" It's a system that solves due date and Ishmael questions out of the box. And everything in-between.

That's cool.

But emitting copyrighted code without attribution and in violation of the code's license is still copyright infringement.

If I created a robot assistant that cleans your house, does the shopping, and occasionally stole things from the store, it would still be breaking the law.


While I do enjoy everybody acting as armchair lawyers.... until we get an actual legal ruling, the general consensus seems to be that it is sufficiently transformative as to be considered fair use.

> occasionally stole things from the store

It's fascinating to see how stretchy the word "steals" is nowadays. You can make anything be theft - copying open online content and sharing? theft, learning from data and generating - also theft. Stealing from a physical store - you guessed it.


>> It's fascinating to see how stretchy the word "steals" is nowadays. You can make anything be theft

Theft has a definite legal meaning. So does copyright infringement.

The court can decide if it is copyright infringement or fair use:

https://githubcopilotlitigation.com/pdf/1-0-github_complaint...


I always get the impression that CoPilot critics have never actually used it to get any work done and are basing their criticism solely on a tweet they saw about the Quake square root copy pasta function

The article itself lists three other recent examples, two of which are clearly copyright infringement https://twitter.com/DocSparse/status/1581461734665367554

It is not a theoretical concern


Oof. LGPL. That "time saver" will infect your entire codebase and open your company to sizable liability.

Even if they're never sued, companies will do internal OSS scans to limit their risks which would catch this. The result would be (at minimum) a talking to for the dev who committed it, and developer time spent doing a clean room re-write.


>will infect your entire codebase

No, it won't. It will only infect the resulting binary.


And then I will download a trial of the resulting binary, and send a GPL compliance letter to the company. Unless they took care to use dynamic linking in the LGPL case, they are legally obliged to send me the source code under the license, so I can release it all as FOSS under that license.

Yes, they are legally obliged, but they can just tell you to pound sand and you can't do anything because you aren't the copyright holder.

Then I contact the copyright holder (or one of the many, in case of e.g. the Linux kernel). They probably care, or else they would not use the GPL. I also believe there are organizations that can help such as the Software Freedom Conservancy.

Ok, 3 tweets where someone has coaxed copilot into reproducing some copypasta for clout.

That's just not likely in any real use of copilot since it is typically completing single lines, using the variables and patterns which occur in the file that is being edited.

Anybody who had actually used it for work would know that these contrived examples are irrelevant.


Physically coding is not at all where I spend the majority of my time at work or on personal projects. I exclusively use Haskell though, so maybe that has more to do with it.

But why optimize a non-critical path?


Indeed. I had to write a graph traversal iterator in Rust, and Copilot wrote the entire thing for me. I could have written it myself, and it would have looked similar, but it just... did it. It was trivial to test and verify correctness.

That's minutes of work, maybe even 10 minutes, turned into seconds. That is huge.

The risk here is extremely low. Who is going to sue consumers of Copilot? It makes no sense. They'll sue Microsoft and, in a decade, we'll see if they win or lose (IMO Microsoft will win, but that's not important).


Did it "write it for you"? Or did it "illegally copy it for you"? That's a very big difference.

I'm not claiming that you can't get big productivity boosts by ripping off code like a crazy person. I bet you can! But should you?


Yes, software copyright and patents are a mistake.

>> Yes, software copyright and patents are a mistake.

Richard Stallman would agree, but there are many of us who make a living writing software.

Is software valuable enough that people will pay money for it?

If you write original software that solves a problem, shouldn't you be able to license it how you want and profit from it?

You are welcome to license the software you create how you want. Let me license the software I create how I want.

If I dual-license my software as GPL and commercial, and GitHub Copilot reproduces my GPLed code without attribution and without the license, how is that not copyright violation?


Do you find meaningful distinction between an individual reading your code and copying patterns vs an AI model doing the same?

No, provided that both give proper attribution and follow the license the code is released under.

That's a hilarious expectation. How often do you give attribution to inventors of patterns you use in your software?

>> How often do you give attribution to inventors of patterns you use in your software?

If GitHub Copilot was only "copying patterns" then it would be a lot harder to call it copyright violation and misappropriation of existing code.

And yet that is exactly what GitHub Copilot has been accused of doing: recreating copyrighted works without attribution and in violation of the licenses that the code was released under:

https://githubcopilotlitigation.com/pdf/1-0-github_complaint...

https://twitter.com/DocSparse/status/1581461734665367554

>> That's a hilarious expectation.

Only if you think lawsuits are funny. Cease and desist orders and damages show that they are no laughing matter.


Stallman is pretty hard FOR copyright. The strong guarantees of "free software" are 100% based on strong copyright law.

That's not what I get from reading him: https://www.globalnerdy.com/2007/07/06/richard-m-stallman-co...

My favorite part:

> Copyright Now

> Now with digital data and computer networks, it is much easier for us to copy and manipulate information

> Digital technology has changed the effect of copyright law

> Copyright used to be a power that was:

> wielded by authors

> over publishers

> to yield benefits to the public

> Now it’s a power that is:

> wielded by publishers

> to punish the public

> in the name of the authors

> Now the public wants to copy and share — what would a democratic government do?



I don't really care, it's a trivial algorithm that I would have written virtually identically.

Nothing has been decided in a court of law so saying that it's "illegal" is disingenuous.

Even if it's remarkably similar to another function from a completely different code base but some of the symbols or variable names or function name has been changed, I would argue that it still falls under fair use, and is sufficiently transformative.


Microsoft's own FAQ suggests it's on users of Copilot to avoid infringing, and that's without a clue as to where and how the suggestions came to be.

> Nothing has been decided in a court of law so saying that it's "illegal" is disingenuous.

There are plenty of examples by now of big chunks of code lifted verbatim but without attribution. Pretty clear cut stuff.

> Even if it's remarkably similar to another function from a completely different code base but some of the symbols or variable names or function name has been changed, I would argue that it still falls under fair use, and is sufficiently transformative.

That's BS on many levels. Changing variable names doesn't make copyright go away. It's just trying to hide your violation of it.

I am pretty vocally against copyright, but let's not kid ourselves about the morality of this. No attribution is immoral.


I was a naysayer but find copilot makes me more productive. Especially at writing tests. It's very good at recognizing patterns in your own work, and completing an entire test based on the function name.

I tried to do this and I couldn't figure it out. I never got the sense that it knew anything about the code I had written, just that it was dreaming stuff up from its training set.

> Same thing with Copilot: of course it's going to write buggy/insecure code, but instead of going to Stack Overflow for a snippet, it's suggested in my IDE and with my current context.

Copilot actually can have the benefit here of being able to retroactively mark some snippet as insecure, if it gets flagged as such by the moderators. Any user who used it could get an automatic notification.


It isn't worth the price. I was in the beta and thought it was good, but I'm hoping a better, cheaper alternative comes about.

Do you make less than minimum wage? Because even at minimum wage it saves me enough time a month to pay for itself. In my opinion it has a positive ROI after a single day.


Why?

1) Starting off, I support AI/ML-based code generation/completion. I would be very happy for the day when I can figuratively wave my hand and get 80-90% of what I need.

2) It might be fair to allow authors to submit repos, along with some sort of 'proof of ownership', to Copilot in order to exclude them from the training set. There might have to be a documented (agreed-upon?) schedule for 'retraining', so that the exclusion list takes effect in a timely manner.

3) Or just allow authors to add a robots.txt-style file to their repos, which specifies rules for training (see the sketch after this list).

Just a few thoughts...
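
By analogy with robots.txt, such a file might look something like this (entirely hypothetical - no such standard exists today, and the file name and directives are invented):

    # .ai-training.txt (hypothetical, modeled on robots.txt)
    Trainer: *
    Disallow: /            # no training on anything in this repo

    Trainer: copilot
    Allow: /docs/          # except the documentation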


Pushing the responsibility onto copyright owners rather than GitHub / Microsoft / Copilot seems unreasonable. I'm all for AI being used like this, but it also needs to come with some checks and balances to ensure it's not just regurgitating copyrighted code.

OK, then just use existing copyright licensing:

If a permissive, biz-friendly license (Apache 2.0, maybe others) is found in a given repo, then it can be used in the training set.

Otherwise, the repo cannot be used in the training set.
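
As pseudo-logic, that filter would be simple (a sketch; it assumes license detection already yields an SPDX identifier, which in practice is the hard part):

    PERMISSIVE = {"Apache-2.0", "MIT", "BSD-2-Clause", "BSD-3-Clause"}

    def allowed_in_training_set(repo_license):
        # repo_license: SPDX identifier detected in the repo, or None if none found.
        # Repos with no detectable license are excluded outright.
        return repo_license in PERMISSIVE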


And then every snippet ever created with that trained data would have to include an acknowledgement for every repository included in the training set.

The LICENSE file would be longer than the rest of the code.

(FWIW, I agree with you theoretically, but practically it's hard to get your head around what the ramifications of that would mean)


Many permissive licenses (including Apache 2.0) require attribution.

If Joe Bag’O’Donuts copies and pastes LGPL code into his own personal repository that has an MIT license attached, is it safe for Copilot to train on it?

I’m really of the opinion that MS needs to document the training set and include a high bar for inclusion of additional repos.


Re 2: So a DMCA notice?

There is some legal risk, but what percentage of the code you write is potentially affected by audits before you sell it? So, as a single developer, you're trading a real productivity gain - and as a company, lower costs - for a potential "liability" when you're selling your company. Looks like a good bet. A lot of code will be thrown out or never be sold to anyone.

There is a risk, but the legal risk to individual users is yet to be decided.

What I think is more concerning is that Copilot is effectively an extension of automatically copying stuff from Stack Overflow, with even less understanding of what the code does on the part of the prompt writer.

Do not get me wrong, I absolutely see the benefits, but the risk listed in the article seems less material than a further general decline in code quality. "Built by a human" may need to end up being a thing, the same way "organic" became a part of daily vocabulary.


The problem is, all those people supporting Copilot in this thread can actually write said code without Copilot's help. Namely, they know what they're doing and the tool just saves them some typing.

What happens when this extends to the "specialists" that blindly copy code off Stack Overflow? What happens when this becomes part of learning to program? Will it be as useful for producing working, efficient code when used by people who don't know what they're doing?


I have not used it, but I don't understand how copilot could be useful. As a game programmer I don't spend much time actually writing final code. Most of my time is spent working stuff out on paper or writing little tests which I will discard.

In general I want to write as little code as possible as more code = more problems. The code I do write I want to put great care and craft into in order to keep it maintainable. Giving up any of my agency in this critical area seems like a terrible idea to me.

Something that will help me write more code, or write code faster is of no benefit to me.


I think you need to try it if you want to understand how it can be useful. I also tend to write as little code as possible. Since I started using Copilot, I don't write more code nor less code. I write the exact same code I would have written without Copilot, I'm just 25% more productive with it.

Are you a webdev? 'Cause I have been purely a game dev my whole career. I never wrote a single web app until very recently, when I learned some web frameworks to make simple backends for hobby projects in my spare time. I was kinda shocked how much boilerplate there is and how prescriptive the web frameworks are (I have done some Node.js and ASP.NET). Also, for languages that aren't typed or compiled, like JavaScript, the IDE support and autocomplete seem almost non-existent compared to what I am used to. I would imagine something like Copilot would be more useful in that context.

I'm in the same boat exactly. Some people are saying they're more productive with it but all I can ask is 'howwww!?'

What's odd is that I'm noticing almost every single report of it being useful is from someone who is anti code licenses. Or rather, not that they're philosophically opposed, but they disregard licenses altogether because it benefits them and they can't be stopped. I've seen so few reports of usefulness coupled with legal or moral skepticism.


Context: Kolide just launched a "GitHub Copilot Check" which you can get (along with other features) for $7/device/month. The article is marketing -- an attempt to induce demand among CTOs for an already developed product.

That said: I generally agree with the assessment. Github should at the very least be telling users when it is generating code that they trained on. Until it does that, it's kind of dangerous to use. The security stuff is imo more of a red herring.

But the more important point is that you can just wait a year and hire a consultant to build a better product (for you) at pretty low cost. Within a year, any organization with a non-trivial number of developers will have the option of hosting their own model trained on The Stack (all permissively licensed) and fine-tuning it on their internal code or their chosen stack. That's probably the best path forward for most organizations. If you can afford $7/dev/month for Slack-integrated nannybots you can definitely afford to pay a consultant/contractor to setup a custom model and get the best of both worlds -- not giving MSFT your company's IP while also improving your dev's productivity and happiness beyond what a generic product could deliver.


I usually complain about "thought pieces" that push a product at the end.

But now I realize I like that a lot more than being aware that the article I'm reading is going to push me to take an action (start a discussion with my team) whose probable outcome is "enforce no Copilot on company machines".

Sneaky! Good catch. Article should have a disclaimer at the bottom


I'm really getting tired of lawyers, and collectively our "inner-lawyer", poo-pooing this merely for licensing and GPL issues, neither of which has any practical implication for anything a software engineer does.

All this "controversy" around Copilot just reeks of a kind of technological "social justice" that most people didn't sign up for but seem happy to sit, watch, and commiserate on.


I very much assert that the legal, economic, and social context in which a programmer operates has a great impact on what the programmer produces. We established the licenses we have for good reasons. Licenses alter all of the above variables. We are not simply code production machines. We make code for reasons.

You are free to view yourself as a code production machine, where what you produce is independent of the situation before and after you make anything, but many of us would like to take ownership and action on the legal, economic, and social planes with our work.


> poo-pooing this merely for licensing and GPL issues

People want to use it but are extremely worried about getting in hot water for doing so. That's no idle concern.

It’s very reasonable to ban its use within a company given the legal limbo.


>given the legal limbo

Exactly.


How to make it to the front page in any tech forum:

Step 1: "GitHub Copilot Bad... amirite?!"

Snark aside, most of these articles miss the mark to the point where it seems like the author is tech-illiterate and just parroting soundbites from other people's opinions.


Can you point to any specific examples that make you doubt the tech literacy of this author or of similar articles? Perhaps some discrepancies among their points, or unsupported conclusions?

I saw there's an (unofficial) Emacs package that reuses the Vim Copilot integration. Has anyone here tried Emacs+Copilot yet? Does it work well? Out of curiosity I'd like to try it and, who knows...

Also: does Copilot work for Clojure, and is it any good for Clojure?


Please don't use Copilot; decide it's not worth the risk for your company. In the great competition that is the labor market, Copilot is giving me a leg up on everyone who isn't using it. It's the biggest single tool-based improvement to my productivity since JetBrains.

I was sort of thinking the same thing. It's had such a positive impact on my productivity and time. If other people don't want to use it, fine, but you're not going to stop me, and it's only going to get better as more competition arises and we finally get decent on-device options.

> ... but you're not going to stop me

Take care making public statements like this if your work is highly attributable / traceable.


What do you think of folks who secretly outsource their programming work? It certainly gives them a leg up in the market.

I don't think it does. The people who do that are probably not that good to begin with, and the people they outsource to are not that good either. Then you have to communicate requirements to them and manage them. I wouldn't be more productive doing that unless I had a whole team behind me, and then I'd be a manager, not a programmer anymore.

I'm deliberately ignoring the tortured argument you're trying to make - that Copilot is similarly unethical - which is just ridiculous. It deserves to be ignored.


There is a major difference between the help you can get from an IDE or editor with a language server running in the background and GitHub Copilot stealing away other people's code.

I sincerely hope Microsoft loses this lawsuit.


I'd rather have some "middle-ground" solution than lose such a tool.

I don't see anything wrong with "stealing" code that was meant to be public

Banning it brings no value compared to what these tools offer.

Also, how is that different from Google's scraping whole internet?


What if the "meant to be public" was decided because there could be strings attached, even if only to require attribution?

> Also, how is that different from Google's scraping whole internet?

It's different in many ways: Google Search provides sources and links to your website, while Copilot gives only the content. Google also doesn't suggest that you include its search results in your product verbatim.


So, if Copilot showed a link to the original code, it'd be fine?

Sure; companies would expect their employees to check the license. Remember that a lot of them consider GPL (especially AGPL) software "radioactive", so it will still effectively dissuade them.

However, many engineers would likely skip checking their sources. I guess for this to work, Copilot should include the attribution automatically in a comment.
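Concretely, an auto-attributed suggestion might look something like this (a made-up sketch; the repository URL and license header are hypothetical):

```python
def quicksort(items):
    # Suggested by Copilot.
    # Source: https://github.com/example/algos (MIT License; attribution required)
    if len(items) <= 1:
        return items
    pivot, *rest = items
    return (quicksort([x for x in rest if x < pivot])
            + [pivot]
            + quicksort([x for x in rest if x >= pivot]))
```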

The real problem is that doing this is not possible for Copilot, because the tool itself does not know the source.


They won't lose. They have both the financial resources and, honestly, probably a legitimate claim that the model produces a sufficiently transformative result to be considered fair use.

Even if the ruling went against them, it would likely still be acceptable to train the models, since it is the usage that would be legally suspect.

Which means that in five years anyone will be capable of running these models entirely off-line on their personal machines and no one will ever know the difference.


This article makes a big mistake: it assumes copyright infringement is extremely bad and never worth doing. In practice, when have people been sued over misusing open source software? You most likely won't be caught. Even if you are, you can rewrite the code or add attribution then. And even if you do end up paying damages, the productivity increase from your company using Copilot may be worth the cost.

After Google v. Oracle, damages could easily have two or three commas.

I don't understand what you mean. In Google v. Oracle, the copyright infringement for the copied code was settled for $0. Open source projects are not as sue-happy as Oracle is.

Last I heard it was tens of millions. I'll have to look again.

> Open source projects are not as sue happy as Oracle is.

Comes across as punching down. Like it's OK to steal from mom-and-pop stores because they're too poor and overworked to do anything about it.


> Like it's OK to steal from mom-and-pop stores because they're too poor and overworked to do anything about it.

An action that is worth the risk is not necessarily morally or ethically correct.

The article only cared about money and security. Feeling good about your actions wasn't a concern.


I have been writing my PhD thesis in VSCode with Copilot enabled, and it is absurdly good at suggestions in LaTeX, from generating tables to writing whole paragraphs of text in the discussion.

I wonder if that could trigger any plagiarism checkers. Not that I have any idea what I'm talking about as far as standard operating procedures in academia.

I'm going to keep using it. You won't stop me. You won't catch me. And I just need to read the next five tokens to know whether it's right.

Reading about FOSS copyright is so exhausting. I find no meaningful distinction between reading code and learning from it, versus feeding it into a model. I've heard the "it spits out FOSS code verbatim" argument, and I really don't buy it; I've never seen it. AI-assisted software tooling is so powerful that we really should consider the social benefits ahead of our existing legal framework.

> X is so powerful that we really should consider the social benefits ahead of our existing legal framework.

Laws can be slow to catch up; that's a feature. The legislature and the courts exist for a reason. You can argue they're moving so slowly it's better to ignore them, but that introduces a very real liability, at least until you can convince a court or elect new representatives.


Like many folks here, I can write and read a variety of different programming languages. Some I've been using for a long time and know very well and some I seldom use but retain the basics.

I don't use Copilot when writing languages I am very comfortable with, because I'd rather write code that I completely understand, or at least understand to the best of my ability. I find it easier to consider edge cases and side effects when writing original code than when reading someone else's, ripped from a project whose goals I don't even know. I don't buy that Copilot improves productivity, for this reason as well.

I also avoid using Copilot when writing in languages I am unfamiliar with because I feel like it's robbing me of a learning experience. Or robbing me of repetition that improves my memory of how to do various things in the language.

I don't know. Copilot is certainly impressive, but there are too many questions: the ones I've mentioned and the legal ones in the OP. But perhaps that is a good thing? It raises a new angle on copyright that we're going to have to answer one way or another, in programming and other fields.


Just to give you my two cents: for me it improves productivity massively because, while suggestions for actual code are very hit-and-miss, it is able to get me 90% of the way there when writing tests, with just the name of the test and the surrounding context as prompts.
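For instance (a made-up illustration; the function under test and the test names are invented), you type only the `def test_...` line and accept the body it proposes:

```python
# Function under test, defined elsewhere in the file (invented example).
def apply_discount(price: float, percent: float) -> float:
    return price * (1 - min(percent, 100) / 100)

# The developer writes only the test name; Copilot fills in the body
# from the name plus the surrounding context.
def test_discount_is_capped_at_100_percent():
    assert apply_discount(50.0, 150) == 0.0
    assert apply_discount(50.0, 100) == 0.0

def test_zero_discount_returns_original_price():
    assert apply_discount(80.0, 0) == 80.0
```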

This might not be that big a percentage of my actual work, but in terms of motivation it enables me to work using TDD without the friction of writing boilerplate, which in turn makes programming much more fun.

Also, and this is a big one, you can directly ask questions and it replies. I find that fascinating.

This morning I asked it "why didn't you test for [specific thing]?" and it replied "because I don't know how to properly mock [name of a library I was using]".

Yesterday, while bored in a meeting, I asked Copilot whether a coworker's proposal for an OKR was good, and it replied "it's ok, but keep in mind that it's a lagging metric". It's scary.
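For anyone wondering how you "ask" an autocomplete model a question: you type the question as a comment and let the completion answer it. A made-up example of the pattern:

```python
# Q: why didn't you test for negative quantities?
# A (Copilot's suggested completion): because I don't know how to
#    properly mock the inventory client used in this module.
```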


> Also, and this is a big one, you can directly ask questions and it replies.

Now that I didn't know!


I haven't used it yet. I believe people when they say it's the future of development and that every dev will have to use it or be left behind, but I can't fathom how people are comfortable sending every iteration of their code to a big tech corporation. I can't wait for the day when we can run such solutions on our personal computers (or personal cloud servers), but I feel that, in 2022, this type of tool is not yet worth the risk. I hope this is just a temporary obstacle on our way to an AI-assisted programming future.

> but I can't fathom how people are comfortable sending every iteration of their code to a big tech corporation.

I assume you only ever use self-hosted source control, then? And where is it hosted?


For private business code, yes? Of course.

It's very easy to host your own GitLab server if you need a fancy web interface, and even easier to just put Git anywhere if you don't.


Version control is probably the easiest thing to self-host, for both individuals and enterprises, but that's still very different from Copilot. A Git service provider has access only to what I choose to push, but Copilot gets access to what I'm TYPING. It's scary, at least for me.

If Copilot is fine, then software licenses are meaningless, imo.

A couple of days ago I wrote a new class. When I went to write a unit test, it generated several hundred lines of functioning test code for me. It's worth it.

Programmer: Uploads code to GitHub for the public to see and use.

GitHub: Uses code uploaded by programmers to learn and make other code better.

Programmer: NO FAIR! My code can only be used the way I want it to be, and my code is absolutely unique and no one else has coded anything like it.

I think it kind of flows into two trains of thought in the against category. First, some people are worried about copyrighted, private material being included in the training data. I've not read up on Copilot recently, so I'm not sure whether this is a reasonable thing to be worried about.

The other is that people might be using GitHub to share what they've come up with with other developers, but having an AI parse that information creates a disconnect between giver and receiver. It removes a chunk of the feedback loop, so rather than a community of developers, it becomes something more akin to content creators and lurkers. That's not necessarily a bad thing, given the sheer number of new uses it opens up, but it would minimize community feedback.


Many FOSS developers use copyleft licenses to ensure that only FOSS benefits from their code.

Copilot suggesting it for inclusion into proprietary codebases is then effectively whitewashing that GPL code.

Copilot also does not provide attribution, which is a legal requirement of tons of permissive licenses, including MIT and BSD.


You're still on about copyright? What about the fact that it will just add vulns and bugs to your code? Or is the industry so bad at this point that a gimmicky AI tool can do better?

I would say the risk is minimal. You need to bait Copilot really hard for it to produce anything coherent from existing code. That's simply not how you use it.

Regardless, the risk would need to be really big for me to stop using it. It's such an essential tool for me now that I'm shocked by how crippled I feel when the internet stops working and I realize how much I depend on it.


Then perhaps it's better not to depend on such a tool.

No way. I would rather stop programming altogether than go back to life before Copilot. You've already lost this one.

What about the tool is so intoxicating that you'd give up a lucrative career if you lost it?

"You might get sued if you use this software you paid for" is already covered via an indemnification clause in any reasonable enterprise software license agreement. I'm sure Microsoft/GitHub will be no different in indemnifying their customers who purchase Copilot.

> You are responsible for the code you write with GitHub Copilot’s help. We recommend that you carefully test, review, and vet the code before pushing it to production, as you would with any code you write that incorporates material you did not independently originate.

Looks like Microsoft says the burden is on Copilot users to "vet" the code.


That's them saying you can't sue them if the code Copilot suggested doesn't work.

Regarding someone else suing you because you used Copilot, their terms say:

> GitHub will defend Customer against any claim brought by an unaffiliated third party to the extent it alleges Customer’s authorized use of the Service infringes a copyright, patent, or trademark or misappropriates a trade secret of an unaffiliated third party.


For Copilot, the "use" would be asking it to give you completions, not shipping the product that contains those completions.

Also, the ToS for "additional products", which specifically covers Copilot, has this to say:

"The code, functions, and other output returned to you by GitHub Copilot are called “Suggestions.” GitHub does not claim any rights in Suggestions, and you retain ownership of and responsibility for Your Code, including Suggestions you include in Your Code."


I seem to recall a recent Copyright Office decision [0] which held that an AI does not own the copyright in its own output, because the output is not the product of a human's intellectual effort. Only a human can own copyright, according to that (US-only) decision.

This means the output of an AI isn't even considered a "work" in the eyes of copyright law, if I understand correctly. If the output of Copilot is not a "work", then the output of Copilot cannot be a "derivative work" and cannot violate copyright.

Courts have repeatedly found that only stuff humans create can be copyrighted.

If GitHub Copilot can't produce a work, it can't violate copyright; only human operators can do that.

This makes the recently announced Copilot features make much more sense legally: features that show the origin of a suggestion and let users select which code licenses to accept suggestions from. Previously these seemed like moves to appease critics, but they're really tools to help paying Copilot users know what code they're actually using. Nice.

IANAL.

anyway, this lawsuit is gonna fail so effin' hard. lol

[0]: https://www.theverge.com/2022/2/21/22944335/us-copyright-off...


No one is going to take the AI to court, just like no one is going to sue the Ctrl+V keys on a keyboard. They're going to sue users of Copilot and/or the human producers of it.

The humans at GitHub have created software which, by definition, cannot create copyrighted works. And if the AI cannot create copyrighted works, it cannot violate copyright.

Humans can create copyrighted works, and therefore can violate copyright.

Given these things, I fail to see how anyone at GitHub could get into any trouble, legally.

Thus, the suit against GitHub/Microsoft will fail, which was the point that I apparently failed to communicate clearly enough.

If anyone is liability-adjacent here, it is Copilot users, who must always be mindful of what they write anyway.


Just calling something AI doesn't copyright-wash the output.

If I write a regex to strip license comments and sell it as "AI", that doesn't mean I'm innocent of facilitating copyright infringement.

Or if I sell an NN to strip watermarks from a movie broadcast, it's unlikely a court will throw out the case because "welp, it's AI, no copyright".
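To make the first hypothetical concrete: the "AI" in question could be nothing more than a few lines of regex. A toy sketch, handling only C-style block comments:

```python
import re

# Toy "license stripper": deletes block comments that mention a license.
# Calling this AI would not change what it does to the copyright notice.
LICENSE_COMMENT = re.compile(
    r"/\*.*?(?:license|copyright).*?\*/\s*",
    re.IGNORECASE | re.DOTALL,
)

def strip_license_headers(source: str) -> str:
    return LICENSE_COMMENT.sub("", source)

code = "/* Copyright (c) 2022 Example Corp. MIT License. */\nint add(int a, int b) { return a + b; }\n"
print(strip_license_headers(code))  # prints only the function; header gone
```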


Go look up the findings from courts on this and then come tell me I'm wrong. Computers cannot create copyrighted work alone; copyright is granted to the human operating and/or programming the computer.

In the case of Copilot, the output is determined by the keystrokes of the user. That user must decide whether what they see is to be used, whether some other piece of code should be used instead, or whether they need to provide more input or just write the thing themselves. Copilot aids the user; it does not run automatically or non-interactively, meaning it alone cannot violate copyright.

No computer or computer program has EVER been granted copyright. Ever. The justification for those decisions is that a copyrightable work necessarily requires intellectual effort to create, which, by definition, only a human can supply.


The question is not whether Copilot's output violates copyright (or rather, that is a separate question, and a concern for users of Copilot). The question is whether Copilot itself - the model, that is - constitutes a derivative work.

That will not pass muster to even be argued in court. I'm so tired of arguing with people about this.

Go read some court judgments on PACER for a while; you'll see.

