GitHub Copilot isn't worth the risk (www.kolide.com)
179 points by terracatta | 2022-11-17 | 215 comments




People are way too attached to single-function examples. I'm struggling to find any example that actually rises to the requested "originality, creativity, and fixation" for copyright to apply.

Just because something looks similar or is even identical doesn't mean copyright applies.


You might want to take a look at some of the pieces of code examined in Google v. Oracle before you decide that small and obvious code cannot bear copyright.

That horrifying back and forth showed that lawyers can consider very small and obvious fragments of code to be absolutely copyrightable. And the fact that it went on for nearly a decade should tell you that none of this is simple.


Can you give an example? Are they copyrighting `for` loops or something?

https://guides.lib.umich.edu/c.php?g=791114&p=5747565

"Google also copied the nine-line rangeCheck function in its implementing code"

Comparison between the two, discussed back in 2012:

https://news.ycombinator.com/item?id=3940683


Google v. Oracle did not decide whether such code is copyrightable. The decision says that it doesn't matter because it would be fair use if it were copyrightable.

Whether or not a court would ultimately decide that an instance of Copilot code is copyright infringement isn't the main issue, in my opinion. Creating opportunities for other people to sue you will be much more damaging. Even lawsuits that you win will be very expensive, and it's not guaranteed you'll get lawyers' fees paid by the losing party.

Courts have repeatedly upheld that only humans can own copyright. Nothing an AI generates is considered a "work" in copyright law, so it can't be violating works or derivative works. The onus falls on the Copilot operator (one might say "the pilot") for purposes of copyright.

This feels like another drama in the style of SCO v. Linux. Lots of FUD, little to nothing for end-users to actually worry about.

Yup. There is no chance in hell of them coming after USERS of copilot.

If a company's code is audited (internally or externally) and GPL code is found, you can bet your ass the dev who committed that GPLed code will get a stern talking-to, and the company will have to re-write that code.

And that's just for GPL code. Code not under an OSS license could get way worse.


The structural completions are way more useful than the entire function completions, even in IntelliJ, where autocomplete is already extremely high quality.

The part that I find unsettling when using Copilot is the risk that credentials or secrets embedded in the code, or being edited in (.gitignore'd) config files, are being sent off to Microsoft for AI-munging and possible human review for improvements to the model.


You shouldn't have any credentials in your git repos anyway. GitHub will already scan your repos and alert you if it thinks there are any credentials in there.

You've never temporarily put a key into a file while testing? Or accidentally pasted one for a second then deleted it? Can you say the same for your entire team or company?

Since Copilot is constantly making new suggestions, a momentary entry is all it takes.


Copilot doesn't retrain on data generated by you in the moment, so I don't see why this is an issue unless you push the files - with the keys - to GitHub.

The model is evaluated on the server, using the content of your files.

Credentials should never be committed. By the time you're ready to commit code, you should be reading them from the environment or from a config outside of the codebase, or at least one that's .gitignore'd.
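
A minimal version of that pattern, as a sketch (the environment variable name and config filename here are hypothetical, purely for illustration):

    import os
    import json
    from pathlib import Path

    # Prefer the environment; fall back to a .gitignore'd local config file.
    API_KEY = os.environ.get("EXAMPLE_API_KEY")  # hypothetical variable name
    if API_KEY is None:
        # config.local.json is a hypothetical file listed in .gitignore
        API_KEY = json.loads(Path("config.local.json").read_text())["api_key"]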

Once that key is in your git history, it's in the history. You might be able to rewrite the history to remove it, but it's going to be a nightmare.


I'm not sure why you're referring to committed code. The model is being evaluated on the server, with content you haven't yet committed.

> The structural completions are way more useful than the entire function completions, even in IntelliJ, where autocomplete is already extremely high quality.

I needed to run a comparison over a window of a numpy array, and given the sheer size of my data, I needed it to be fast and efficient, which means vectorized operations with minimal Python interaction. Copilot figured out a solution that is orders of magnitude faster than what I could conjure up in 10 minutes, most of which I'd spent searching for similar solutions on SO.
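
For a sense of what a vectorized windowed comparison can look like (a sketch only - the data, window size, and scoring metric here are invented, not the parent's actual problem):

    import numpy as np
    from numpy.lib.stride_tricks import sliding_window_view

    rng = np.random.default_rng(0)
    data = rng.random(100_000)  # large 1-D array
    pattern = rng.random(32)    # window to compare against

    # Every length-32 window as a (N-31, 32) view: no copying, no Python loop.
    windows = sliding_window_view(data, len(pattern))

    # One vectorized comparison per window, e.g. mean absolute difference.
    scores = np.abs(windows - pattern).mean(axis=1)
    best = scores.argmin()      # index of the closest-matching window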


It's interesting to consider how you might prevent training using a license without being too restrictive.

Here is an example of a license that attempts to directly prohibit training. The problem is that you can imagine such software can't be used in any part of a system that might be used for training or inference (in the OS, for example). Somehow you need to additionally specify that the software is used directly... But how, what does that mean? This is left as an exercise for the reader and I hope someone can write something better:

  The No-AI 3-Clause License
This is the BSD 2-Clause License, unmodified except for the addition of a third clause. The intention of the third clause is to prohibit, e.g., use in the training of language models. The intention of the third clause is also to prohibit, e.g., use during language model inference. Such language models are used commercially to aggregate and interpolate intellectual property. This is performed with no acknowledgement of authorship or lineage, no attribution or citation. In effect, the intellectual property used to train such models becomes anonymous common property. The social rewards (e.g., credit, respect) that often motivate open source work are undermined.

  License Text:
https://bugfix-66.com/7a82559a13b39c7fa404320c14f47ce0c304fa...

What about fair use? (both in the copying made for training itself and the resulting output from the service)

We are witnessing a monstrous perversion of "fair use" and the greatest theft of intellectual property in human history.

Do you measure IP's value using the amount of work/effort that was put into creating it, or only the end result?

Currently US copyright law only cares about the end result. Effort has no meaning or bearing in any legal analysis of copyright matters.


Copyright infringement trials are tried in the infringer's jurisdiction.

This is such Luddite behavior.

How much hubris we have as a species to think that our professions will endure until the end of the stars. To think that the software we write will be eternal.

The thing that we do now is no different than spinning cotton.

I'd be shocked if the total duration of human-authored programming lasted more than a hundred years.

I'll also wager that in thirty years, "we'll" write more software in any given year than all of history up until that point.


I'm all on board if the Microsofts of the world are. But they choose to train their AI on OSS code and not their own codebase. So clearly they think similarly to the parent; they just want you to forget about that part when it suits them.

If we pass laws restricting the training on copyrighted information, the only organizations that will be able to train will be institutional.

Microsoft would benefit from restriction. Not us.


Would you pay for a product trained on, say, the MS Teams, SharePoint, or Skype codebases?

No, and no one else would either.


This is the BSD 2-Clause License:

    1. Redistributions of source code must retain the above copyright
       notice, this list of conditions and the following disclaimer.

    2. Redistributions in binary form must reproduce the above copyright
       notice, this list of conditions and the following disclaimer in
       the documentation and/or other materials provided with the
       distribution.

    THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
    "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
    LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
    A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
    HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
    SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
    LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
    DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
    THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
    (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
    OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
Presumably, as long as GitHub Copilot:

a) fails to respect these itself, or

b) fails to present them to the user who is going to use its output verbatim or produce derivative code from it, so that the user can respect these,

Then GitHub Copilot is either in violation of the license or a tool assisting in such a violation by stripping the license away†.

From TFA:

> David Heinemeier Hansson, creator of Ruby on Rails, argues that the backlash against Copilot runs contrary to the whole spirit of open source. Copilot is “exactly the kind of collaborative, innovative breakthrough that I’m thrilled to see any open source code that I put into the world used to enable,” he writes. “Isn’t this partly why we share our code to begin with? To enable others to remix, reuse, and regenerate with?”

I don't mean to disrespect DHH, but the "spirit of open source" isn't to wildly share code around as if it were public domain, because it is not. An author gets to choose within which framework their code gets to be used and modified††; otherwise one would have used public domain as a non-license (plus the WTFPL for those jurisdictions where one can't relinquish one's own creation into the public domain).

† depending on whether the "AI"/Microsoft can be held liable for the automated derivative, or the end user is.

†† cue GPL vs MIT/BSD


The spirit of this is good, but the implementation is garbage - you need a lawyer or a team of lawyers to do this right. You grandstand and soapbox in this weakly written paragraph, and it hurts the whole thing. You discuss social rewards, intentions, etc. It just reads like a Stallman-esque tirade.


Wow. That's aggressive.

You previously said:

"I work at the most important company in the "AI" industry, a company you hear about every day.

I write GPU kernels for transformers and convolutions. You probably use my BLAS kernels in your networks."

It's pretty easy for people to figure out it's NVIDIA. You probably work on the cuBLAS library.

All the personal attacks I've seen come from you, with your snarky comments.

All the people that responded to you did so in good faith, trying to engage in honest conversation.


Hmm, I am going to overlook the threats and strange response - I think what you have here, in this license you are trying to push, is a good thing. I was giving it feedback. Hire a lawyer, strip away the opinions, and you're cooking. I wish you luck with it.

Seems this article completely misses the benefits of Copilot. It's a massive step forward in productivity. For me, it's about suggesting proper syntax across the various libraries we use. It really does cut time by tens of percent.

I don't buy the argument that the risk of a yet-to-be-litigated case against a different company, who will certainly fight this hard, is greater than the productivity gain of using Copilot.

Additionally, the security argument feels ridiculous to me. We lift code examples from gists and Stack Overflow ALL THE TIME! But any good dev doesn't just paste it in and go; instead we review the code snippet to ensure it's secure. Same thing with Copilot: of course it's going to write buggy/insecure code, but instead of going to Stack Overflow for a snippet, it's suggested in my IDE and with my current context.


> I don't buy the argument that the risk... is greater than the productivity gain of using Copilot.

How does your company's general counsel feel?

This article is aimed at CTOs, not engineers.


I suspect prohibiting Copilot will just become another checkbox on compliance security questionnaires. The fact that Kolide can detect it and that Kolide can feed compliance suites like Vanta or SecureFrame means the infrastructure is already there. It's not only your lawyers that want these guarantees, it's often your customers.

We don't have GC (too small), so caveat my take with the fact that I'm writing from a smaller company's perspective.

May be different for a larger, value-preserving company who would face more scrutiny.

That being said, I still find it extremely unlikely that there would be legal ramifications from using a product being pushed by one of the largest software companies in the world. Why go for a user and not Microsoft themselves?



> Why go for a user and not Microsoft themselves?

1) the user likely doesn’t have the legal resources of Microsoft.

2) the user is the one committing the infringement.

If Microsoft stood behind this they could offer to indemnify users against lawsuits relating to CoPilot usage, but they don’t.


> That being said, I still find it extremely unlikely that there would be legal ramifications from using a product being pushed by one of the largest software companies in the world.

Microsoft is explicitly saying it's your responsibility to check that the Copilot output you add to your codebase is not infringing on anyone's license.

Also, it's actually a complex legal question whether Copilot itself is infringing anyone's copyright. But there is no doubt whatsoever that you don't have the right to distribute someone else's copyrighted code (without a license) just because it was produced by Copilot and not manually copied by you. And it is also very clear that Copilot can occasionally generate larger pieces of someone else's code.

Edit: fixed typos


> Microsoft is explicitly saying it's your responsibility to check that the Copilot output you add to your codebase is not infringing on anyone's license.

(Never used copilot)

Wow, this is kinda shocking IMO. It kind of negates the entire value proposition of the tool.

How am I supposed to find out whether a snippet is infringing? Should I paste it into google or something? Shouldn’t Copilot be the one to tell me if a snippet too-closely matches some existing code it learned from?

If MS is indeed saying this, I feel like it’s something they put in the agreement to cover their own asses. There’s no way they’d really expect everyone to do this sort of thing. Moreover I don’t feel that’s a very strong defense MS could use in court if somebody decides to go after MS for making the tool that makes infringement so easy. It sounds like one of those “wink wink” types of clauses that they know full well nobody will follow.


From the official FAQ [0]:

> Other than the filter, what other measures can I take to assess code suggested by GitHub Copilot?

> You should take the same precautions as you would with any code you write that uses material you did not independently originate. These include rigorous testing, IP scanning [emphasis mine], and checking for security vulnerabilities. You should make sure your IDE or editor does not automatically compile or run generated code before you review it.

I think lots of companies do run tools such as BlackDuck and others to scan their entire code base and ensure (or at least have some ass-covering) that there is no accidental copyright infringement.

[0] https://github.com/features/copilot#other-than-the-filter-wh...


How much of what you save by using Copilot will then be spent on BlackDuck licenses?

Capex vs opex, huge difference

While the cost to programmers' sanity of running things like BD is immeasurable in my estimation, if you are already doing it, doing it for Copilot code shouldn't add any extra cost, unless Copilot is actually constantly spewing copyrighted code.

> While the cost to programmers' sanity of running things like BD is immeasurable in my estimation

Can you clarify? In my experience, source scan is just another job in one's build pipeline. And I've only seen it fail when it does, in fact, detect a new component (or a license change in the existing component) - because at that point you have to do the legal dance for third-party notices etc. But the latter part something you have to do either way, tools or no tools.


Source scan is indeed not a problem. Scanning all the binary blobs is where things go wrong, in two respects.

For one, there are quite a few false positives, especially if you use commercial 3rd parties as well. For example, I had a UI component recognized as some obscure academic microkernel!? Investigating, we found that happened because that microkernel project was using the same commercial UI component somewhere (probably under some academic license), and their repo was just where BD had first seen this JS code.

For a second, much more common and annoying one (at least in BD at my company), you have to add explanations to each individual identified 3rd-party package that uses something like the GPL, to affirm that it is being used in a way that complies with the license. If you're distributing something like a Linux VM, that means hundreds of packages that are part of the distribution. This work has to be done manually, which means entering the same copy/pasted text in hundreds of places in the atrociously slow BD UI.


Does Microsoft let their developers use it? Say, when working on Windows? If not, I'd say the very vendor of the software considers it radioactive, so I'll keep treating it that way, too.

Is 'what would a Microsoft dev do' really the bar we want to live by, though?

No but it's not a bad litmus test in this situation.

In this case, yes, of course—I don't really get your objection. If their own legal counsel is advising them not to let their developers use their own product over legal concerns (and what else could be the reason?) that would be a pretty good argument against anyone else using it.

Nb. I don't know whether they do or do not, in fact, let their developers use it.


Say what you want about Microsoft - they've got some of the best lawyers in the world on this kind of stuff. If they're not doing it, they either don't trust the tech or don't trust the law.

> It really does cut time by tens of percent.

I used it for about a month. It gave me a few false positives that really burned me - it's not worth the risk. Maybe future versions will be better.


What're the examples of false positives?

Agreed it gets things wrong very frequently. But I've found it much easier to use its suggestion as another "input" to writing code.


I've gotten plenty of false positives, but the mistakes turn up in testing and are pretty easy to spot when reviewing the code. Anything more subtle is likely to have been missed when written by hand anyway.

What happened to burn you so badly?


This. If copilot suggests anything more than basic syntax or boilerplate I don’t use it. If it writes code I don’t understand or wouldn’t be able to write myself I won’t use it. Why? Because at the end of the day it’s my code. In what world is a good engineer submitting a PR for coworkers to look over that isn’t their code?

If this is a real issue the solution is not banning yet another tool. It’s education. Teaching engineers how to properly understand code attribution and licenses.


Do you think we'll be writing software 200 years from now?

50? 25?

I'll bet the people spinning cotton thought that would endure forever.

(Sorry if my tone comes across as fervent. I'm excited to be displaced by this, because what follows is the stuff of dreams.)


Whenever I watch Geordi and Data doing something in engineering, they’re often talking to the computer about constructing models and sims and such.

To me this is the ultimate form of declarative programming. Not that we will all be talking it out, but that we will explain in natural language what we're after.

It maximizes how much time we spend in the “problem understanding/solving” phase and minimizes the tedium of actually setting up the apparatus.


The invention of the cotton gin simply moved people from spinning cotton to picking cotton. And increased demand for slaves.

I'm not excited to be displaced personally, but I'm also not really worried about being displaced. If displacement is inevitable, I don't see how the average programmer is going to leverage this for the "stuff of dreams". Usually, tech advancements result in a greater consolidation of wealth into the hands of those that already own capital. Recent tech is no exception. Yes, there has been a lot of wealth created for regular people, but we're still working 40+ hour weeks, and earnings have not matched the increase in productivity.

What I am concerned about is that our field is becoming increasingly arcane magic for the younger generations, especially the masses that are being completely and utterly failed by the education system.


I apologize ahead of time for rambling, but I'm with you on this!

In my coworkers and many of the applicants we see, there's a trend of over-optimization. The common meme is the 'leet code' interview process.

I suppose the best way I can convey this is... there's a hyper-focus on the mechanics of doing things - on making people not afraid of the code, while leaving them unaware of the world around it.

A lot of thought gets abandoned in favor of process - even thought about the physical systems the code runs on. I recently learned the term 'mechanical sympathy'.

Sometimes it's important to ask if you need the code or system at all!

I know it's not fair to people but I groan any time I see a CS degree


I mean, yes? People will be doing math as long as there are people around to do it. It'll look different, sure. But there will always be problems, and math/programming is problem solving par excellence.

Between 2016 and 2021, I was of the opinion that I could not make any reasonable forecast of even vague large-scale social/technological/economic development past 2030, because the trends in technology go all funky around then.

Thanks to recent developments in AI (textual and visual), I no longer feel confident predicting any of those things past about the beginning of 2028.

It's not a singularity, it's an event horizon: https://kitsunesoftware.wordpress.com/2022/09/20/not-a-singu...


These assisted coding systems are tremendously exciting, but they are only the analogue of moving from a shovel to a powered excavator; they still need a trained individual who knows, to a fairly high technical level, what the final result needs to look like to be effective. So, yes, 25-50 years from now humans will still be the principal element in writing software.

I don't see a world where programming isn't the last thing to go. We pretty much have a general intelligence when a "programmer" is no longer needed. That doesn't mean programming will look anything like it does today in 200 years but will the profession, doing kinda the sameish thing, still exist? Absolutely!

It's interesting to think about. If programming can be automated away, then you can use that automation to automate away any job in the world that can be automated.

Yeah, in the future there will be only AIs developing apps and AIs using apps.

There won't be apps, actually, they'll do everything programmatically.

And all humans would have been killed by then in an AI doom.


Yes! Exactly.

The article suggests that he wants to know "who wrote the code" if a senior dev he trusts submits a PR. He doesn't want to be surprised that "the AI" wrote some of this code.

But it's ALL written by the senior dev. If he trusts that dev, that means that dev has thoroughly read and tested his code! That's the important bit. Remembering proper syntax/imports/nesting levels is the tiniest piece of writing good code. And Copilot can take that off our hands.


That's like saying that code copy/pasted from OSS projects on github was "written by the developer". Which is not true.

The speed of your developer and the correctness and test coverage of your code don't matter when it comes to license compliance.

And license compliance could cost your company 100x (if not more) the value of your best software developer - especially for the non-OSS licenses.


It was written by the developer. If I write down lyrics I remember, I still wrote them down. Whether I have the copyright to make money off them, or whether they're trademarked, are different things.

You could say they were not the first to write it, which would be more correct.


GitHub Copilot has been concretely demonstrated to emit significant chunks of OSS licensed code.

Significant enough that if the license is GPL (which some has been) it will "taint" the entire codebase and license it under GPL. Significant enough to be found by automated OSS audit tools, which would trigger a re-write and education for the developer who committed it.

EDIT:

> If I write down lyrics I remember I still wrote it.

Not from a copyright point of view. The rights to those lyrics belong to the songwriter. It's kinda like photographs. You don't automatically have the right to distribute a photograph of yourself that was taken by someone else.


> Significant enough that if the license is GPL (which some has been) it will "taint" the entire codebase and license it under GPL. Significant enough to be found by automated OSS audit tools, which would trigger a re-write and education for the developer who committed it.

That "significant enough [...] to taint the entire codebase" remains to be decided in court.


Several of the byte-for-byte copies pointed out by open source authors were longer than 20 lines, and contained verbatim comments.

I am not a lawyer, but that's been enough to get people in legal trouble in the US.


> That "significant enough [...] to taint the entire codebase" remains to be decided in court.

I doubt any employer would appreciate being this particular guinea pig because one of their employees wanted to avoid writing some boilerplate.


> That's like saying that code copy/pasted from OSS projects on github was "written by the developer".

I don't think that's what OP is saying. What I think OP is saying (and I agree) is that submitted code is trusted if you trust the source. If you take the person putting code in front of you and ask "Would this person copy someone else's code and submit it as their own" and the answer is "No they would not copy code" then every step that trusted-person took to get to that code is immaterial. Whether they used StackOverflow or Copilot or whatever AI assisted code generating tools do or don't get developed in the future. At the end of the day a good, trustworthy engineer isn't going to use licensed software by "accident"[1].

1. I put "accident" in quotes because it seems so crazy to me that someone would start writing a method "doThing" and then CoPilot spits out a licensed implementation of "doThing" and the engineer would look at it and go "This seems fine."


> every step that trusted-person took to get to that code is immaterial.

Which is, unfortunately, completely useless when it comes to copyright infringement. Trust in the individual will not change the output of an audit for copyrighted code, or the results from said audit.

The only thing that a "trusted" individual can contribute in a copyright infringement investigation is attesting that they did not know that the code they put in the codebase was copyrighted. And all that does is save the company from getting the higher "willful infringement" fines, if it should get that far.

Wilful Infringement Damages: https://www.ce9.uscourts.gov/jury-instructions/node/708



> If it writes code I don’t understand or wouldn’t be able to write myself

For me the bar is higher - it's not that I wouldn't understand it, it's that it's easier to miss mistakes when reviewing than when writing from scratch. In the same way you may have ignored a typo in this comment and understood what I meant regardless of the mistake. But that doesn't work for programming - a mistake is a mistake, and it likely matters in edge cases even if it's not immediately obvious.


In Intellij or Visual Studio, syntax suggestion/tab completion are already great. Those technologies - which involve none of the legal risks of Copilot- are a massive step forward in productivity. Copilot does help extend these benefits to other languages that I occasionally dabble in, like Lua and embedded C, though it's clearly better in languages which are better represented in its dataset.

I don't find the natural-language-comment-to-buggy-algorithm part of Copilot to be particularly useful. I know some people asked to be able to write a "DoWhatIMean()" method, but programmers really only wanted that to auto-expand to "protected virtual void DoWhatIMean() {}" without having to wait 30 seconds for a compile error to check whether it was protected void virtual or protected virtual void...


> In Intellij or Visual Studio, syntax suggestion/tab completion are already great. Those technologies - which involve none of the legal risks of Copilot- are a massive step forward in productivity. Copilot does help extend these benefits to other languages that I occasionally dabble in, like Lua and embedded C, though it's clearly better in languages which are better represented in its dataset.

Copilot is so much beyond regular autocomplete that it's playing a completely different game.

I've been using it today while writing a recursive descent parser for a new toy language. I built out the AST in a separate module, and implemented a few productions and tests.

For all subsequent tests, I'm able to name the test and ask Copilot to write it. It will write out a snippet in my custom language, the code to parse that snippet, and construct the AST that my parser should be producing, then assert that my output actually does match. It does this with about 80% accuracy. The result is that writing the tests to verify my parser takes easily 25% of the time it took when I did this by hand.

In general, this is where I have found Copilot really shines: tests are important but boring and repetitive and so often don't get written. Copilot has a good enough understanding of your code to accurately produce tests based on the method name. So rather than slogging through copy paste for all the edge cases, you can just give it one example and let it extrapolate from there.

It can even fill in gaps in your test coverage: give it a @Test/#[test] as input and it will frequently invent a test case that covers something that no test above it does.


Thing is, for something like an AST parser you want a property test, not a bunch of autogenerated boilerplate.

Generally, if something is boring and repetitive, it probably shouldn't be written; better code generation is rarely a good answer.


Property tests are nice for lots of things, but for an AST parser? You'd basically have to re-implement the parser in order to test the parser, wouldn't you?

I suppose you could test "if I convert the AST back to a string do I get the same result", but that's not actually your goal with an Abstract Syntax Tree. If nothing else the white space should be allowed to change.

What sort of property tests did you have in mind?


You are right that simplified reimplementations make good property tests, but in this case I'd go the other way around: generate an AST, render it a in test case-dependent way (adding whitespace as you said, but also parens etc), inject known faults for a fraction of test cases, and check that the parsed AST is equivalent to the original one or errored out if a fault was injected. Rendering a given AST is usually simpler than parsing it.
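
A minimal sketch of that round-trip property, using Python's own ast module as a stand-in for the toy language (the generator and renderer are toys, and the fault-injection step is omitted for brevity):

    import ast
    import random

    def random_expr(depth=0):
        # Generate a random arithmetic AST (a toy stand-in for a real language's AST).
        if depth > 3 or random.random() < 0.3:
            return ast.Constant(value=random.randint(0, 9), kind=None)
        op = random.choice([ast.Add(), ast.Sub(), ast.Mult()])
        return ast.BinOp(left=random_expr(depth + 1), op=op, right=random_expr(depth + 1))

    def render(node):
        # Render back to source, fully parenthesized so precedence can't bite.
        if isinstance(node, ast.Constant):
            return str(node.value)
        ops = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*"}
        return f"({render(node.left)} {ops[type(node.op)]} {render(node.right)})"

    # Property: parsing the rendered source yields an equivalent tree.
    for _ in range(1000):
        tree = random_expr()
        reparsed = ast.parse(render(tree), mode="eval").body
        assert ast.dump(tree) == ast.dump(reparsed), render(tree)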

> instead we review the code snippet to ensure it's secure

Doesn't matter. A developer's speed and test completeness and code quality matter not one whit when it comes to licensing. That 10x developer could mire the company in fines and code re-writes if they include copyrighted code, especially if it's not OSS.


I don't understand how it improves productivity _that_ much. Most of my time isn't actually spent on syntax but rather on reading Hacker News and making irrelevant comments.

Using it in practice, the sheer quantity of suggestions (often one for every line) is fatiguing, especially when 99% of the time they seem fine.

I posit that over long periods of time, across many engineers, it becomes increasingly likely that a severe bug or security issue will be introduced via an AI-provided suggestion.

This risk to me is inherently different from the risk accepted when engineers use bad code from Stack Overflow. Even Stack Overflow has social signals (upvotes, comments) that allow even an inexperienced engineer to quickly estimate quality. And the amount of code engineers take from Stack Overflow or blogs etc. is much smaller.

GitHub Copilot is constantly recommending things and does not give you any social signals less experienced engineers can use to discern quality or correctness. Even worse, these suggestions are written by an AI that does not have any self-preserving motivations.


Copilot's default behavior is stupid. You can turn off auto-suggest so that it only recommends something when you prompt it to, and that should really be the default behavior. This would encourage more thoughtful use of the tool, and solve the fatigue problem completely.

In IntelliJ, disabling auto-complete just requires clicking on the Copilot icon at the bottom and disabling it. Alt+\ will then trigger a prompt. I know there's a way to do this in VSCode as well, but I don't know how.


> I know there's a way to do this in VSCode as well, but I don't know how.

I dug into this a bit, since I want the same functionality, I found I needed an extension called settings-cycler (https://marketplace.visualstudio.com/items?itemName=hoovercj...) which lets one flip the 'github.copilot.inlineSuggest.enable' setting on and off with a keybind.

Not sure who's in charge of the Copilot extension for VS Code, but if you're out there reading this, the people definitely want this :) Otherwise of course, your tool rocks!


I switched it off and never remember to bother using it. It's obvious why it's enabled by default.

This is a very solid argument. How do we fix that?

THIS is the article I want to read!


I would argue that this kind of problem is going to become less of an issue over time, since they're also going to have to solve the issue of suggesting code samples from deprecated API versions - it's likely that they'll eventually figure out a similar way to promote more secure kinds of code in the suggestions, based on Stack Overflow or other ranking systems.

Yes, they will surely improve a lot and also train users to write better prompts and comments. With millions of users accepting suggestions, then fixing them, they get tons of free labeling. If they monitor execution errors, they get another useful signal. If they use an execution environment, they could use reinforcement learning, like AlphaGo, to generate more training data.

"I posit it becomes increasingly likely over large periods of time over many engineers that severe bug or security issue will be introduced via an AI provided suggestion."

I'll go one further with the "Co-pilot is stupid."

It's supposed to be artificial intelligence. Why in the eff is it suggesting code with a bug or security issue? Isn't the whole point that it can use that fancy AI to analyze the code and check for those kinds of things on top of suggesting code?

Half-baked.


Ah yes, humans are perfect and never make any mistakes. That's why only AIs write bugs.

> I posit that over long periods of time, across many engineers, it becomes increasingly likely that a severe bug or security issue will be introduced via an AI-provided suggestion.

AI can also do code review and documentation, helping us reduce the number of bugs. Overall it might actually help.


"...does not gives you any social signals lower experienced engineers can use to discern quality or correctness" is very astute.

I experienced this in practice. I was pairing with an inexperienced engineer who was using Copilot. He was blindly accepting every Copilot suggestion that came up.

When I expressed doubt in the generated code (incorrect logic + unnecessarily complex syntax), he didn't believe me and instead trusted that the AI was right.


As programmers we take pride in being DRY. Copilot is helping us not reinvent the same concept 1000 times. It also makes developers happier, reduces the need to context switch, increases speed and reduces frustration.

> GitHub Copilot is constantly recommending things

It's only a momentary problem; it will be fixed or worked around. And is it a bad thing to get as many suggestions as you can? I think it's OK as long as you can control its verbosity.

> does not give you any social signals

I don't see any reason it could not report the number of stars and votes the code has received. It's a problem of similarity search between the generated code and the training set: find the attribution, and you gain the ability to check votes and even the license. All doable.
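
To make "doable" concrete, here is a toy sketch of such a similarity check (token shingles plus Jaccard similarity; an illustration of the idea, not Copilot's actual mechanism):

    def shingles(code: str, n: int = 8) -> set:
        # Break code into overlapping n-token windows ("shingles").
        tokens = code.split()
        return {tuple(tokens[i:i + n]) for i in range(max(1, len(tokens) - n + 1))}

    def similarity(suggestion: str, training_file: str) -> float:
        # Jaccard similarity of shingle sets; values near 1.0 suggest verbatim copying.
        a, b = shingles(suggestion), shingles(training_file)
        return len(a & b) / len(a | b) if a | b else 0.0

    # A suggestion whose similarity to some training file exceeds a threshold
    # could be surfaced along with that file's repo, stars, and license.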

> an AI that does not have any self-preserving motivations

Why touch on that? People have bodies, and AIs like Copilot have only training sets. We can explore and do new things; AIs have to watch and learn but never make a move of their own.


>> As programmers we take pride in being DRY. Copilot is helping us not reinvent the same concept 1000 times.

That's what libraries are for.

Copilot is just copy / paste of the code it was trained on.

When the code it was trained on is later discovered to have CVEs, will it automatically patch the pasted code?

With a library, you can update to the patched version. Copilot has no such feature.


> Copilot is just copy / paste of the code it was trained on.

Every time I hear someone say this, I hear "I've never really tried Copilot, but I have an opinion because I saw something on Twitter."

Given the function name for a test and 1-2 examples of tests you've written, Copilot will write the complete test for you, including building complex data structures for the expected value. It correctly uses complex internal APIs that aren't even hosted on GitHub, much less publicly.

Given nothing but an `@Test` annotation, it will actually generate complete tests that cover cases you haven't yet covered.

There are all kinds of possible attacks on Copilot. If you had said it can copy/paste its training data I wouldn't have argued, but "it just copy/pastes the code it was trained on" is demonstrably false, and anyone who's really tried it will tell you the same thing.

EDIT: There's also this fun Copilot use I stumbled across, which I dare you to find in the training data:

    /**
    Given this text:
 
    Call me Ishmael. Some years ago - never mind how long precisely - having little or no money in my purse, and nothing particular to interest me on shore, I thought I would sail about a little and see the watery part of the world.

    Fill in a JSON structure with my name, how much money I had, and where I'm going:
    */

    {
        "name": "Ishmael",
        "money": 0,
        "destination": "the watery part of the world"
    }

>> If you had said it can copy/paste its training data I wouldn't have argued, but "it just copy/pastes the code it was trained on" is demonstrably false, and anyone who's really tried it will tell you the same thing.

So if "it could commit copyright infringement, but does not always do so" is good enough for your company's legal review team, then go for it.


Has anyone tried to see how similar their manually written code is to other code out there? I bet small snippets 1-2 lines long are easy to find. It would be funny to realise that we're more "regurgitative" than Copilot by mere happenstance.

Will the court believe that Copilot created an exact copy of Tim Davis's code "by mere happenstance"?

https://twitter.com/DocSparse/status/1581461734665367554


It can even read an invoice, you can ask it "what is the due date?" It's a system that solves due date and Ishmael questions out of the box. And everything in-between.

>> It can even read an invoice, you can ask it "what is the due date?" It's a system that solves due date and Ishmael questions out of the box. And everything in-between.

That's cool.

But emitting copyrighted code without attribution and in violation of the code's license is still copyright infringement.

If I created a robot assistant that cleans your house, does the shopping, and occasionally stole things from the store, it would still be breaking the law.


While I do enjoy everybody acting as armchair lawyers.... until we get an actual legal ruling, the general consensus seems to be that it is sufficiently transformative as to be considered fair use.

> occasionally stole things from the store

It's fascinating to see how stretchy the word "steals" is nowadays. You can make anything be theft - copying open online content and sharing? theft, learning from data and generating - also theft. Stealing from a physical store - you guessed it.


>> It's fascinating to see how stretchy the word "steals" is nowadays. You can make anything be theft

Theft has a definite legal meaning. So does copyright infringement.

The court can decide if it is copyright infringement or fair use:

https://githubcopilotlitigation.com/pdf/1-0-github_complaint...


I always get the impression that CoPilot critics have never actually used it to get any work done and are basing their criticism solely on a tweet they saw about the Quake square root copy pasta function

The article itself lists three other recent examples, two of which are clearly copyright infringement https://twitter.com/DocSparse/status/1581461734665367554

It is not a theoretical concern


Oof. LGPL. That "time saver" will infect your entire codebase and open your company to sizable liability.

Even if they're never sued, companies will do internal OSS scans to limit their risks which would catch this. The result would be (at minimum) a talking to for the dev who committed it, and developer time spent doing a clean room re-write.


>will infect your entire codebase

No, it won't. It will only infect the resulting binary.


And then I will download a trial of the resulting binary, and send a GPL compliance letter to the company. Unless they took care to use dynamic linking in the LGPL case, they are legally obliged to send me the source code under the license, so I can release it all as FOSS under that license.

Yes, they are legally obliged, but they can just tell you to pound sand and you can't do anything because you aren't the copyright holder.

Then I contact the copyright holder (or one of the many, in case of e.g. the Linux kernel). They probably care, or else they would not use the GPL. I also believe there are organizations that can help such as the Software Freedom Conservancy.

Ok, 3 tweets where someone has coaxed copilot into reproducing some copypasta for clout.

That's just not likely in any real use of copilot since it is typically completing single lines, using the variables and patterns which occur in the file that is being edited.

Anybody who had actually used it for work would know that these contrived examples are irrelevant.


Physically coding is not at all where I spend the majority of my time at work or on personal projects. I exclusively use Haskell though, so maybe that has more to do with it.

But why optimize a non-critical path?


Indeed. I had to write a graph traversal iterator in Rust, and Copilot wrote the entire thing for me. I could have written it myself, and it would have looked similar, but it just... did it. It was trivial to test and verify correctness.

That's minutes of work, maybe even 10 minutes, turned into seconds. That is huge.

The risk here is extremely low. Who is going to sue consumers of Copilot? It makes no sense. They'll sue Microsoft and, in a decade, we'll see if they win or lose (IMO Microsoft will win, but that's not important).


Did it "write it for you"? Or did it "illegally copy it for you"? That's a very big difference.

I'm not claiming that you can't get big productivity boosts by ripping off code like a crazy person. I bet you can! But should you?


Yes, software copyright and patents are a mistake.

>> Yes, software copyright and patents are a mistake.

Richard Stallman would agree, but there are many of us who make a living writing software.

Is software valuable enough that people will pay money for it?

If you write original software that solves a problem, shouldn't you be able to license it how you want and profit from it?

You are welcome to license the software you create how you want. Let me license the software I create how I want.

If I dual-license my software as GPL and commercial, and GitHub Copilot reproduces my GPLed code without attribution and without the license, how is that not copyright violation?


Do you find meaningful distinction between an individual reading your code and copying patterns vs an AI model doing the same?

No, provided that both give proper attribution and follow the license the code is released under.

That's a hilarious expectation. How often do you give attribution to inventors of patterns you use in your software?

>> How often do you give attribution to inventors of patterns you use in your software?

If GitHub Copilot was only "copying patterns" then it would be a lot harder to call it copyright violation and misappropriation of existing code.

And yet that is exactly what GitHub Copilot has been accused of doing: recreating copyrighted works without attribution and in violation of the licenses that the code was released under:

https://githubcopilotlitigation.com/pdf/1-0-github_complaint...

https://twitter.com/DocSparse/status/1581461734665367554

>> That's a hilarious expectation.

Only if you think lawsuits are funny. Cease and desist orders and damages show that they are no laughing matter.


Stallman is pretty hard FOR copyright. The strong guarantees of "free software" are 100% based on strong copyright law.

That's not what I get from reading him: https://www.globalnerdy.com/2007/07/06/richard-m-stallman-co...

My favorite part:

> Copyright Now

> Now with digital data and computer networks, it is much easier for us to copy and manipulate information

> Digital technology has changed the effect of copyright law

> Copyright used to be a power that was:

> wielded by authors

> over publishers

> to yield benefits to the public

> Now it’s a power that is:

> wielded by publishers

> to punish the public

> in the name of the authors

> Now the public wants to copy and share — what would a democratic government do?



I don't really care, it's a trivial algorithm that I would have written virtually identically.

Nothing has been decided in a court of law so saying that it's "illegal" is disingenuous.

Even if it's remarkably similar to another function from a completely different code base but some of the symbols or variable names or function name has been changed, I would argue that it still falls under fair use, and is sufficiently transformative.


Microsoft's own FAQ suggests it's on users of Copilot to avoid infringing, and that's without a clue as to where and how the suggestions came to be.

> Nothing has been decided in a court of law so saying that it's "illegal" is disingenuous.

There are plenty of examples by now of big chunks of code lifted verbatim but without attribution. Pretty clear cut stuff.

> Even if it's remarkably similar to another function from a completely different code base but some of the symbols or variable names or function name has been changed, I would argue that it still falls under fair use, and is sufficiently transformative.

That's BS on many levels. Changing variable names doesn't make copyright go away. It's just trying to hide your violation of it.

I am pretty vocally against copyright, but let's not kid ourselves about the morality of this. No attribution is immoral.


I was a naysayer but find copilot makes me more productive. Especially at writing tests. It's very good at recognizing patterns in your own work, and completing an entire test based on the function name.

I tried to do this and I couldn't figure it out. I never got the sense that it knew anything about the code I had written, just that it was dreaming stuff up from its training set.

> Same thing with Copilot: of course it's going to write buggy/insecure code, but instead of going to Stack Overflow for a snippet, it's suggested in my IDE and with my current context.

Copilot actually can have the benefit here of being able to retroactively mark some snippet as insecure, if it gets flagged as such by the moderators. Any user who used it could get an automatic notification.


It isn't worth the price. I was in the beta and thought it was good, but I'm hoping a better, cheaper alternative comes about.

Do you make less than minimum wage? Because even at minimum wage it saves me enough time a month to pay for itself. In my opinion it has a positive ROI after a single day.


Why?

1) Starting off, I support AI/ML-based code generation/completion. I would be very happy for the day when I can figuratively wave my hand and get 80-90% of what I need.

2) It might be fair to allow authors to submit repos, along with some sort of 'proof of ownership', to Copilot in order to exclude them from the training set. There might have to be a documented (agreed-upon?) schedule for 'retraining', so that the exclusion list takes effect in a timely manner.

3) Or just allow authors to add a robots.txt-style file to their repos, which specifies rules for training (see the sketch after this list).

Just a few thoughts...
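
By analogy with robots.txt, such a file might look something like this (entirely hypothetical - no such standard exists today, and the file name and directives are invented):

    # .ai-training.txt (hypothetical, modeled on robots.txt)
    Trainer: *
    Disallow: /            # no training on anything in this repo

    Trainer: copilot
    Allow: /docs/          # except the documentation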


Pushing the responsibility onto copyright owners rather than GitHub / Microsoft / Copilot seems unreasonable. I'm all for AI being used like this, but it also needs to come with some checks and balances to ensure it's not just regurgitating copyrighted code.

OK, then just use existing copyright licensing:

If a permissive, biz-friendly license (Apache 2.0, maybe others) is found in a given repo, then it can be used in the training set.

Otherwise, the repo cannot be used in the training set.
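
As pseudo-logic, that filter would be simple (a sketch; it assumes license detection already yields an SPDX identifier, which in practice is the hard part):

    PERMISSIVE = {"Apache-2.0", "MIT", "BSD-2-Clause", "BSD-3-Clause"}

    def allowed_in_training_set(repo_license):
        # repo_license: SPDX identifier detected in the repo, or None if none found.
        # Repos with no detectable license are excluded outright.
        return repo_license in PERMISSIVE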


And then every snippet ever created with that trained data would have to include an acknowledgement for every repository included in the training set.

The LICENSE file would be longer than the rest of the code.

(FWIW, I agree with you theoretically, but practically it's hard to get your head around what the ramifications of that would mean)


Many permissive licenses (including Apache 2.0) require attribution.

If Joe Bag’O’Donuts copies and pastes LGPL code into his own personal repository that has an MIT license attached, is it safe for Copilot to train on it?

I’m really of the opinion that MS needs to document the training set and include a high bar for inclusion of additional repos.


Re 2: So a DMCA notice?

There is some legal risk, but what percentage of the code you write is potentially affected by audits before you sell it? So, as a single developer, you're trading a real productivity gain - and as a company, lower costs - for a potential "liability" when you're selling your company. Looks like a good bet. A lot of code will be thrown out or never be sold to anyone.

There is a risk, but the legal risk to individual users is yet to be decided.

What I think is more concerning is that Copilot is effectively an extension of automatically copying stuff from Stack Overflow, with even less understanding of what the code does on the part of the prompt writer.

Do not get me wrong, I absolutely see the benefits, but the risk listed in the article seems less material than a further general decline in code quality. "Built by a human" may need to end up being a thing, the same way "organic" became a part of daily vocabulary.


The problem is, all those people supporting Copilot in this thread can actually write said code without Copilot's help. Namely, they know what they're doing and the tool just saves them some typing.

What happens when this extends to the "specialists" that blindly copy code off Stack Overflow? What happens when this becomes part of learning to program? Will it be as useful for producing working, efficient code when used by people who don't know what they're doing?


I have not used it, but I don't understand how copilot could be useful. As a game programmer I don't spend much time actually writing final code. Most of my time is spent working stuff out on paper or writing little tests which I will discard.

In general I want to write as little code as possible as more code = more problems. The code I do write I want to put great care and craft into in order to keep it maintainable. Giving up any of my agency in this critical area seems like a terrible idea to me.

Something that will help me write more code, or write code faster is of no benefit to me.


I think you need to try it if you want to understand how it can be useful. I also tend to write as little code as possible. Since I started using Copilot, I don't write more code nor less code. I write the exact same code I would have written without Copilot, I'm just 25% more productive with it.

Are you a webdev? 'Cause I have been purely a game dev my whole career. I never wrote a single web app until very recently, when I learned some web frameworks to make simple backends for hobby projects in my spare time. I was kinda shocked how much boilerplate there is and how prescriptive the web frameworks are (I have done some Node.js and ASP.NET). Also, for languages that aren't typed or compiled, like JavaScript, the IDE support and autocomplete seem almost non-existent compared to what I am used to. I would imagine something like Copilot would be more useful in that context.

I'm in the same boat exactly. Some people are saying they're more productive with it but all I can ask is 'howwww!?'

What's odd is that I'm noticing almost every single report of it being useful is from someone who is anti code licenses. Or rather, not that they're philosophically opposed, but they disregard licenses altogether because it benefits them and they can't be stopped. I've seen so few reports of usefulness coupled with legal or moral skepticism.


Context: Kolide just launched a "GitHub Copilot Check" which you can get (along with other features) for $7/device/month. The article is marketing -- an attempt to induce demand among CTOs for an already developed product.

That said: I generally agree with the assessment. Github should at the very least be telling users when it is generating code that they trained on. Until it does that, it's kind of dangerous to use. The security stuff is imo more of a red herring.

But the more important point is that you can just wait a year and hire a consultant to build a better product (for you) at pretty low cost. Within a year, any organization with a non-trivial number of developers will have the option of hosting their own model trained on The Stack (all permissively licensed) and fine-tuning it on their internal code or their chosen stack. That's probably the best path forward for most organizations. If you can afford $7/dev/month for Slack-integrated nannybots you can definitely afford to pay a consultant/contractor to setup a custom model and get the best of both worlds -- not giving MSFT your company's IP while also improving your dev's productivity and happiness beyond what a generic product could deliver.


I usually complain about "thought pieces" that push a product at the end.

But now I realize I like that a lot more than being aware that the article I'm reading is going to push me to take an action (start a discussion with my team) whose probable outcome is "enforce no Copilot on company machines".

Sneaky! Good catch. Article should have a disclaimer at the bottom


I'm really getting tired of lawyers, and collectively our "inner-lawyer", poo-pooing this merely for licensing and GPL issues, neither of which has any practical implication for anything a software engineer does.

All this "controversy" around Copilot just reeks of a kind of technological "social justice" that most people didn't sign up for but seem happy to sit, watch, and commiserate on.


I very much assert that the legal, economic, and social context in which a programmer operates has a great impact on what the programmer produces. We established the licenses we have for good reasons. Licenses alter all of the above variables. We are not simply code production machines. We make code for reasons.

You are free to view yourself as a code production machine, where what you produce is independent of the situation before and after you make anything, but many of us would like to take ownership and action on the legal, economic, and social planes with our work.


> poo-pooing this merely for licensing and GPL issues

People want to use it but are extremely worried about getting in hot water for doing so. That's no idle concern.

It’s very reasonable to ban its use within a company given the legal limbo.


>given the legal limbo

Exactly.


How to make it to the front page in any tech forum:

Step 1: "GitHub Copilot Bad... amirite?!"

Snark aside, most of these articles miss the mark to the point where it seems like the author is tech-illiterate and just parroting soundbites from other people's opinions.


Can you point to any specific examples that make you doubt the tech literacy of this author or of similar articles? Perhaps some discrepancies among their points, or unsupported conclusions?

I saw there's an (unofficial) Emacs package that reuses the Vim Copilot integration. Has anyone here tried Emacs+Copilot yet? Does it work well? Out of curiosity I'd like to try it and, who knows...

Also: does Copilot work for Clojure, and is it any good for Clojure?


Please don't use Copilot; decide it's not worth the risk for your company. In the great competition that is the labor market, Copilot is giving me a leg up on everyone who isn't using it. It's the biggest single tool-based improvement to my productivity since JetBrains.

I was sort of thinking the same thing. It's had such a positive impact on my productivity and time. If other people don't want to use it, fine, but you're not going to stop me, and it's only going to get better as more competition arises and we finally get decent on-device options.

> ... but you're not going to stop me

Take care making public statements like this if your work is highly attributable / traceable.


What do you think of folks who secretly outsource their programming work? It certainly gives them a leg up in the market.

I don't think it does. The people who do that are probably not that good to begin with, and the people they outsource to are not that good either. Then you have to communicate requirements to them and manage them. I wouldn't be more productive doing that unless I had a whole team behind me, and then I'd be a manager, not a programmer anymore.

I'm deliberately ignoring the tortured argument you're trying to make - that Copilot is similarly unethical - which is just ridiculous. It deserves to be ignored.


There is a major difference between the help you can get from an IDE or editor with a language server running in the background and GitHub Copilot stealing away other people's code.

I sincerely hope Microsoft loses this lawsuit.


I'd rather have some "middle-ground" solution than lose such a tool.

I don't see anything wrong with "stealing" code that was meant to be public

Banning it brings no value compared to what these tools offer.

Also, how is that different from Google's scraping whole internet?


What if the "meant to be public" was decided because there could be strings attached, even if only to require attribution?

> Also, how is that different from Google's scraping whole internet?

It's different in many ways: Google Search provides sources and links to your website, while Copilot gives only the content. Google also doesn't suggest that you include its search results in your product verbatim.


So, if Copilot showed a link to the original code, it'd be fine?

Sure; companies would expect their employees to check the license. Remember that a lot of them consider GPL (especially AGPL) software "radioactive", so it will still effectively dissuade them.

However, many engineers would likely skip checking their sources. I guess for this to work, Copilot should include the attribution automatically in a comment.
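Concretely, an auto-attributed suggestion might look something like this (a made-up sketch; the repository URL and license header are hypothetical):

```python
def quicksort(items):
    # Suggested by Copilot.
    # Source: https://github.com/example/algos (MIT License; attribution required)
    if len(items) <= 1:
        return items
    pivot, *rest = items
    return (quicksort([x for x in rest if x < pivot])
            + [pivot]
            + quicksort([x for x in rest if x >= pivot]))
```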

The real problem is that doing this is not possible for Copilot, because the tool itself does not know the source.


They won't lose. They have both the financial resources and, honestly, probably a legitimate claim that the model produces a sufficiently transformative result to be considered fair use.

Even if the ruling went against them, it would likely still be acceptable to train the models, since it is the usage that would be legally suspect.

Which means that in five years anyone will be capable of running these models entirely off-line on their personal machines and no one will ever know the difference.


This article makes a big mistake: it assumes copyright infringement is extremely bad and never worth doing. In practice, when have people been sued over misusing open source software? You most likely won't be caught. Even if you are, you can rewrite the code or add attribution then. And even if you do end up paying damages, the productivity increase from your company using Copilot may be worth the cost.

After Google v. Oracle, damages could easily have two or three commas.

I don't understand what you mean. In Google v. Oracle, the copyright infringement for the copied code was settled for $0. Open source projects are not as sue-happy as Oracle is.

Last I heard it was tens of millions. I'll have to look again.

> Open source projects are not as sue happy as Oracle is.

Comes across as punching down. Like it's OK to steal from mom-and-pop stores because they're too poor and overworked to do anything about it.


> Like it's OK to steal from mom-and-pop stores because they're too poor and overworked to do anything about it.

An action that is worth the risk is not necessarily morally or ethically correct.

The article only cared about money and security. Feeling good about your actions wasn't a concern.


I have been writing my PhD thesis in VSCode with Copilot enabled, and it is absurdly good at suggestions in LaTeX, from generating tables to writing whole paragraphs of text in the discussion.

I wonder if that could trigger any plagiarism checkers. Not that I have any idea what I'm talking about as far as standard operating procedures in academia.

I'm going to keep using it. You won't stop me. You won't catch me. And I just need to read the next five tokens to know whether it's right.

Reading about FOSS copyright is so exhausting. I find no meaningful distinction between reading code and learning from it, versus feeding it into a model. I've heard the "it spits out FOSS code verbatim" argument, and I really don't buy it; I've never seen it. AI-assisted software tooling is so powerful that we really should consider the social benefits ahead of our existing legal framework.

> X is so powerful that we really should consider the social benefits ahead of our existing legal framework.

Laws can be slow to catch up; that's a feature. The legislature and the courts exist for a reason. You can argue they're moving so slowly it's better to ignore them, but that introduces a very real liability, at least until you can convince a court or elect new representatives.


Like many folks here, I can write and read a variety of different programming languages. Some I've been using for a long time and know very well and some I seldom use but retain the basics.

I don't use Copilot when writing languages I am very comfortable with, because I'd rather write code that I completely understand, or at least understand to the best of my ability. I find it easier to consider edge cases and side effects when writing original code than when reading someone else's, ripped from a project whose goals I don't even know. I don't buy that Copilot improves productivity, for this reason as well.

I also avoid using Copilot when writing in languages I am unfamiliar with because I feel like it's robbing me of a learning experience. Or robbing me of repetition that improves my memory of how to do various things in the language.

I don't know. Copilot is certainly impressive, but there are too many questions: the ones I've mentioned and the legal ones in the OP. But perhaps that is a good thing? It raises a new angle on copyright that we're going to have to answer one way or another, in programming and other fields.


Just to give you my two cents: for me it improves productivity massively because, while suggestions for actual code are very hit-and-miss, it is able to get me 90% of the way there when writing tests, with just the name of the test and the surrounding context as prompts.
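For instance (a made-up illustration; the function under test and the test names are invented), you type only the `def test_...` line and accept the body it proposes:

```python
# Function under test, defined elsewhere in the file (invented example).
def apply_discount(price: float, percent: float) -> float:
    return price * (1 - min(percent, 100) / 100)

# The developer writes only the test name; Copilot fills in the body
# from the name plus the surrounding context.
def test_discount_is_capped_at_100_percent():
    assert apply_discount(50.0, 150) == 0.0
    assert apply_discount(50.0, 100) == 0.0

def test_zero_discount_returns_original_price():
    assert apply_discount(80.0, 0) == 80.0
```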

This might not be that big a percentage of my actual work, but in terms of motivation it enables me to work using TDD without the friction of writing boilerplate, which in turn makes programming much more fun.

Also, and this is a big one, you can directly ask questions and it replies. I find that fascinating.

This morning I asked it "why didn't you test for [specific thing]?" and it replied "because I don't know how to properly mock [name of a library I was using]".

Yesterday, while bored in a meeting, I asked Copilot whether a coworker's proposal for an OKR was good, and it replied "it's ok, but keep in mind that it's a lagging metric". It's scary.
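For anyone wondering how you "ask" an autocomplete model a question: you type the question as a comment and let the completion answer it. A made-up example of the pattern:

```python
# Q: why didn't you test for negative quantities?
# A (Copilot's suggested completion): because I don't know how to
#    properly mock the inventory client used in this module.
```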


> Also, and this is a big one, you can directly ask questions and it replies.

Now that I didn't know!


I haven't used it yet. I believe people when they say it's the future of development and that every dev will have to use it or be left behind, but I can't fathom how people are comfortable sending every iteration of their code to a big tech corporation. I can't wait for the day when we can run such solutions on our personal computers (or personal cloud servers), but I feel that, in 2022, this type of tool is not yet worth the risk. I hope this is just a temporary obstacle on our way to an AI-assisted programming future.

> but I can't fathom how people are comfortable sending every iteration of their code to a big tech corporation.

I assume you only ever use self-hosted source control, then? And where is it hosted?


For private business code, yes? Of course.

It's very easy to host your own GitLab server if you need a fancy web interface, and even easier to just put Git anywhere if you don't.


Version control is probably the easiest thing to self-host, for both individuals and enterprises, but that's still very different from Copilot. A Git service provider has access only to what I choose to push, but Copilot gets access to what I'm TYPING. It's scary, at least for me.

If Copilot is fine, then software licenses are meaningless, imo.

A couple of days ago I wrote a new class. When I went to write a unit test, it generated several hundred lines of functioning test code for me. It's worth it.

Programmer: Uploads code to GitHub for the public to see and use.

GitHub: Uses code uploaded by programmers to learn and make other code better.

Programmer: NO FAIR! My code can only be used the way I want it to be, and my code is absolutely unique and no one else has coded anything like it.

I think it kind of flows into two trains of thought in the against category. First, some people are worried about copyrighted, private material being included in the training data. I've not read up on Copilot recently, so I'm not sure whether this is a reasonable thing to be worried about.

The other is that people might be using GitHub to share what they've come up with with other developers, but having an AI parse that information creates a disconnect between giver and receiver. It removes a chunk of the feedback loop, so rather than a community of developers, it becomes something more akin to content creators and lurkers. That's not necessarily a bad thing, given the sheer number of new uses it opens up, but it would minimize community feedback.


Many FOSS developers use copyleft licenses to ensure that only FOSS benefits from their code.

Copilot suggesting it for inclusion into proprietary codebases is then effectively whitewashing that GPL code.

Copilot also does not provide attribution, which is a legal requirement of tons of permissive licenses, including MIT and BSD.


You're still on about copyright? What about the fact that it will just add vulns and bugs to your code? Or is the industry so bad at this point that a gimmicky AI tool can do better?

I would say the risk is minimal. You need to bait Copilot really hard for it to produce anything coherent from existing code. That's simply not how you use it.

Regardless, the risk would need to be really big for me to stop using it. It's such an essential tool for me now that I'm shocked by how crippled I feel when the internet stops working and I realize how much I depend on it.


Then perhaps it's better not to depend on such a tool.

No way. I would rather stop programming altogether than go back to life before Copilot. You've already lost this one.

What about the tool is so intoxicating that you'd give up a lucrative career if you lost it?

"You might get sued if you use this software you paid for" is already covered via an indemnification clause in any reasonable enterprise software license agreement. I'm sure Microsoft/GitHub will be no different in indemnifying their customers who purchase Copilot.

> You are responsible for the code you write with GitHub Copilot’s help. We recommend that you carefully test, review, and vet the code before pushing it to production, as you would with any code you write that incorporates material you did not independently originate.

Looks like Microsoft says the burden is on Copilot users to "vet" the code.


That's them saying you can't sue them if the code Copilot suggested doesn't work.

Regarding someone else suing you because you used Copilot, their terms say:

> GitHub will defend Customer against any claim brought by an unaffiliated third party to the extent it alleges Customer’s authorized use of the Service infringes a copyright, patent, or trademark or misappropriates a trade secret of an unaffiliated third party.


For Copilot, the "use" would be asking it to give you completions, not shipping the product that contains those completions.

Also, the ToS for "additional products", which specifically covers Copilot, has this to say:

"The code, functions, and other output returned to you by GitHub Copilot are called “Suggestions.” GitHub does not claim any rights in Suggestions, and you retain ownership of and responsibility for Your Code, including Suggestions you include in Your Code."


I seem to recall a recent Copyright Office decision [0] which held that an AI does not own the copyright in its own output, because the output is not the product of a human's intellectual effort. Only a human can own copyright, according to that (US-only) decision.

This means the output of an AI isn't even considered a "work" in the eyes of copyright law, if I understand correctly. If the output of Copilot is not a "work", then the output of Copilot cannot be a "derivative work" and cannot violate copyright.

Courts have repeatedly found that only stuff humans create can be copyrighted.

If GitHub Copilot can't produce a work, it can't violate copyright; only human operators can do that.

This makes the recently announced Copilot features make much more sense legally: features that show the origin of a suggestion and let users select which code licenses to accept suggestions from. Previously these seemed like moves to appease critics, but they're really tools to help paying Copilot users know what code they're actually using. Nice.

IANAL.

anyway, this lawsuit is gonna fail so effin' hard. lol

[0]: https://www.theverge.com/2022/2/21/22944335/us-copyright-off...


No one is going to take the AI to court, just like no one is going to sue the Ctrl+V keys on a keyboard. They're going to sue users of Copilot and/or the human producers of it.

The humans at GitHub have created software which, by definition, cannot create copyrighted works. And if the AI cannot create copyrighted works, it cannot violate copyright.

Humans can create copyrighted works, and therefore can violate copyright.

Given these things, I fail to see how anyone at GitHub could get into any trouble, legally.

Thus, the suit against GitHub/Microsoft will fail, which was the point that I apparently failed to communicate clearly enough.

If anyone is liability-adjacent here, it is Copilot users, who must always be mindful of what they write anyway.


Just calling something AI doesn't copyright-wash the output.

If I write a regex to strip license comments and sell it as "AI", that doesn't mean I'm innocent of facilitating copyright infringement.

Or if I sell an NN to strip watermarks from a movie broadcast, it's unlikely a court will throw out the case because "welp, it's AI, no copyright".
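To make the first hypothetical concrete: the "AI" in question could be nothing more than a few lines of regex. A toy sketch, handling only C-style block comments:

```python
import re

# Toy "license stripper": deletes block comments that mention a license.
# Calling this AI would not change what it does to the copyright notice.
LICENSE_COMMENT = re.compile(
    r"/\*.*?(?:license|copyright).*?\*/\s*",
    re.IGNORECASE | re.DOTALL,
)

def strip_license_headers(source: str) -> str:
    return LICENSE_COMMENT.sub("", source)

code = "/* Copyright (c) 2022 Example Corp. MIT License. */\nint add(int a, int b) { return a + b; }\n"
print(strip_license_headers(code))  # prints only the function; header gone
```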


Go look up the findings from courts on this and then come tell me I'm wrong. Computers cannot create copyrighted work alone; copyright is granted to the human operating and/or programming the computer.

In the case of Copilot, the output is determined by the keystrokes of the user. That user must decide whether what they see is to be used, whether some other piece of code should be used instead, or whether they need to provide more input or just write the thing themselves. Copilot aids the user; it does not run automatically or non-interactively, meaning it alone cannot violate copyright.

No computer or computer program has EVER been granted copyright. Ever. The justification for those decisions is that a copyrightable work necessarily requires intellectual effort to create, which, by definition, only a human can supply.


The question is not whether Copilot's output violates copyright (or rather, that is a separate question, and a concern for users of Copilot). The question is whether Copilot itself - the model, that is - constitutes a derivative work.

That will not pass muster to even be argued in court. I'm so tired of arguing with people about this.

Go read some court judgments on PACER for a while; you'll see.

