> The reason why it's easy to match Copilot results back to the original source is that the users are starting with prompts that match their public code, deliberately to cause prompt regurgitation.
Sounds like MS has devised a massive automated code laundering racket.
>> Easy to check: just put in the first few words, and if Copilot's autocomplete is identical to the remaining solution, you just caught a cheater.
That doesn't work. Copilot is not deterministic. It generates different completions for the same prompt, at random. I don't think you can catch them all, either.
I don't think it's so simple as banning identifiers either. Copilot is capable of adjusting its code to identifiers in the prompt and reusing them correctly. It's quite impressive at that, really.
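As a rough illustration of why exact-match checking is unreliable (a toy sketch, nothing like Copilot's actual decoder; the candidate completions and probabilities are made up): sampled generation with a temperature means the same prompt can produce different completions on different runs.

  import random

  # Toy next-token sampler: the same candidate distribution yields different
  # picks on different runs, so an "identical completion" test has no teeth.
  def sample_next(candidates, temperature=0.8):
      # candidates: dict of possible completions -> made-up model probability
      weights = {c: p ** (1.0 / temperature) for c, p in candidates.items()}
      total = sum(weights.values())
      r = random.uniform(0.0, total)
      running = 0.0
      for completion, w in weights.items():
          running += w
          if r <= running:
              return completion
      return completion

  candidates = {"return a + b": 0.55, "return b + a": 0.30, "return sum([a, b])": 0.15}
  print([sample_next(candidates) for _ in range(5)])  # output varies run to run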
Not at all. You need to bait it really hard and push it into a corner for it to reproduce anything. At that point you might just as well go to the repo and copy-paste the code directly.
I've used Copilot since day one and still haven't seen anything that felt like a 1 to 1 copy of something. It's highly contextual and all about using the code I have already written to craft its suggestions. Even if I ask it explicitly for a known algorithm it will use my code style, patterns and naming conventions to write it.
> 1) copilot is a terrific auto complete, and writes tremendous amounts of repetitive boilerplate
I term this "low-entropy code". Copilot is great at writing heaps and heaps of low-entropy code.
The thing is, if you're not paid by LOC and care about your system as a whole, you normally strive to get rid of code where possible (any code is a liability) and make what remains high-entropy.
Today's terrific autocomplete is tomorrow's legacy shit you have to deal with.
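A made-up example of what I mean by low-entropy code (the names are invented): the hand-rolled, field-by-field version is exactly what autocomplete loves to produce, while the version you actually want to own packs more information into fewer lines.

  # Low-entropy: repetitive, trivially autocompletable, and a maintenance liability.
  def user_to_dict(user):
      return {
          "id": user.id,
          "name": user.name,
          "email": user.email,
          "created_at": user.created_at,
      }

  # Higher-entropy: one line of intent instead of one line per field.
  def to_dict(obj, fields=("id", "name", "email", "created_at")):
      return {f: getattr(obj, f) for f in fields}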
> Even worse, occasionally I will accept a suggestion and only later notice a subtle “mistake”.
Same, I got burned one time by accepting a suggestion I thought I understood.
I think the general problem is that Copilot shifts you from writing code to reading code, and sometimes reading is harder. You can't really take a "yeah that seems right" attitude because it's just throwing guesses at you and seeing what sticks. The safe way to use it is as a jumping off point for writing your own code.
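For example (hypothetical, but typical of the subtle mistakes people report): a completion that reads fine at a glance and is still wrong at the boundary.

  # A chunking helper a completion might plausibly offer. It *looks* right,
  # but the range end is off, so the final chunk is always dropped.
  def chunks_buggy(items, size):
      return [items[i:i + size] for i in range(0, len(items) - size, size)]

  # What it should be.
  def chunks(items, size):
      return [items[i:i + size] for i in range(0, len(items), size)]

  # chunks_buggy([1, 2, 3, 4, 5], 2) -> [[1, 2], [3, 4]]        (lost the 5)
  # chunks([1, 2, 3, 4, 5], 2)       -> [[1, 2], [3, 4], [5]]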
> Github's study predates any negative press, so that would not have been the reason for them to manipulate the study.
I seriously doubt that, during the development of Copilot, no one raised the concerns that are being raised today.
> And the examples that are making the rounds combine two aspects a) prompt-engineering b) famous code samples. That's hardly representative of normal use.
That's true; however, the way they advertise Copilot is to prompt it with comments, which might push it to regurgitate code more often.
> Copilot is just copy / paste of the code it was trained on.
Every time I hear someone say this, I hear "I've never really tried Copilot, but I have an opinion because I saw something on Twitter."
Given the function name for a test and 1-2 examples of tests you've written, Copilot will write the complete test for you, including building complex data structures for the expected value. It correctly uses complex internal APIs that aren't even hosted on GitHub, let alone publicly.
Given nothing but an `@Test` annotation, it will actually generate complete tests that cover cases you haven't yet covered.
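To make that concrete, here's a rough pytest analogue of the workflow (the comment above is describing JUnit's @Test; the project, the parse_order API, and the "completion" below are all invented for illustration): you give it one finished test plus the name of the next one, and it fills in the body, expected structures and all.

  import json
  from dataclasses import dataclass

  @dataclass
  class Item:
      sku: str
      qty: int

  def parse_order(payload):
      # Hypothetical internal API the tests exercise.
      return [Item(**entry) for entry in json.loads(payload)["items"]]

  # The test you already wrote (part of the "prompt"):
  def test_parse_order_single_item():
      assert parse_order('{"items": [{"sku": "A1", "qty": 2}]}') == [Item("A1", 2)]

  # The test name you type next; the body is the kind of thing it completes:
  def test_parse_order_multiple_items():
      payload = '{"items": [{"sku": "A1", "qty": 2}, {"sku": "B7", "qty": 1}]}'
      assert parse_order(payload) == [Item("A1", 2), Item("B7", 1)]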
There are all kinds of possible attacks on Copilot. If you had said it can copy/paste its training data I wouldn't have argued, but "it just copy/pastes the code it was trained on" is demonstrably false, and anyone who's really tried it will tell you the same thing.
EDIT: There's also this fun Copilot use I stumbled across, which I dare you to find in the training data:
  /**
  Given this text:
  Call me Ishmael. Some years ago - never mind how long precisely - having little or no money in my purse, and nothing particular to interest me on shore, I thought I would sail about a little and see the watery part of the world.
  Fill in a JSON structure with my name, how much money I had, and where I'm going:
  */
  {
    "name": "Ishmael",
    "money": 0,
    "destination": "the watery part of the world"
  }
> co-pilot can write 90% of the code without me, just translating my explanation into python.
I fear copilot may encourage this type of pseudo-code comment.
The most valuable thing the AI doesn't know is WHY the code should do what it does.
Months later, we'll get to debug code that "nobody" wrote and find no hints of why it should behave that way, only comments stating what the code also says.
Seems we're trading programming for reverse-engineering generated code.
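A tiny illustration of the difference (the dates and the "policy" are made up): the first comment just restates the code, which is all a prompt-comment leaves behind; the second records the why that no model could have known.

  from datetime import datetime, timedelta

  created_at = datetime(2024, 1, 5)

  # What the code already says -- the kind of comment a prompt leaves behind:
  # add 3 days to the creation date
  due_date = created_at + timedelta(days=3)

  # The WHY only the author knew (hypothetical policy):
  # invoices are due 3 days after creation per the billing contract;
  # weekend handling happens downstream in billing, so no adjustment here.
  due_date = created_at + timedelta(days=3)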
> If anything, use of Copilot would be an improvement
What do you mean? Copilot regularly pastes stuff directly from SO. One of those automatic doc generators was able to point me to the exact SO answer one of its snippets came from.
> What Copilot actually does is read every line of code from a project, and keep a dataset built around what code tends to have been written adjacent to what other code.
No, it reads every line of code from EVERY project.
It is not reproducing code from individual coders, it is reproducing patterns deduced from the work of millions of coders.
It's like Smart Compose in Gmail. Copyright on anything I write is mine by default, but Gmail can of course take everything written by everybody and use it to train a model that detects what I probably mean and offers sentence autocompletion in Gmail.
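As a toy sketch of "patterns, not copies" (nothing like the real model, just the idea in miniature; the corpus lines are invented): build a next-token table from snippets by several authors and sample from it. What comes out is a blend of what everyone wrote, not any one person's line.

  import random
  from collections import defaultdict

  corpus = [
      "for item in items : process ( item )",
      "for row in rows : handle ( row )",
      "for user in users : notify ( user )",
  ]

  # Bigram table: which token tends to follow which, across all authors.
  table = defaultdict(list)
  for snippet in corpus:
      tokens = snippet.split()
      for a, b in zip(tokens, tokens[1:]):
          table[a].append(b)

  def complete(token, length=8):
      out = [token]
      for _ in range(length):
          if token not in table:
              break
          token = random.choice(table[token])
          out.append(token)
      return " ".join(out)

  print(complete("for"))  # e.g. "for row in users : notify ( item )" -- a blend, not a copy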
> Potentially for newer developers it robs them of active experience of writing code.
And for those with experience, this will be obvious when reviewing their code. There are only two possibilities: either Copilot will get so good that it won't matter, or code written by Copilot will have obvious tells, and when someone is over-relying on it to cover up a lack of knowledge, that will be very clear from repetition of the same sorts of mistakes.
> nobody is really replacing somebody with a program that confidently gets half its answers wrong
I gotta tell you, I use copilot to help with my coding, and it still sends a shiver down my spine when it writes the entire database migration based off of the filename, or writes 4 pages of automated tests that work right on the first try.
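For anyone who hasn't seen it, here's a made-up example of that migration experience (assuming an Alembic project; the revision IDs, table, and column are invented): you create a file named something like 0042_add_last_login_to_users.py, and the body it offers is roughly the whole thing.

  from alembic import op
  import sqlalchemy as sa

  revision = "0042"
  down_revision = "0041"

  def upgrade():
      op.add_column("users", sa.Column("last_login", sa.DateTime(), nullable=True))

  def downgrade():
      op.drop_column("users", "last_login")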
> my guess is that Copilot stored a lot of GitHub accounts, and when we type "@", it autocompletes with any random ones from that list.
That's not how that works, and it would be a mis-feature if it did. No one would want a real username pulled from a random list, and there's no reason to think it has a list of usernames stored somewhere. It for sure generates usernames in real time, the way you would if I told you to imagine a plausible one.
> I think the biggest problem copilot will have in practice gaining traction is that verifying correctness isn’t any faster than writing the code yourself in many cases.
Humorously, this is a similar problem to the one autonomous driving has. Staying alert for the random moment when something goes wrong is harder than staying actively engaged all of the time.
The reason doesn't really matter...