I didn't find that reading largely correct but still often wrong code was a good experience for me, or that it added any efficiency.
It does do a very good job of intelligently synthesizing boilerplate for you, but whether it's Copilot or AlphaCode, these models still don't understand the fundamentals of coding in a causal sense: what effect a single instruction has on the space of program states.
Still, this is exciting technology, but again, whether such a machine learning model will ever materialize at all is a big "if".
I don't think it's possible for Copilot to improve on this problem. It doesn't actually understand the code; it's just statistical models all the way down. There's no way for Copilot to judge how good code is, only how frequently it has seen similar code. And frequency is not the same thing as quality.
This is not such a big problem in reality, because the output of Copilot can be filtered to exclude snippets too similar to the training data, or to any corpus of code you want to avoid. It's much easier to guarantee clean output that way than to train the model in the first place.
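Conceptually such a filter can be as simple as measuring n-gram overlap between a suggestion and a blocklist corpus. A rough sketch of what I mean (the tokenizer, shingle size, and threshold are all my assumptions, not how Copilot actually does it):

    # Toy "too similar to the corpus" filter -- illustrative only.

    def ngrams(code, n=8):
        """Return the set of n-token shingles in a piece of code."""
        tokens = code.split()
        return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

    def too_similar(suggestion, blocked_corpus, threshold=0.5):
        """Reject a suggestion whose shingles overlap too much with any blocked file."""
        sugg = ngrams(suggestion)
        if not sugg:
            return False
        for blocked in blocked_corpus:
            overlap = len(sugg & ngrams(blocked)) / len(sugg)
            if overlap >= threshold:
                return True
        return False

    # Usage: drop any completion that trips the filter.
    completions = ["def add(a, b):\n    return a + b"]
    clean = [c for c in completions if not too_similar(c, blocked_corpus=[])]

A real system would presumably use hashing or an index instead of scanning the whole corpus, but the principle is the same: filtering generated text is a much cheaper problem than generating it.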
I might have missed it today in your articles or comments here--it's been a hectic day--but has there been any study of just how different the code would be, given that the students are using the same text from the questions? Is there randomization intrinsic to Copilot, or is it just that minor variations in the textual input cause the code to be so different?
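By "randomization intrinsic" I mean something like temperature sampling, where the same prompt can produce different completions across runs. A toy sketch of that idea (the vocabulary and probabilities are made up, and I'm only assuming Copilot samples this way):

    import math
    import random

    # Toy next-token distribution for a single fixed prompt (made-up numbers).
    vocab = ["for", "while", "if", "return"]
    logits = [2.0, 1.5, 0.5, 0.1]

    def sample(logits, temperature=0.8):
        """Softmax sampling: a nonzero temperature makes repeated runs differ."""
        scaled = [l / temperature for l in logits]
        m = max(scaled)
        weights = [math.exp(s - m) for s in scaled]
        return random.choices(vocab, weights=weights, k=1)[0]

    # Same "prompt", several runs: the chosen token varies from run to run.
    print([sample(logits) for _ in range(5)])

If that's roughly what happens, two students pasting the identical question could still get different code even with no variation in the prompt.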
My wife taught CS and did catch cheaters pre-Copilot, and my first thought is that she would probably enter the test questions and print out a reference sheet of Copilot-generated results.
The thing is, the training data is "everything on GitHub". That contains quite a lot of student assignments that are poorly and incompletely done.
I don't know why anyone would trust Copilot for anything that isn't so trivial that it could be done with more deterministic tools.
I haven't used Copilot, but your experience sounds exactly like what I would expect. Since AI is based on prediction, it makes sense that broader predictions would be less accurate. I think stringing together the output of a lot of smaller predictions would yield better results. Which, at the end of the day, means that a human + AI will always be more productive than AI on its own, at least for the foreseeable future.
I find his comment on Copilot enlightening. He says it's not always right, but he still finds it useful. Kinda like that old saying in stats: "all models are wrong, but some models are useful". Language models are really helpful when you have enough experience to judge when they fail; I do feel sorry for novices who don't have that experience yet.
Well I'd hope that is what is going on in Copilot. It definitely does seem to be trained on my code to some extent, but it doesn't have anything I'd call a semantic understanding of it.