I'm thinking Copilot may be good at naming variables even if randomized variable names were used in its training set



Copilot is good at naming variables; I don’t think you should randomise them.

Copilot is weird.

The N in the NLP training set and architecture implies a fuzzy match.

Why anyone would rather start with a plausible but subtly broken buffer than a blank one is beyond me.


I think it's interesting. Not because it exposes names, but because it shows (again) that Copilot is reproducing training data verbatim.

And when the time comes to explain Copilot to laypeople (say, in court), this is an example they can understand.


Copilot is cool and all.

I didn't find that reading largely correct but still often wrong code is a good experience for me, or that it adds any efficiency.

It does do a very good job of intelligently synthesizing boilerplate for you, but be it Copilot or this AlphaCode, they still don't understand the fundamentals of coding, in the causal sense of how one instruction would impact the space of program states.

Still, these are exciting technologies, but again, there is a big if as to whether such a machine learning model will ever happen at all.


I don't think it's possible for copilot to improve on this problem. It doesn't actually understand the code, it's just statistical models all the way down. There's no way for copilot to judge how good code is, only how frequently it's seen similar code. And frequency is not the same thing as quality.

This is not such a big problem in reality because the output of Copilot can be filtered to exclude snippets too similar to the training data, or to any corpus of code you want to avoid. It's much easier to guarantee clean output than it is to train the model in the first place.
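For illustration, here is a rough sketch of that kind of filter in Python: tokenize each completion, compute token-level Jaccard similarity against a reference corpus, and drop anything that is too close a match. The tokenizer, threshold, and function names are my own assumptions for the example; Copilot's actual duplicate detection is not public.

    import re

    def tokens(code):
        # Crude tokenization: identifiers, numbers, single punctuation marks.
        return set(re.findall(r"[A-Za-z_]\w*|\d+|[^\s\w]", code))

    def too_similar(snippet, corpus, threshold=0.8):
        # Flag a completion whose token-level Jaccard similarity to any
        # reference file exceeds the (arbitrary) threshold.
        snip = tokens(snippet)
        for reference in corpus:
            ref = tokens(reference)
            if not snip or not ref:
                continue
            if len(snip & ref) / len(snip | ref) >= threshold:
                return True
        return False

    # Keep only completions that are not near-verbatim copies.
    completions = ["def add(a, b):\n    return a + b"]
    reference_corpus = ["def add(a, b):\n    return a + b"]
    kept = [c for c in completions if not too_similar(c, reference_corpus)]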

I might have missed it today in your articles or comments here--it's been a hectic day--but has there been some study of just how different the code would be, given that the students are using the same text from the questions? Is there randomization intrinsic to Copilot, or is it just that minor variations in textual input cause the code to be so different?

My wife taught CS, and she did catch cheaters pre-Copilot; my first thought is that she probably would enter the test questions and print out a reference sheet of Copilot-generated results.


The thing is, the training data is "everything on GitHub". That contains quite a large number of student assignments that are poorly and incompletely done.

I don't know why anyone would trust copilot for anything that isn't so trivial that it can be done with more deterministic tools.


I think we should question this, and give it a real test.

Let's see how Copilot does on code that wasn't in the training set.

No problem, right?


Copilot is trained on publicly available code.

It seemed to me that all the examples in the article (and in the Copilot examples as well) are repeated training points, so I think that would solve the problem.

IMO CoPilot would have been a great name that much more accurately covers what it does.

Yes, I use Copilot too, for that very reason. But we need to be very careful about words like "semantic" and "understanding", as the method is neither.

We'll get the best use out of these technologies if we don't ascribe to them magical qualities they don't have.

Poke around. You'll find out it's just statistical math with tokens (letters and punctuation). No meaning is ascribed to anything.
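A deliberately tiny caricature of that idea, in Python: a bigram counter that picks the next token purely by how often it followed the previous one. This is nothing like Copilot's actual neural architecture; it just shows prediction by frequency, with no meaning attached to any token.

    from collections import Counter, defaultdict

    corpus = "for i in range ( n ) : total += i".split()

    # Count which token follows which -- pure frequency, no semantics.
    following = defaultdict(Counter)
    for prev, nxt in zip(corpus, corpus[1:]):
        following[prev][nxt] += 1

    def predict_next(token):
        # Return the most frequent follower of `token`, if any.
        counts = following.get(token)
        return counts.most_common(1)[0][0] if counts else None

    print(predict_next("range"))  # "(" -- chosen by frequency, not meaning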


copilot might be good at this

I haven't used copilot but your experience sounds exactly like what I would expect. Since AI is based on prediction, it makes sense that broader predictions would be less accurate. I think stringing together output from a lot of smaller predictions would yield better results. Which, at the end of the day, means that a human + AI will always be more productive than AI on its own. At least for the foreseeable future.

I find his comment on Copilot to be enlightening. He says it’s not always right, but he still finds it useful. Kinda like that old saying in stats: “all models are wrong, but some models are useful”. Language models are really helpful when you have enough experience to be able to judge when they fail; I do feel sorry for novices who don’t have enough experience yet to do that.

How about "copilot"?

Well I'd hope that is what is going on in Copilot. It definitely does seem to be trained on my code to some extent, but it doesn't have anything I'd call a semantic understanding of it.

Except that "copilot" doesn't appear to be registered by itself in this field, only in combinations. But we'll see.