Well, how can we teach copilot to avoid raw string interpolation in such cases? Sure, some people will correct it, and the AI might improve, but I suspect that the majority of the developers will never notice it, and that'll just reinforce the AI's bad habits.
As someone who's done a ton of JS/TS development, for browsers and Node, I thought the principle the entire ecosystem was based on was up-to-the-minute crowdsourcing of not only a standard lib, but also 90% of your basic tools and about half of what ought to be language features. Not relying on automated systems to cancel out the mistakes of automated systems.
As someone who spent two weeks trying to get a Typescript project working under Webpack when migrating to Vue 3, by stitching together a web of gratuitous tooling and transpilers that ultimately did not work (I went with Vite and it was all working in 2 hours)...
Also, I just checked out an old Flask/Python project from 7 years ago, updated it to use Poetry dependency management, and it all still works. A JS project that is 7 months old and unmaintained would be a dumpster fire.
When I started with PHP over a decade ago I was using PDO and not MySQLi.
I think there's a lot of old code that perhaps should not be used by Copilot as a reference, given how some programming languages have changed quite a bit over time when it comes to the best way of doing certain things.
Yeah, that was always one of the big problems with PHP. Google searches would turn up these old tutorials full of SQL injection code. I think there was a community effort to clean these up, but (un)fortunately now we have AI to bring them back.
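To make the contrast concrete, here's a minimal sketch of the difference those old tutorials glossed over (written in TypeScript to match the rest of the thread's examples; the actual execute call would come from whatever driver you use, e.g. PDO's prepare/execute in PHP or `connection.execute` in mysql2):

```ts
const userId = "1'; DROP TABLE users; --"; // attacker-controlled input

// The pattern the old tutorials taught: the input becomes part of the SQL
// text, so the attacker's quote characters get parsed as SQL.
const injectable = `SELECT * FROM users WHERE id = '${userId}'`;

// Parameterized version: the SQL text stays fixed and the value travels
// separately, so the driver never parses it as SQL.
const sql = "SELECT * FROM users WHERE id = ?";
const params = [userId];

console.log(injectable); // shows the injected statement
console.log(sql, params); // what a parameterized driver call would receive
```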
This points to a general problem. Think of all those top-voted but wrong or bad StackOverflow answers, and how many people copy-paste them verbatim. Now you've got an AI ingesting and reciting those answers to a wide audience, and they will make their way into (even more) new code, which is then fed back into the training corpus.
I'm not sure that's what feeds the feedback loop. Copilot is essentially a centralized distribution system for code, and efforts can be made to train it using "good" code as well. It's the equivalent of allowing thousands of developers to rewrite the top-voted answer on StackOverflow.
The real issue is employers hiring folks whose only skill is gluing together things from SO. So frequently I see people asking others to do the real work for them, because their task wasn't on Stack Overflow (or findable with a basic Google search).
Those who know what to do aren't hanging around on SO because we know what we're doing and we don't have time to do other peoples' job for them.
Which is why you need to build an actual data structure to do this kind of work, with checks against the prototype chain, instead of assuming you can get by with tiny bits of direct JavaScript operators.
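I'm not sure which landing-page snippet this refers to, but the pitfall being described looks roughly like this: property lookups on a plain object also hit the prototype chain, so either a Map or an explicit own-property check is the safer structure.

```ts
// Plain-object lookup walks the prototype chain, so inherited names like
// "constructor" or "toString" look like hits even though we never stored them.
const counts: Record<string, number> = {};
console.log("constructor" in counts); // true -- inherited, not ours

// A Map only contains what was actually put into it.
const safeCounts = new Map<string, number>();
console.log(safeCounts.has("constructor")); // false

// If a plain object is required, guard lookups with an own-property check.
const hasOwn = (obj: object, key: string) =>
  Object.prototype.hasOwnProperty.call(obj, key);
console.log(hasOwn(counts, "constructor")); // false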
The entire selling point of pair programming is that your copilot would point out errors like this, not introduce them.
Pairing works when you either pair two strong programmers or pair a strong programmer with a weak one. In the latter case, the main advantage is that it's also a mentoring opportunity.
Copilot has invented a pair programming situation in which your partner constantly introduces dubious code you must painstakingly verify, but is itself incapable of being mentored.
This is one of the first things that jumped out at me when looking at the Copilot landing page.
There was another one somewhere else on there that calculated the number of days between two dates, but assumed that all days are exactly 24 hours long, and it is no doubt ready to introduce a maddening bug into someone's codebase when used inappropriately.
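This isn't the exact landing-page snippet, but the general failure mode is the classic one: dividing a raw millisecond difference by 24 hours goes wrong as soon as a DST transition makes a local day 23 or 25 hours long. A hedged sketch of the bug and one common fix:

```ts
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Naive version: assumes every local day is exactly 24 hours long.
// Across a spring-forward DST change, a "day" is only 23 hours, so the
// truncated division comes up one day short.
function daysBetweenNaive(a: Date, b: Date): number {
  return Math.floor((b.getTime() - a.getTime()) / MS_PER_DAY);
}

// Calendar-aware version: map both local calendar dates onto UTC midnights,
// where every day really is 24 hours long, then divide.
function daysBetween(a: Date, b: Date): number {
  const utcA = Date.UTC(a.getFullYear(), a.getMonth(), a.getDate());
  const utcB = Date.UTC(b.getFullYear(), b.getMonth(), b.getDate());
  return Math.round((utcB - utcA) / MS_PER_DAY);
}
```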
The sad thing is the author owns the 'fetch' standard, which could have, like all its predecessors (superagent, axios, etc.), encoded URL parameters itself, but instead went for an (apparently deliberate) low-level approach.
Now the author is pointing out that you need to encode parameters manually. Maybe this is a good case for sane defaults like we had in every HTTP client 10 years ago?
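For what it's worth, the "sane default" already ships alongside fetch in the form of URLSearchParams and the URL object; the unsafe and safe versions look roughly like this (example.com is just a placeholder):

```ts
const q = "C# & F#"; // user-supplied search term

// Raw string interpolation: the "#" starts a URL fragment and the "&" splits
// parameters, so the value is silently mangled before it reaches the server.
fetch(`https://example.com/search?q=${q}`);

// URLSearchParams percent-encodes the value for you.
const params = new URLSearchParams({ q });
fetch(`https://example.com/search?${params}`);

// Equivalent, using the URL object directly.
const url = new URL("https://example.com/search");
url.searchParams.set("q", q);
fetch(url.toString());
```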
I am not the author of the fetch standard. It's pretty easy to look this stuff up. I have a few commits to the spec, but I barely show up in the `git blame`. https://github.com/whatwg/fetch/graphs/contributors
I've written articles about fetch in the past, but if I search Google for "fetch API" my stuff doesn't show up in the first few pages at least, so I don't think I really qualify for "principal advocate".
Dude, could you please add a disclaimer that it's insecure no matter what unless the server receiving the fetch sanitizes the data? You write very authoritatively, and might give naive devs the impression that your solutions are secure enough.
It doesn't matter whether the client is a browser or another server. The side that receives the request must always secure the data. You should inform your readers.
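A minimal sketch of what "the side that receives the request must secure the data" means in practice, using Express purely as a familiar example (the route and validation rules here are made up for illustration):

```ts
import express from "express";

const app = express();

// No matter what the client was (browser fetch, another server, curl),
// the parameters arrive here as untrusted strings.
app.get("/items/:id", (req, res) => {
  const id = Number(req.params.id);

  // Validate on the receiving side, every time.
  if (!Number.isInteger(id) || id <= 0) {
    res.status(400).json({ error: "invalid id" });
    return;
  }

  // ...and only ever pass `id` on via parameterized queries, never by
  // splicing it into SQL or shell strings.
  res.json({ id });
});

app.listen(3000);
```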
The amount of people who think that GitHub Copilot is a good idea is just frightening. Just shows how many people never thought about the semantics of their programs for even a second.
Am I against AI supporting humans? Of course not! I think it's the future and holds almost infinite potential. But the way this is done matters, and this is done just so utterly wrong.
How could it be done properly? Well, let's say you have a system in place where you actually prove the correctness of your program. Then of course there is no harm in letting the AI construct the program, because you know it is correct. Or let's say you wrote the program, and now the AI helps you to prove it is correct. Or both.
Of course, when the correctness of your program does not matter in the first place, but just its output, and you happen to be able to judge the quality of the output without knowing anything about your program, then something like Github Copilot makes sense.
The thing is, the technology to do it right is just not there yet. So unleashing this stuff without making sure it can be controlled the right way is basically a Jurassic Park scenario.
The problem is that with Copilot, or by over-relying on it for its purported convenience, bad devs will have a harder time becoming good devs. Discovering what code to write and what its exact effects are is a major part of the journey to becoming a good dev. If humans aren't going through this process, I very much doubt they'll be able to quality-control the code Copilot dumps in until later in the cycle. Doing things the hard way is essential to becoming a good dev.
I've heard this same line for using IDEs for years. Everyone else is welcome to stick with Ed if they like. I'm going to use the tools that are available to me to make my job easier.
If you don't understand the difference between IntelliJ IDEA, which I loved as soon as it came out, and GitHub Copilot, then you are exactly the reason why I am frightened.
IntelliJ and friends make a lot of suggestions, which aren't always good to blindly accept. It's not exactly the same but it's in the same universe of machine-assisted work that still requires human direction.
There's a pretty big difference between asking for permission and proceeding unless that permission is denied.
There's also a large difference in scope between the kind of suggestions a linter gives and what Copilot does. A linter usually works at the level of syntax while preserving the semantics, whereas Copilot tries to infer the semantics.
Code templates are the top reason why I avoid IDEs. You just start writing something and the IDE will insert a block of code there, with no indication of where it ends, and catching you off guard if you are focused on something other than data input.
The scary part is that so many people like them that all the IDEs ship with the feature on by default, with the off switches scattered all over their configuration so you can't easily turn them off.
When you work for a company that forbids copilot and blocks its traffic from the network, you'll find that your dependence on tools to create your code has held you back.
Ok: and since I would hope we agree that no one worth being a software developer is just blindly copying code off of stack overflow into their code, Copilot is certainly a horrible idea if what it is doing is making that easier or "cooler", right? :/
If we start by assuming the developer is a "good dev", I can't imagine Copilot is going to do anything useful for them, as this isn't going to be a useful way for them to come to understand how the system they are using works, and it won't support them building abstractions to reduce boilerplate. This tool is simply not useful for them.
Which leaves us with the idea that Copilot is a tool designed for "bad devs" to do less work while doing the bad things they do, and while I sort of appreciate the idea that "it is worthwhile to support people who don't know what they are doing in their attempt to do the bad thing they are doing anyway", I have severe reservations about doing that with engineering... at the very least it needs to be marketed as that!
Otherwise, the two main things we should be doing for "bad devs" is either helping them become "good devs"--which this doesn't do for much the same reasons it isn't useful for "good devs"--or we should honestly be trying to convince them not to be a developer at this level at all (which might include building higher-abstraction systems that are easier to use and understand).
Copilot just gets the user experience wrong. There is nothing wrong with suggesting an algorithm with a code snippet as an example. That is its core use, and it does the job. It just needs the snippets to be clearly marked as such, with a way to track that you reviewed the code and adjusted it to your use case.
When people make security corrections to those snippets, ideally it would learn the corrections. Perhaps even have a feature to flag a snippet as insecure, to help that process.
Copilot is imperfect, yes. But there is a large grey area between perfect and "utterly wrong".
I'm afraid I don't understand your argument. The programmer is still ultimately responsible for the correctness, and quality, of the program. These are just very advanced IDE hints.
If you went to school in CS, think about the worst programmer in your graduating class. You don't think they are going to mindlessly accept the autocomplete as long as it compiles? I can imagine this will lead to a lot of bad code making it into production.
If you don't have these systems in place, you're getting what you deserve. Hiring subpar talent and not having processes in place isn't Copilot's fault.
This is almost like blaming the car for an accident. If you have a shitty driver and the traffic lights aren't working, it's not your Corolla's fault.
If the programmer doesn't have the requisite knowledge to verify the code, it's hard to know if the generated results are correct. Compare this to copy-pasting solutions from Stack Overflow: at least there you get a pretty good idea of the pros and cons of a solution. With Copilot it's all up to the programmer's understanding of the generated code. If that programmer prompts Copilot in a domain they don't know much about, it could lead to a lot of subtle bugs being introduced.
In theory, I guess, but the type of person who just blindly commits code they didn't understand isn't going to read the explanation and isn't going to catch security issues with SO answers either.
The fact that some (bad) programmers already blindly copy SO code does not detract from the original argument that Copilot is dangerous because it effectively copy-pastes SO code blindly.
The fact that it doesn't come with context. I just fail to see the usefulness of the suggestion, if the quality can't be trusted. Either:
a) I'm familiar enough with the code/topic that I'm able to judge whether the suggestion is good, or
b) I'm not able to judge, and need to consult other sources.
In case a, I could have just written the code myself. In case b, I still need to read up, so the potential gain is that Copilot gave me a hint as to what I should search for. But even that hint can't be trusted - there was another comment where Copilot suggested to use MySQLi in PHP, which has been the wrong choice for a long time.
So if the suggestions need scrutiny, the target group I see is people who need to write a lot of code in a familiar area and are limited by the speed at which they type. This is arguably a very small set of people, and is unlikely to be the main user group of this tool.
When I visit SO, I generally read all the answers and can pretty immediately judge "I think this approach is good" or "I don't think this approach is good" just by reading the code. I can't see why these suggestions would be any different. And in the perhaps more common case where I already know exactly what I want to do it can still save time to have the plug-in autocomplete it (I make heavy use of tab completion today after all).
The mysqli extension in PHP is fine to use and has advantages over PDO if you only ever plan to interface with MySQL. Now, if it was recommending the mysql extension, that would be bad as that's been deprecated and replaced by mysqli for a very long time.
and I still would argue you should never copy from stack overflow!! Instead understand why the answer is correct and then write your code based on that understanding, even if it produces the exact same code in the end.
Also, to get answers on SO you have to take the time to formulate a question. Building up the right question is a significant part of the process of understanding.
They may not be the authors of the question on SO, but to find that question you need to search for it, and it's not uncommon for that to take more than one search and more than one thread to parse. In the end, finding the relevant SO answer is not unlike asking a good question in the first place.
Sadly (?) there are many not-so-good programmers (e.g. because they are new to it).
Finding a bug in something which seems to be correct can be harder than writing the correct code yourself.
Especially if you might not (yet) fully understand what you are doing.
So Copilot is only a good idea if you are an experienced programmer with the discipline to put any auto-generated part through a proper (ad-hoc) code review.
At the same time it looks especially appealing to someone just starting to learn coding...
Spotting bugs in code review is already hard when the code you're reviewing has been written by a human. In this case, at least you can rely on your knowledge of the co-worker who wrote it - over time, you learn the kind of experience they have, the kind of mistakes they make.
Spotting bugs in code that has been generated by a GPT-3 derivative, with all the subtle mistakes that implies, is going to be even harder.
In the end we need to tag each piece of code written by copilot. Doing a code review on a piece of code written by a human and on another one generated by copilot is going to be a very different experience. I would be way more wary of a PR containing copilot generated code. Turns out copilot will be a productivity killer.
> Spotting bugs in code that has been generated by a GPT-3 derivative, with all the subtle mistakes that implies, is going to be even harder.
I'm kind of skeptical! I think your claim is reasonable tho so maybe I'm more skeptical of your confidence?
I'd love to read a follow-up from you after you tried using Copilot for an extended period; even (or maybe especially) if it's as bad, or worse, than you expect!
But good code reviews are much harder at scale than building good code in the first place. The reason we write code _and_ do code reviews is because doing both is better than doing either.
But copilot isn't even equivalent to a code review: code review is not only checking for correctness. It's also asking questions and helping the author walk through their solution by having them reconsider it. Copilot doesn't ask questions, nor can it answer them or provide a rationale.
It is going to have the same vigilance problem as partial self-driving cars. When the machine feels like it's handling most everything for you, you let down your guard.
It shouldn't be that way, but it's how people work, so we should expect it to happen. (Psychologically, if an action is very rarely rewarded, people are less likely to do it.) Even if you want to be vigilant, it will be a bit of an uphill battle to make yourself do it.
Also, checking the auto-generated code takes time. You, as an individual contributor programmer, may believe that the time is absolutely worth it, but you will need management to support that. This technology creates additional opportunities for managers to say, "Don't worry about that. Ship it." Which is another thing that really shouldn't happen but in reality does.
Just look at the name of the program. It's "copilot", and in my understanding the intention is just to give suggestions, not to write a provably correct program that matches some given specification. Think of it as a 10x autocomplete, not a 1/10x human replacement.
It's called "Copilot" for a reason: you are still in the driver seat, and the tool is just there to give you better hints and direction.
I would agree with you if it were called "GitHub Self-Coding", where you merely state the destination and let the tool do the coding. But that's really not the goal of the tool. Don't put it in the hands of non-programmers.
No, a copilot in the real world is not the pilot's assistant: it's also a pilot, does the same work as the pilot, takes the controls as much as the pilot, and may or may not be more experienced than the pilot.
In fact, the copilot is just a normal pilot; the only difference is that the other pilot is also the captain on board, responsible for order and security on board. And most of the time, companies choose who is the pilot and who is the copilot randomly on a per-flight basis.
So no, you wouldn't want a copilot that gives subtly wrong information to the pilot (and vice versa).
> And most of the time, companies choose who is the pilot and who is the copilot randomly on a per-flight basis.
What companies are you aware of that do this? The proper terms are "Captain" and "First Officer" and they are actual rankings within the company, not something that is randomly chosen on a flight-by-flight basis. The actual details of who does what during the flight are generally not related to the ranks ("pilot flying" and "pilot monitoring" duties can and do switch during flight) although the Captain is always the ultimate authority and will be the one to take control in tough situations because he's got more experience.
Typical (i.e. almost all) commercial flights will have a Captain sitting in the left seat and a First Officer sitting in the right seat.
A real copilot will get fired and/or go to jail and/or die if they feed you subtly wrong data and lead you to crash a plane. GitHub won't suffer any consequences if its Copilot feeds you bad code that causes a security breach, or wastes a lot of engineering time on debugging subtle heisenbugs.
The problem with Copilot is that it works just enough to be dangerous: it streamlines copy-pasting of unchecked code, but the code it offers has subtle bugs. It's even worse than copy-pasting from StackOverflow, because code on SO got at least a passing review by someone who understands it. Here, you get code generated from ML models. Unlike generating pictures of faces or kittens, when an occasional artifact doesn't matter much, an "artifact" in code that you won't notice will still make the code wrong.
> Don't put it in the hands of non-programmers
Putting it in the hands of programmers isn't really any better. To make it work, you need programmers to be disciplined - more disciplined than they were when copy-pasting from SO.
Also, the problem isn't code that's obviously wrong when you read it. The problem is when code looks OK, but is subtly wrong. Which keeps happening - as we know today, at least two examples featured on the Copilot homepage have this problem. To make this work, you have to read each snippet super carefully - which defeats the whole point.
I really don't think it does defeat the point. The point isn't just churning stuff out as fast as possible without concerning yourself with quality. There are lots of other reasons you'd benefit from snippets -- rote work done for you, familiarity with language ecosystem, and so on.
It's all well and good to say there should be something better than just discipline but there's no idiot-proof way of writing programs.
Ok, maybe it doesn't defeat the whole point. The way I see it, there are two ways of using Copilot:
1. As a way to specify a problem and get a suggested solution (or a selection of them), to use as a starting point.
2. As a way to specify a problem, get it solved by the AI, and consider it done.
I'd expect that any developer worth their salt will do 1. I expect that of myself. I also worry this is so streamlined that people, including myself, will naturally shift to doing 2 over time.
This is similar to the problem with self driving cars - you can't incrementally approach perfection, you have to get it right in one go, because the space between "not working" and "better than human in every way" is where self-driving is more dangerous than not having it. When it works most of the time, it lulls you into a false sense of security, and then when it fails, you aren't prepared and you die. Similarly, Copilot seems to be working well enough to make you think the code is OK, but it turns out the code is often buggy in a subtle way.
> familiarity with language ecosystem
This is an interesting angle to explore the topic. Familiarity is a big factor when inspecting such generative snippets. For example, I'm really familiar with modern C++, and I'm confident I could spot problems in Copilot output (if and when it starts producing C++), maybe 50% of the time. If it's a logic issue, or initialization issue, I'll spot it for sure. If it's a misuse of some tricky bits of the Standard Library? I might not. I make enough of those mistakes on my own. Or, I know enough JS to be dangerous, but I don't consider myself fluent. I'm definitely going to miss most of the subtle bugs there.
To my mind the difference between copilot and semi-autonomous cars is that split-second decisions are not required in this instance. If it takes you a minute to snap out of it and notice the suggestion was wrong, no problem.
On your other point, it's true that if you're working in an unfamiliar ecosystem, spotting subtle issues will be harder. But I think getting automatic suggestions that are probably idiomatic in that language will be more helpful than hurtful.
> When it works most of the time, it lulls you into a false sense of security, and then when it fails, you aren't prepared and you die.
That still doesn't _necessarily_ imply that 'partially self-driving cars' are worse than actually existing humans. Really, anything that's (statistically) better than humans is better, right?
I don't think it's reasonable to think that even 'perfect' self-driving cars would result in literally zero accidents (or even fatalities).
If you already knew how to do the thing, then you wouldn't need co-pilot. The whole purpose of this tool is to suggest a way to do the thing, and the biggest value is when you don't know how to do it and need help. In that case, if you can't trust the help what use is it?
As others have pointed out, at least Stack Overflow comes with context, the ability to rate and comment on suggestions, or even have a discussion about an approach. With this you take it or leave it. Saying you should already know what you need to do, what the tradeoffs are, and how to evaluate the quality of the suggestion is basically saying this should have no value to you, and any consequences are all your fault if it does.
I think there's a pretty wide continuum of "knowing how to do it", and in a lot of cases you have a pretty good idea of what's going on once the code is presented. I'd further suggest that a lot of the examples they give on the page are just finishing tedious series where the problem is mostly just typing it all out.
I only ever picture myself using it to speed up side projects to be honest. It is a glorified tab complete. A quite glorious one don't get me wrong, but that's all it is.
If you're using it to create something you don't know how to do then yeah you're in for a world of disappointment.
HN seems to have a hivemind view that random Joe Nocodes will be firing up VSCode and asking it for a custom version of Uber, which... yeah, is laughable, and it honestly seems pretty obvious that that won't work.
Interesting metaphor. Notice how the Bash tab completion is a great tool that increases people's productivity by a large amount and receives only compliments from anybody you ask. At the same time, the newish Windows cmd tab completion is a great productivity destroyer that gets mostly criticism, and you will find way too many people blaming it for losing data.
Do you know what the difference is? The Bash tab completion is always correct. If it can't be correct, it gives you as much of a correct answer as it can, and leaves the rest up to you. The cmd tab completion doesn't care at all about being correct; it will always give an answer, no matter what you ask for.
Hum... I can see how people would like zsh. It doesn't just guess at the first tab press, and that extension that displays all the alternatives before looping is nice. It's permissive, but its disastrous interaction is opt-in.
If autocomplete on IDEs were that respectful, I'd like them too.
We could call it "Full Self Coding (BETA)", which is an addon package costing $10k, but with a promise that it'll just get more valuable over time. Eventually, you'll be able to rent out your Full Self Coder and get a full coder salary while you sit on the couch at home doing nothing!
Then GH will realise that their product will never be able to do Full Self Coding in its current form (you need to install a supercomputer (LIDAR) at home to do this safely in any language other than Rust). This will require that you buy the GitHub Copilot SFC Model Z. It'll be released next year, once they've gathered data from your coding habits for a while. Pinky promise.
How is GitHub Copilot worse than a very hard-working but very junior developer who can search Stack Overflow for you and bring you code snippets that are sometimes spot on and sometimes plain wrong? Presumably that's why code review exists...
We have mandatory code review for every commit that goes into our app, and still stuff like that routinely slips through. You need a lot of experience to spot errors like this, and stuff will still slip through.
- The exact StackOverflow snippet said junior is copying was seen by others and likely already debugged to some extent. There's typically also a lot of relevant information in the text surrounding the snippet.
- The snippet you get from Copilot is generated by a DNN model. I'd think that alone should scare people. DNN models are correct on a continuous scale, not a discrete one. A bad pixel in a generated image, or an extra comma in generated text, is considered OK. That kind of correctness framework doesn't work with code.
From what I read, Codex (the model powering Copilot) is a GPT-3 derivative. Surely you've played with GPT-3 before, you've seen the types of mistakes it makes. In my eyes, this approach just won't fly for code.
> - The exact StackOverflow snippet said junior is copying was seen by others and likely already debugged to some extent. There's typically also a lot of relevant information in the text surrounding the snippet.
Empirically, this is false. There is an incredibly high amount of bad, incomplete, or vulnerable code on SO.
A very junior developer that brings me code snippets that are sometimes plain wrong is not useful to me directly - such a developer costs more time in review than they save by providing the snippets in the first place. The value of such a developer is that over time and with some mentorship they will become a slightly less junior developer who can bring me whole features that I can be reasonably confident are mostly correct.
Having said that, I don't know how the much more integrated write/review/edit cycle will work in practice when using copilot. I don't think it will be the same as a junior developer/pair programming partner in any real sense. My initial reaction to copilot is negative, but I'm open to being proven wrong about it.
This point might be a bit lost in the current state of the industry as HN is exposed to it, but usually the idea behind keeping a very junior developer around is that gradually it'll stop being a very junior developer and become a slightly less junior developer who will spit out code snippets that aren't usually just plain wrong.
I often find that doing good code review is much more exhausting than writing good code. Anything that puts more garbage on my plate for review is a net drain on resources.
It's interesting to me that the next few steps of this technology's evolution might be that we lose the ability to prove our code's correctness forever. Every program will just have a certain percentage of accuracy - measured by tests - against some goal, and that's it.
Then when every program is eventually upgraded to this status, all of the code we run is bootstrapped through proprietary AI - including the AI itself - and it's black boxes all the way down. Programming is now an experimental science where we gather evidence, test hypotheses, press the "copilot" button and tune parameters, like evolutionary biologists studying DNA code.
We have figured it out: people with a PhD, given enough time, can do it. Scaling this will be possible, especially with the help of AI, as soon as there is the actual will to do it.
So, for people to quickly implement http requests, we need a mountain of AI hardware backed by legions of PhDs with endless time to vet the http requests.
That doesn't seem sustainable. It also seems like a poor cost to value ratio.
No. If it is something that has been done plenty of times before, then we would also know how to prove the correctness of it automatically. It would also lead to much better and less code, because you would be aware that it has been done before so many times.
So, for people to quickly implement http requests, we need to feed, clothe, and discipline humans for a minimum of two decades, backed by legions of teachers, doctors, and professors with endless time to service them.
That doesn't seem sustainable. It also seems like a poor cost to value ratio
> Proving correctness (in general) is incomputable (halting problem and such).
> Those "proofs" you see in academic papers are very specific cases.
Not to put too fine a point on it, but every program written is a very specific case, so I'm not sure this is such a convincing point.
As you say, there is absolutely, provably, no general-purpose algorithm out there that will prove program correctness automatically. That is in no way to say that humans can't or shouldn't try to prove some programs correct, or even that you can't have an algorithm that will prove certain classes of programs correct.
With that said, I do also think your parent:
> We have figured it out: people with a PhD, given enough time, can do it. Scaling this will be possible, especially with the help of AI, as soon as there is the actual will to do it.
is too optimistic, and throwing a little cold water on that rosy prediction is appropriate.
Thankfully, correctness in the general case doesn't matter in reality, in the same way that it doesn't matter that not all 'correct' programs can possibly be typed.
And I don't know why you put "prove" in scare quotes. There is formally verified software that has been proven to conform to a spec. They aren't just toy programs either; see seL4 or CompCert for 'real' programs.
We haven't even figured out how to specify what "correctness" means for all but the most trivial examples.
What is the "correct" behavior of that example snippet in the linked post?
I don't know anything about formal correctness proofs but my imagination tells me it is bounded by the description of the task. Aren't we just shifting the bugs toward the task description? Or is it about being able to specify a task with a very simple program (e.g. brute force) that is unfeasible to run but can be somehow used to verify that simpler (in terms of runtime complexity) programs provide the same outputs?
Yes. Figuring out what a program SHOULD do is often as hard as figuring out how to actually do it. The relative difficulty shifts depending how far you are from the real world. The closer your project is to actual people doing actual work, the harder it is to define the requirements precisely.
Sure, if your task description/specification is buggy nothing can save you, but if you only have to check the spec your work gets much easier. If you write a program, you have to both make sure that the spec is correct and that the spec is correctly implemented.
Yes, they can only logically deduce correctness based on our assumptions of a given system boundary. But I think it is typically a good idea to write them down formally if you need the guarantees. They are also typically smaller and more declarative.
If people don't die from your mistakes they're not important IMO.
Note that people can die if data is lost and companies go bankrupt (suicide). But really, not everything has to be correct - look at Intel and AMD. They've been compromising correctness for speed for quite a while, and we're mostly fine.
Depends on the numbers. If my company loses 10k because I made a mistake, nobody would "care". Mistakes happen to the best of us, which is obvious looking at public CVEs. If it were 100k it'd be a different story, considering we're not that big (35 employees). It's just money, it can be replaced; human lives can't.
EDIT: I assume from your comment that you formally verify every line of code in every system you depend on, and run RISC-V to make sure the hardware doesn't tamper with your instructions.
Computer graphics, maybe? You can shift the goalposts on what "correctness" means, but if a game renders something in a funny way, it's really not going to hurt anyone. Yes, if it's constantly a total mess nobody will want to play the game, but 1. that would (presumably) be found quickly during testing, and 2. this is a very gradual tradeoff. Nowhere near "paramount".
Yes absolutely. Take games for example, bugs are absolutely tolerable and to be expected. In fact most software of any kind has known bugs that are simply accepted because the cost to fix them exceeds the value of doing so.
Any domain where the code will be used by a consumer who will go with a mostly correct program that has a fraction of the cost over one proven correct.
I suspect this to be the majority of code written.
>is there a domain where correctness is not paramount ?
pretty much every domain, if you exclude finance & industries that intersect with life and death decisions such as pharma/medicine/airlines/defense.
what's the correct order of movie recommendations? it's easy to see that, given your past netflix history, there is an obviously incorrect order. but there is no obviously correct order - any number of orderings will work. correctness is not paramount.
what's the correct puzzle to assign to a 1400 on chess.com? obviously there are hundreds of them that would work. correctness is not paramount.
what's the "correct price" of a used Ford Focus? depends on whether you are a bandit who needs the car for a rapid getaway, or whether you are the brother-in-law of the used car salesman, in which case the correct price is zero usd.
the sole reason why 100 million developers crowd the web programming circuit and not other gatekept domains is because correctness is not paramount. whether your color code is off by a shade on this browser or your pixels are misaligned on that browser, it's all fine so long as it somewhat works. correctness is not paramount. otherwise nothing would ship.
>It's interesting to me that the next few steps of this technology's evolution might be that we lose the ability to prove our code's correctness forever. Every program will just have a certain percentage of accuracy - measured by tests - against some goal, and that's it.
Aren't we already there outside of very specific special applications? At the very least you have to assume every library you are using and the framework you are running on is correct. Sure, that works 99.9% of the time. If your testing framework can get the rest of the code to 99.5% of the time, is the .4% that large of a deal in a case where the other .1% is not?
When we look at what the market wants, what individual people want, do they want things proven correct given the increase in cost it brings? They may say they do, but spending behavior doesn't seem to align.
Unless you never copy/paste from Stack Overflow, this argument is kinda moot. Copilot/Kite/etc. don't intend to replace your brain; they're a shortcut for googling. You are fully expected to understand, adapt, and fix the suggestion.
Now I do understand many won't do so. But they already do the same, just slower, with their current method.
Stackoverflow at least has the benefit of whatever comments might be around it. As far as I can tell, the copilot suggestion doesn't have any way to rate it, make a comment, flag it, etc.
I think they kind of do. On the Telemetry section of the Copilot site, they mention that they relay whether a suggestion was accepted or rejected.
I wonder, how much better could GitHub Copilot become by also looking at the modifications that are subsequently made to the accepted suggestions? Obviously this would go quite a bit further in terms of telemetry, and may become an issue. They would essentially be training on non-public code at that point.
So it will be self-reinforced to get worse and rot over time? Since surely all code that used Copilot will be slightly worse than code without (extrapolated on the assumption that Copilot is wrong some of the time).
Agreed. How many times have you seen the "accepted" answer on Stack Overflow be suboptimal if not outright wrong, with a much better answer lower down the page? For me it feels like 50% of the time or more.
> don't use "but people are already doing this" to justify copilot approach.
Why not? Most software is not built by formally verifying it against a specification, but by looking at the code and having a reasonable understanding that it works. Also, there will be errors in the code whether we use Copilot or not. Personally, if I don't have the option to build and run and look at the output, I am reasonably sure that there is at least one bug in something like every 50 lines.
Sometimes I do it blindly too! I don't always care about the "how". Sometimes I'm just looking for a single line to convert a data type, or a bash command to do something. Not interested in learning all the arguments to various bash tools. I just want to delete some files and move on.
> Let's agree that copy/paste from stack overflow is bad
I think context is important. It's not inherently bad. If you are inexperienced or don't know what the code/command is doing then yes that's not ideal.
But competent developers using Stack Overflow (and all other external memory devices) appropriately to aid their workflow is perfectly valid.
I rarely copy/paste verbatim from Stack Overflow (apart from perhaps bash commands where the answer is literally the exact command I need - and in that case, why would I not use it?). But I do copy/paste, as long as I understand the code, and then adjust it accordingly.
In my experience of coaching junior devs, the number one skill I've had to train them in, above all else, is the ability to efficiently search for and find answers to their questions/unknowns quickly (as well as digest those answers and incorporate the knowledge into their understanding - not just blindly copy/paste).
I'd go as far as to say that if you are a developer in 2021 and not constantly looking up things for reference (whether to jog your memory or to quickly learn new concepts), then you're either a genius with a photographic memory, working in a very constrained domain that isn't stretching you, or just plain doing it wrong. :-)
Very rarely does a command line example map one to one with what I want to accomplish anyway. But also, I can't think of any "long" commands that I would need.
Copy/paste from stackoverflow is a great analogy: Copilot is making something that already has a huge negative* impact on code quality more optimised and easier to do as part of your daily coding flow.
* Just to clarify, I'm not saying SO is bad, but just specifically the practice of blind copy/paste without reasoning about & understanding the code you've copied. SO moderators encourage/semi-enforce descriptive answers for this reason; to add context to the solutions provided.
I think on the whole, stackoverflow has vastly improved overall global code quality.
Even if you just limit it to people largely taking solutions wholesale from SO, I still think that it’s a good jumping off point. Of course it’s a mistake to not make any modifications or look deeper than the whatever is in the snippet, but the snippet is often much better than what a noob would come up with.
Also, it’s an opportunity for learning new patterns that you might not have come up with yourself.
> Even if you just limit it to people largely taking solutions wholesale from SO, I still think that it’s a good [...]
I would respectfully disagree on this point. Anything that perpetuates doing this in any way will always have a negative impact on code quality. If an engineer is copying solutions wholesale, even if those solutions are robust and high-quality, that's an indicator of the approach they have to producing code on a daily basis, which is going to have a much larger overall impact than that 1 answer on SO.
SO is imo a net positive benefit to the community, but only by virtue of them doing other beneficial things that balance out with the harm of copypaste programming. But I don't buy that copypaste programming is benign.
> Also, it’s an opportunity for learning new patterns that you might not have come up with yourself.
Blind copypaste is by definition not an opportunity to learn, because you need to understand (hack/fork/adapt answers given) to learn from them.
> Why do everybody keep repeating "blind" everywhere?
"Blind copypaste" is just a slightly more specific thing than plain "copypaste". Copypasting code you fully understand and trust is fine (though in practice, that's the rarer case). "Blind copypaste" implies you know it works(ish?) and don't care as much about the how.
> Also why all the anti-copilot seems to think their code is great, or even well understood by themself.
My code is certainly not great, but I like to think I understand what I've written pretty well (there are many contributors to poor quality code, not understanding what you're writing is just one potential one).
I also like to think that, while it's not great, it's on average going to be better than code put together by someone who doesn't fully understand what they've written.
> I have counterexamples everywhere around me all the time for these 3 points.
Do you? By what metric do you consider them counterexamples?
> But I don't buy that copypaste programming is benign.
Copypaste programming doesn't have to be benign in order to be better than the likely alternatives. The people who blind copy/paste are likely not producing high quality code in the first place. In which case, blind copy/paste is often an improvement.
I think of StackOverflow as basically crowdsourced documentation. In a perfect world the documentation that comes with your tools would be complete, correct, and readable. Unfortunately in our world this is often not the case and people end up having to do one of three things:
1. Delve into the original source to see what happens (or reverse engineer the binary) -- very time consuming!
2. Guess. Maybe write some test apps to play around with it till you get it working. This used to be very common, but leads to situations like the PS2 encryption algorithm being completely broken because the devs didn't understand the IV parameter.[1]
3. Go on StackExchange and find an example of what you are trying to do, usually with some helpful discussion about the parameters.
[1] You would think security libraries would have the best documentation because it's so important to get it right and difficult for the developer to detect mistakes, but I've found the opposite to be the case. They're some of the worst offenders for just showing you a cryptic function prototype and assuming you know everything about the underlying math already. It feels like the crypto guys think if you couldn't write your own crypto library then you aren't good enough to use theirs, kind of missing the point of libraries.
I honestly never copy-paste from StackOverflow. I go there for general pointers or look at the code snippets and will only use what I think I figured out, and then write code myself.
Even that cannot really protect me from missing things like the security implications that this blog post talks about.
In this case, I probably wouldn't have fallen for the stuff CoPilot (or an equivalent SO answer) suggested, as I learned about these kind of injection issues a long time ago, but there are certainly a lot of areas where I would fail miserably to detect subtle (security) bugs. APIs are full of subtle and often non-obvious pitfalls, and even algorithms can be, where the algorithm seemingly works except for non-obvious edge cases that you might not have considered.
I think the main issue is that many people may not use it that way. When copying from stack overflow, there is some minimum understanding required to adapt the code snippet to match your use of types, variable names etc. With Copilot, there's a high chance you will be able to just accept the autocomplete, and then compile your code.
A tool like this has the potential to help good programmers work more quickly, but it carries the risk of acting as a crutch. A lot of people might never put in the work to become good programmers because they learn how to be just good enough at using Copilot to fake it.
In a world where there are lots and lots of people trying to gain entry in the software industry, there's a major risk of this leading to a lot of incorrect, and even dangerous code making it into production because nobody really had a look at it.
> But they already do the same, just slower, with their current method.
I think that when $PROJECT lets everybody do something at a scale that was previously impossible, even if it's the "same thing", it is not "the same thing".
I see comments like this and it makes me wonder: is it hyperbole, or am I the weird one?
I admit, there were a few times I copied, but I can count on my fingers the number of times I used something from Stack Overflow in my entire career (I'm not counting situations where I read somebody's explanation but didn't use their code, which is just an example of how it works). From the posts I see, people write as if they do that daily.
I've used plenty of knowledge found on SO but I've never done a direct copy/paste. I also find it kind of weird how "normal" copy/pasting from SO seems to be. It's always paid off better in my experience to take the extra minute to understand the code I would have otherwise copy/pasted, and then implement what I need from it myself. The time savings from copy/pasting seem to average back out later as technical debt when something inevitably has to be fixed or changed.
I mean sometimes you need the exact code described in a SO post, even if it's one line. So you either look at the screen and manually retype it, or copy and paste.
Copy/pasting does not lead to understanding. The understanding I gain from not copy/pasting allows me to pursue more difficult projects where there isn't a ready-made answer online. If you want to grow, you have to do the work yourself.
And if you want others to grow, make sure your answers give knowledge without giving solutions.
I find it very weird too. I copy and paste a lot from documentation, or GitHub issues, but generally SO questions are very idiosyncratic.
Maybe it's the language you work with?
If I copy and paste React code from SO it'll 100% break my app due to incompatible abstractions, so it doesn't make any sense to do it. However if I need a complex generic Typescript utility type, copying things verbatim is usually fine.
Is this serious? I've never literally copy-pasted from Stack Overflow. I read the answer and then go write my own code. Did you seriously copy-paste an entire chunk of code and commit it to your codebase after some edits?
If it's exactly what I want, then I'd be OK with it. It's just that that has never happened to me. The closest I came was when I was checking a regex and realized I'd missed some cases, so I copy-pasted the regex string from SO. I can't imagine a scenario where more than one line of code on SO could be exactly what my codebase requires. Regardless, my methodology isn't going to SO to find code; I go to SO to read the answers, understand them, and then write code. I never thought that when people talk about "copy paste from SO" they literally mean copying someone's code and pasting it into their text editor.
A difference between auto-completed code and stackoverflow answers is that the latter comes with explanations and comments, which help us understand the internals of offered answers.
I agree the behavior is similar between copilot and copy/pasting the most popular solution from stackoverflow.
However, you seem to be making the argument that this is good behavior. I say it is not. The time saved by automating the googling/copying/pasting is minuscule compared to the work of actually understanding the contexts in which the code is suited and ill-suited. That is the developer's work, and it isn't informed by the code alone.
Developing ain't typing. It's solving problems. Most of the time the problem isn't in code. Even when it is in code, there's nuance in how to best solve it that isn't. The idea that AI is useful in understanding the real world context of the problem better than the human (who has the whole context, in code and outside of it) is naive, or disingenuous.
It's very simple: googling and copying should be a method of last resort. Behind even "googling and learning and then writing your own code based on what you learned," for instance. I'd bet I do it less than once a week. (Trying to find something through google and FAILING to find a useful answer is, sadly, much more common for me.)
Copilot makes it a method of FIRST resort.
If I know what I'm doing, and know I should escape my inputs... but copilot barfs up a block copied from someone who didn't... now I have to retrain myself to spend time reading the suggested code for stupid shit like that versus just using my own knowledge in the first place.
That's a far cry from "i know the method name I want has 'string' in it somewhere"-style autocomplete.
You're basically importing a library of "every stupid crap anyone committed to github" and just hoping the library functions in it are safe and reasonable. That's a crazy dependency to bring into your project if your own programming skills are above that of the median github user.
> That's a crazy dependency to bring into your project if your own programming skills are above that of the median github user.
So, roughly half of programmers would be _better_ served just blindly using Copilot's suggestions then?
Personally, I find that I work with so many different things so often that "googling" is often much quicker than reading documentation or even searching my own existing code.
But I also have _zero_ interest in Copilot at all, so what do I know?
I don't understand. I haven't copy-pasted from SO since my first few years of work. At that point you're normally already using libs that take care of most things you would copy-paste, and you're hopefully working on things more complex than just piecing together sporadic SO posts.
The code i produced during my first few years isn’t something anyone should aspire to.
Additionally you likely learned over time that most SO posts are not great but provide a good starting point.
In order for AI to be the future, it needs to be the present first… and of course people will criticize. Let’s give it a chance to evolve. And if you have something to contribute to the movement, then build your own tool.
> Of course, when the correctness of your program does not matter in the first place, but just its output, and you happen to be able to judge the quality of the output without knowing anything about your program, then something like Github Copilot makes sense.
So the problem is that Copilot will only work for 99.99% of all the code that's ever written?
The problem is, Copilot will work 99% of the time you use it (or more like 60%, going by the examples on their front page), and the remaining 1-40% of the time it'll give you subtly buggy code.
> the remaining 1-40% of the time it'll give you subtly buggy code
I think that's OK. Most code is subtly buggy no matter where it originated. It's always going to be up to the developer to check and test what they're committing, whether it's from Copilot, a co-worker, or they wrote it themselves. I think Copilot is meant as an aid to the code writing process, not a replacement. It's not really that different from any other tool.
Plus, fortunately, most code isn't subtly buggy if you look at small enough parts: if it doesn't work correctly, it's very obviously wrong. As Copilot is aimed (for now) at small functions, any bugs will manifest in clearly broken outputs. It's when Copilot can generate entire classes, or whole applications, that we'll need to reassess whether the code it creates is actually worthwhile.
> Well, let's say you have a system in place where you actually prove the correctness of your program.
> Of course, when the correctness of your program does not matter in the first place, but just its output
1. There are in fact very, very few projects that try to prove their correctness. Usually, those are extremely critical, or dependencies of potentially critical applications. Even if a project does this, it usually does so only partially, just to keep the ROI sane. Please correct me if I'm wrong, but AFAIK even most programs on aircraft don't use this approach, although the industry is definitely interested.
2. For most of the projects, the primary goal is to be useful, to be correct is at most secondary.
3. A program written by humans is not inherently correct; it's more likely to be the opposite. Whether it's written by a human or a machine, you should always write test cases that reflect the properties you care about in your program. A properly tested program written by a machine is no less correct than one written by a human.
I'm generally interested in Copilot, not because of the novelty it claims to be, but the value it might provide. I see it as a potentially useful tool just like IntelliSense but with greater potential.
At the end of the day, it's an AI "pair". I've been doing pair programming for years, and one of the lessons I learned is that you should not assume anything just because your pair partner is doing it - both of you are responsible for the program.
I'm quite excited for copilot. I don't even know that I'd have caught that bug, but I'm sure I could see myself having written it, so it's really no worse - that's why I work on a team with others who likely would have.
I'm not worried about it suggesting subtly wrong code. I write subtly wrong code all day every day, and sometimes I catch it, and sometimes I don't. I write tests, I use types, etc. I'll use the same tools to reduce bugs from copilot that I do for myself.
> Of course, when the correctness of your program does not matter in the first place, but just its output, and you happen to be able to judge the quality of the output without knowing anything about your program...
It's not never. Look at generative art. If you like the output, great. No need to know how the program did it (except maybe for copyright infringement issues :-)). Or if the AI proves the correctness of a program. The output is the proof, I don't need to know how the AI did it. And so on.
> The amount of people who think that GitHub Copilot is a good idea is just frightening. Just shows how many people never thought about the semantics of their programs for even a second.
I think it's an excellent idea. People who copy-paste from Stack Overflow will also copy lots of shitty code or use it incorrectly.
But those of us with a clue have a neat prototyping system where we can modify the solution until it's coherent. It's still on us. Copilot never claimed to produce the best code, and it doesn't have to. That's the whole point behind the "pair programming" metaphor.
Yes, the wrongness is partly associated with the direction these things come from.
OpenAI leadership knows that there is just sooo much value to capture in more automation, even if it takes billions to get there. They also know that there is sooo much money around that doesn't know what to do with itself, and so much excitement in the field generated by bystanders who don't understand a single thing.
A perfect setting to ride the wave and reap the rewards.
I repeat: there is nothing open about this. The research scientists are being used to propel just another get-rich-sooner-than-later scheme (how should they know better, they're innocent scientists). Happy bulldozing what is left of society in order to generate profits and shove the AI BS onto the managerial class, because that's just how the system works.
And if you know all that, you can exploit it, hacker.
About the only code generation I've ever liked is the pure-convenience stuff, like generating a complete (but empty) pattern-matching construct for whatever algebraic type I'm matching on (e.g. case-of). Otherwise, all I want are good suggestions for code completion.
This debate reminds me of the old days when teachers cautioned against the use of Wikipedia, assuming people would treat it as an authoritative source of information.
These teachers were not wrong about it - people do treat Wikipedia as an authoritative source of information.
What makes Wikipedia work anyway is a combination of three things:
- Volunteers keeping it correct put in more effort than people trying to sneak in lies on purpose. There are many reasons why this is the case, one of which is that there's really not much to gain by messing with most wiki pages.
- Whether or not a typical person is correct about something doesn't matter much.
- In cases where being correct does matter, it's usually easy to discover you're wrong. Life isn't a quiz show, you don't lose if you answer incorrectly - you get smacked in the face by wrong results, conflicting facts, inconsistent knowledge, and you get to course-correct. You ask around, grab a textbook, and fix your facts.
The second point is big and somewhat sad: it truly does not matter whether or not a random person has an accurate view of the world. Beliefs that have noticeable or immediate impact on one's life tend to be automatically corrected (see point 3) until they're good enough - all the rest of the knowledge serves a single purpose: social grooming. It doesn't matter if a news story, or a piece of trivia, is true - as long as you can tell it to someone and have a nice conversation, it's achieved its purpose. People who care about the truth for the sake of truth are the exception, not the norm.
Back to the topic of GitHub Copilot: code does not serve a social grooming function. Pretty much all code matters at face value, because it directs machines to do things. When Copilot feeds you bad code (or you mindlessly copy stuff from StackOverflow), the result is called a bug. The way reality corrects it is by the system failing, and someone having to debug it. This is expensive, so you want to minimize that.
I generally agree, with two reservations. First, I find that there are two types of inaccuracies on Wikipedia: mistakes, which I sometimes fix, and bias, which is not worth fixing because people trying to sneak in bias are dedicated and have a lot of free time. See e.g. edits made by the user named Erik on this article[1]. They've been persistently making the same edit for years, and the only reason we can see it is that they don't obfuscate their activity by using different usernames.
Second, I'm optimistic that most bugs are found before code is even committed, so people will quickly learn that generated code needs to be carefully reviewed. I don't have access to Copilot, but if I did, I presume the way I'd use it is that I'd always comment out the generated code and just use it as a quick reference.
I'm part of a small forum that used to insert small wrong facts in Wikipedia. I think it's basically impossible nowadays but some of them still stand and have been copied in books and articles.
Ad the now-deleted critical sibling comment: Note that OP implied the edits were innocent, likely made by teenagers having a laugh. The misedits, which OP didn't say they made themselves, presumably didn't hurt anyone and taught a lot of people to be critical of what they read.
Yes I should have specified that it was about unimportant and inconsequential things, like the nickname of a variant of a culinary ingredient, coming usually from meta-humor from the forum.
I think this GH Copilot looks really cool, but I wonder how many more or less subtle bugs are going to end up in codebases because of it. The snippets it generates are rather large; a tired dev on a deadline will probably not take the time to carefully review them if they seem to mostly work as-is.
Paradoxically, I think the more it improves the more dangerous it'll become, because if Copilot gets it right 99% of the time you're more likely to miss the 1% of the time it doesn't. It's like car autopilots in a way: the better they become, the more the driver lowers their guard, and the more dangerous they become.
Just imagine how many bad coding practices are going to be promoted to junior developers relying on "helpful" suggestions from this tool which is operating from an AI model and doesn't really understand what the programmer is trying to do.
This whole thing just sounds like a bad idea from the start. Code should be written by humans. If you find yourself repeatedly writing the same boilerplate code, use or create a library, framework, or higher-level programming language, don't rely on an AI tool to attempt to guess what you want.
Copilot can be treated like a smart Stack Overflow copy'n'paste mechanism. You still have to choose which variation aligns closest to your needs and then do a final edit to make it fit the problem at hand.
Unfortunately no, because StackOverflow is like a „best of“ collection of code snippets, while the entire Github will be a long tail of stuff of random quality.
So if we can’t trust an AI to write routine code because it is unsafe, how shall we ever trust it to drive a car where physical security of human beings comes into play?
It only has to be better than the average human. Self driving cars are already there, but GitHub copilot is just an alternative to googling for code snippets rn
Why doesn't the example he criticizes use a plain text request body when it's only a single parameter anyway? And is he seriously recommending HTTP form POST data as a best practice for sending API requests/responses when you're using JavaScript? At the end he even touts the ability to convert the request to JSON (well, technically, to a JavaScript object). But why, when you could send an application/json request in the first place?
> And is he seriously recommending HTTP FORM POST data as a best practice to send API request/responses when you're using JavaScript?
I don’t think so? I think sending JSON POST data is a solved problem, so to speak: everyone already knows how to do it. The evidence suggests the same is not true for form data.
> And is he seriously recommending HTTP FORM POST data as a best practice to send API request/responses when you're using JavaScript?
If the API only accepts a URL encoded body then I absolutely recommend sending a URL encoded body. If you're in control of the API endpoint, then you can pick whatever format you want.
An HTML form can send data as application/x-www-form-urlencoded, multipart/form-data, or text/plain (although that's useless in practice). If you're progressively enhancing a form, you might want to gather the data in the same format as would otherwise be submitted. You could send it in that format with JavaScript, or convert it to another format (there's a whole section about that at the end of the article).
I recommend using multipart/form-data if you need to send file data along with other values. You'll have a bad time trying to send this as JSON. This recommendation is right at the end of the article.
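To make the trade-offs concrete, here's a minimal sketch of the three common ways to send the same field with fetch(); the '/api/example' endpoint and the text value are placeholders, not anything from the article. URLSearchParams and FormData set the appropriate Content-Type header for you, while JSON needs it set explicitly.

    const text = "I'd like some tea & biscuits";

    // 1. application/x-www-form-urlencoded: URLSearchParams sets the
    //    Content-Type and percent-encodes the '&' in the value.
    fetch('/api/example', {
      method: 'POST',
      body: new URLSearchParams({ text }),
    });

    // 2. multipart/form-data: the sensible choice once files are involved;
    //    the browser picks the boundary and Content-Type.
    const formData = new FormData();
    formData.append('text', text);
    // formData.append('attachment', fileInput.files[0]); // hypothetical file input
    fetch('/api/example', { method: 'POST', body: formData });

    // 3. application/json: here the Content-Type must be set explicitly.
    fetch('/api/example', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ text }),
    });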
> ...if you need to send file data along with other values. You'll have a bad time trying to send this as JSON.
Does anyone think it's ever a good idea to post files and form data all at the same time, to the same endpoint? That right there seems like an exceptionally bad idea.
Typically I would generate signed upload URLs for clients to upload files directly to some bucket and the submitted form data would contain only pointers to those uploads.
Ah, didn't get that context; your clarification is appreciated. Though I'm not sure kids these days care about progressive enhancement-style webdev ...
This mirrors a real-world vulnerability called binary planting that has plagued Windows for years.
You don't actually need to find untrained cases. Using AWS and automated VSC you can retrain existing portions that are already trained. Or farm it out to mechanical turk, like bot farms or captcha farms.
This is a huge can of worms that is being opened by allowing these sorts of random inputs to source code creation - even though there will be filters on that input being used.
I imagine most such companies block by default. Copilot has so many sword-of-Damocles-level potential issues with it WRT licensing and the potential for the exposure of proprietary code that I can’t see a responsible closed source developer, let alone a CTO or CSO allowing it.
Yes, thank you! As someone who has been through painful open source license audits at largish companies, whenever the topic of copy/pasting from StackOverflow comes up, the hair on the back of my neck stands up. When you copy/paste code from a web site, how do you know its license is compatible with your company's software, and you're not exposing your company to legal risk? This product seems to automate that carelessness, unless somehow the code that it produces is licensed in a way that's compatible with every other software license out there.
EDIT: Oh, and it looks like there was a whole previous discussion [1] about this yesterday!
In my testing, given a portion of the GPL licence header, Copilot is quite happy to spit out the rest of it, so I would imagine copilot bans might happen quite fast.
In addition to the license concerns about Copilot’s suggestions, companies probably don’t want their proprietary code being sent off to GitHub servers. Will your code get logged somewhere? Will it be fed back into Copilot’s machine learning algorithms? What if the uploaded code includes some secret passwords?
BTW, it's not only copyleft code that is a problem. "Permissive" licenses also have compliance requirements, such as attribution or preserving the original license text. Only code under Public Domain/CC0-like licenses can be reused without care.
Even that isn't true, at least for pure public domain without a fallback like the CC0. Some jurisdictions (like Germany) don't allow authors to place their works into the public domain.
The likeliest outcome is the one described in Ezra Klein's recent pod episode on AI - most vocations will be helped by AI, but it will still require an operator to QA the results and solve the bigger problem at hand. AI is not magic. You don't just build it once and walk away. It requires constant effort in care and feeding.
In our case, let's not pretend that React will be around forever. Once the new hotness in the JS world shows up (and it's very easily distracted by shiny things and cool names), you will need to train the models all over again.
Text like "I'd like to meet you one day & murder you very horribly with a rather large sword" will pass sentiment analysis because everything after the "&" will bypass the check.
It could be client side, where you're using it to hide content from someone who doesn't want to see something potentially abusive.
I did assume the example was client side, but it might not be. The server may be using it to avoid adding a comment to a database if it's abusive. URLSearchParams exists in Node too https://nodejs.org/api/url.html#url_class_urlsearchparams. There are fetch polyfills available, but the plan is to add it to node.
This still seems like a server issue regardless? Using URLSearchParams doesn't mean your server won't see a request with `&sendPasswordTo=hacker`. Sure, it's nicer on the client side, but it doesn't solve the security issue.
I came across something like this in a big codebase the other day. Experienced developers weren't paying attention to query string encoding, and this created a huge security problem where two different services could be processing different entities because of parameter injection.
Imagine if I logged in with the email "sdflhasjd@gmail.com&email=admin@service.com". Service A thinks my email is invalid, it gets passed to another service without re-encoding and then Service B thinks I'm a superadmin. Uh oh.
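As an illustration (the internal service URL and endpoint are made up), naive concatenation produces a query string containing two 'email' parameters, and different parsers may pick different ones; encoding the value keeps it as a single literal parameter:

    const email = 'sdflhasjd@gmail.com&email=admin@service.com';

    // Naive concatenation when Service A calls Service B:
    const url = `https://service-b.internal/users?email=${email}`;
    // The query string now holds two 'email' parameters:
    //   email=sdflhasjd@gmail.com&email=admin@service.com
    // One parser may take the first value, another the last, so the two
    // services can end up operating on different accounts.

    // Encoding keeps the whole thing as one literal value:
    const safeUrl =
      `https://service-b.internal/users?${new URLSearchParams({ email })}`;
    // email=sdflhasjd%40gmail.com%26email%3Dadmin%40service.com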
I like how nearly every comment here is about the AI itself instead of the POST data injection the blog post warns about. I can only imagine GitHub Copilot is writing them.
Really? I think it happens pretty often. It’s a form of bikeshedding, if you don’t know jack about the actual technical issue in question just blurt out your thoughts on the product (or whatever) the issue is in context of.
The specific error in question is a pretty common kind of error that most of us will have seen at some time – so while it's interesting to highlight, it's not particularly new or surprising.
The Copilot thing is much more intriguing IMO. It is a large and fairly high-profile product launch of a developer tool, and the headline examples given contain a variety of subtle bugs – at least four that are listed in this comment thread alone. That's likely to stimulate some interesting discussion!
Hah, yeah, although the process here wasn't as cynical as you're making out.
I blog about things that catch my interest, and the coding error in Copilot caught my interest. Usually when I do a blog post about the 'right' way to do something, it's triggered by seeing it done 'wrong' somewhere. Usually I can't point to whoever did it wrong, because it's unfair, and as a result I get a lot of replies to the post like "duhhh everyone knows this already". But in this case it's AI creating the error, so it feels fine to point to the source of the error.
Rather than jumping on a hot topic to farm hits to my blog, I'm blogging about something that caught my interest, and it caught my interest because it's a hot topic.
I see no harm or cynicism in doing so, either. I edited my original message to make it clearer.
I'm completely fine with chasing an incidental topic, my comment was in reaction to the parent's comment about people going offtopic, while, to me, it's pretty logical that the topic is going to be the news while it's still hot.
Umm... the article is also about Copilot: it is a massive explanation of why Copilot's generated code is dangerous, using an example put forward on the home page of the project as an egregious example. An article about an obvious injection issue and the myriad ways of doing it correctly wouldn't be worth writing in 2021 without the goal of critiquing Copilot as articles about these issues in web development are a dime a dozen.
The irony. Your sensors seem to be badly miscalibrated.
I've seen moments in the last week where no fewer than three items on the frontpage were dealing in Copilot outrage. URLSearchParams, on the other hand, is brand new. (I wouldn't even be surprised if this were the first time the topic has made it to the frontpage, ever—if it's even been submitted for discussion at all; searches aren't disconfirming this, but it's hard to be conclusive by just filtering on submission titles.)
As for the (obnoxiously stated) claim "Umm... the article is also about Copilot", you're off by about a mile (or maybe off by an amount that we'd expect from a machine that feels like it's doing pattern matching and faking deep, human-level understanding). Copilot's relationship to the article is incidental; it uses Copilot snippets as examples. The article is about URLSearchParams and data encoding.
EDIT: to clarify for others: the issue is the combination of the content-type x-www-form-urlencoded and the body that is being raw-string-injected without encoding. '&' in the body would be interpreted as a URL form token delimiter and parsed improperly by the text-processing.com API.
If this JavaScript executes in the browser, it's just a bug. If the user types 'abc&d=e', it might not post that text. It's unlikely to be a security issue, because the server must check all requests anyway, as the user can send anything. If this JavaScript executes on a server (e.g. some kind of microservice call), it might be a security issue indeed, as some people treat internal calls as trusted ones (although they probably should not).
I'd personally consider it bad code. It uses NaN as an error signalling vector, and makes it an unstated assumption. I suppose it would be acceptable if the project uses magic IEEE754 values / floating point traps this way, and everyone is aware of it.
I don't know Go, but from briefly skimming some articles, I believe a standard practice would be to define the function to return an error alongside the value, then detect the situation when all runs failed and return an error. Alternatively, there should be a documented assumption that the function expects a non-zero number of successful runs in its argument.
A sufficiently smart ML model could probably do either. This one doesn't, and the problem of NaNs is something I'd have a good chance of missing during code review.
In this case it's worse, because a divide by zero will panic instead of throwing an error.
The "standard" Go response is what you suggest. However, it does force the caller to deal with the error condition. For Go devs, this is standard procedure and they won't mind ;)
However, if the routine is relatively trivial, and it doesn't matter too much if an error occurs (but you don't want it to panic), then handling any errors inside the routine and always returning a valid result is OK.
If this was me, I'd take this second path, and keep the single return value, but catch the special case (float64(len(runs) - failedRuns) == 0) and return zero for that.
Or you could use the panic..recover method and trap the panic and return something sensible. I tend to avoid this, though, because it can trap panics further down the call chain (not in this example, obviously) and you end up with weird bugs that are hard to catch because they're not explicit.
Quietly returning something sensible is also a way to end up with weird bugs, except they are harder to find because it leaves open the possibility that the caller fails to check the return value for this token, and the program keeps humming along. At least log loudly so an attentive developer has a chance of finding the bug. I guess this is now an old-school practice, but I've always been a believer in throwing an exception on error so that they are totally obvious and you don't ship with them. Crash early, crash often. Find the bug before it hits production.
This is yet another case where they should just make the API do the right thing, rather than relying on programmers to do the correct thing instead of the easy thing.
It’s bad, but junior devs will write exactly that until they are taught better. Presumably co-pilot will learn as well if a certain bit of code it produces is always corrected.
Co-pilot is nothing short of amazing and a force multiplier for senior devs, despite all the denial, I’m afraid.
Sure, makes sense in this case, without any other extra functionality in the class. But there could be more functionality needed. It feels highly subjective to me.
As cool as copilot may look, it does seem like a fundamental problem could be: if people widely use bad practices, and only a small amount of code uses good practices, an AI model will probably suggest bad practices. The AI doesn't know what's good practice, just what people tend to do. Similarly it will probably tend to suggest legacy/outdated code, since less code will be using the newer modern alternatives. I'd guess that's what happened here, and it's a bit embarrassing that their headline marketing demo does exactly that. It may be difficult to mitigate as well, as this will be rife throughout the training data.
I agree in principle, but I think it's possibly a good opportunity to use this to create a compendium of industry practices, some of which could then be labeled as anti-patterns.
Could you combine copilot with an updated linter or similar? In Visual Studio the intellisense does a pretty good job of converting my manually-typed old way of doing things code into a newer, simpler version supported by newer releases of C#.
Example:
    using (var reader = new StringReader(manyLines))
    {
        string? item;
        do {
            item = reader.ReadLine();
            Console.WriteLine(item);
        } while (item != null);
    }
becomes:
    using var reader = new StringReader(manyLines);
    string? item;
    do {
        item = reader.ReadLine();
        Console.WriteLine(item);
    } while (item != null);
This is a big problem with Stack Overflow as well, which causes exactly the same issue.
Questions answered 10 years ago have an accepted answer that was right at the time, but it's no longer the best answer. If a better answer was made 5 years ago, it might have a chance to at least be voted higher by now, but often the current best answer is simply too new and has only a small percentage of votes compared to the others.
In a lot of ways, it's likely to be a self-reinforcing problem, same as SO: someone chooses the recommended code -- which "works" but is not the most efficient, uses deprecated API, or worse has a security vulnerability -- and this trains the algorithm to think it's a good answer and recommend it more.
For what it's worth, this problem predates Stack Overflow, and to some degree Stack Overflow tried to fix it.
Before SO, the typical way people would find answers would be to go to their favorite search engine and type their query, and search engines' heuristics were really bad for this sort of thing. If you were very lucky, you'd get a (current) reference manual, but usually you'd end up with some new web developer who had just learned something writing a tutorial for others, and it was just the blind leading the blind.
I suspect Copilot will be somewhere in between that and the current SO copy-pasta, with the main downside being that writing bad code is now that much easier than reviewing it.
Well, yes this is kinda true, but comments help and the ability for others to edit your answer if they have enough karma also helps. Plus a ton of people update their answers to say "Update: My answer was once the right one, but in Python 3.3+ there is a better answer by Michelle below."
What would be cool is if StackOverflow let you choose to move your answer down in the ranking to be below someone else's. That way the history of the answers is still there, but the more up-to-date answer would get the first impression.
I think maybe we can train this further using existing linters and analyzers? At least the AI will emit far fewer lines of critically dangerous code, but we'll still have a lot of anti-pattern issues.
Maybe GPT isn't really a good fit for this kind of task. Maybe we can create a better assistant using simpler AIs. If we reduce the scope (e.g. language, framework) and programming style (e.g. OOP, code organization, design), the amount of context should be much smaller than what GPT is hoarding. This may also allow us to have some open-source programming AIs.
Did you just scroll through the article, see many different examples, and call it awfully complicated? Because the solution is just `new URLSearchParams({ text })`, and that's not complicated at all.
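For anyone who hasn't used it, a quick sketch of what URLSearchParams actually produces (the values and '/api/example' endpoint are illustrative):

    const params = new URLSearchParams({ text: 'tea & biscuits = great' });

    console.log(params.toString());
    // "text=tea+%26+biscuits+%3D+great": '&' and '=' are percent-encoded,
    // and spaces become '+'

    // Passed directly as a fetch() body, it also sets the
    // application/x-www-form-urlencoded Content-Type for you.
    fetch('/api/example', { method: 'POST', body: params });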
The polyfill below gives a similar example with a caveat of adding a content-type header.
It looks like it does escape at least a few characters (~, parentheses, etc.) with its encode function. But there are a billion emoji and other Unicode characters to escape?
I've read in the past that browsers have all kinds of URL quirks, and I've seen examples that I've copied into my own code which base64 encode/decode before sending (or GET an image pixel with params).
I don't quite understand why so many people think that GitHub Copilot will somehow cause the downfall of development. As if we don't already have developers copy/pasting code from SO without checking it at all. Whenever I find code snippets on SO I normally comment a link back to where I found it (for better explanation and in case the answer is updated or more discussion happens after I get it) and then I adapt the code to meet our style and quality standards. Using GH Copilot will be no different for me. I appreciate having a baseline of code that I can tweak and bring into "compliance" without having to type it all from scratch or first go search SO. Licensing questions are legitimate but for me this product seems like it will speed up my development and/or introduce me to new patterns I wouldn't have considered before.
Why? Because it glorifies the pattern of copy/pasting.
Your argument goes like: some biking commuters already bike too fast in crowded places, so what harm will it do to incentivise them to put an engine on their bikes so they can go even faster, even on hills?
Pandora's box is opened and all pro copy&paste developers will use it.
Don't get me wrong, I have nothing against Copilot, as long as you understand the code and know what you are doing. And yes, most devs don't know that, but they think they are kings.
It uses js falsyness to figure out whether it can return from the cache or if it needs to invoke the wrapped function. However, js falsy is pretty dangerous. "cache[key]" will return undefined if there's no value in the cache for those arguments, but undefined is not the only falsy value. Here's the full list: https://developer.mozilla.org/en-US/docs/Glossary/Falsy
Many of those values are reasonable function return values, meaning your cache will simply not work for some function outputs.
The key generation is also a little problematic. Stringifying the input may produce huge strings which are then kept in memory for an indefinite period of time which creates a memory leak.
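A minimal sketch of that failure mode and one common fix, assuming the cache is keyed on JSON.stringify of the arguments as described (the actual generated snippet may differ):

    // Truthiness check: 0, '', false, null and NaN all look "missing",
    // so the wrapped function is re-run for those results.
    function naiveMemoize(fn) {
      const cache = {};
      return (...args) => {
        const key = JSON.stringify(args);
        if (!cache[key]) {
          cache[key] = fn(...args);
        }
        return cache[key];
      };
    }

    // Checking for the key's presence avoids the falsy trap, though the
    // stringified keys can still grow the cache without bound.
    function memoize(fn) {
      const cache = new Map();
      return (...args) => {
        const key = JSON.stringify(args);
        if (!cache.has(key)) {
          cache.set(key, fn(...args));
        }
        return cache.get(key);
      };
    }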
Here's the bottom line on GitHub Copilot: it's a huge step forward, and I think everyone is going to be using tools like it in the future. There's no doubt to me that it will make good programmers way more productive. However, not-so-good programmers will become way more destructive, since Copilot will let them write bad, unoptimized code faster than ever before.
Does Copilot learn from you correcting things like this? The landing page says it learns your coding style, but presumably that's just formatting of the same suggestions. If it does something insecure and then you fix it, does it learn the secure way is the right way going forward? Or will it just autocomplete something equally insecure next time you try to do something similar?
> Unescaped text is added into a format with defined encoding.
Meh. This is only a problem if it's unknown text or from an external source like user input. If it's from a known, trusted source, like internal code, it's fine.
The cleaning should be done on the server side, anyway, so this objection is moot. Anyone can send any kind of body. Your client is in "enemy territory". Treat everything coming from the client side as potentially dangerous.
If you take this article's advice, you might think you're safe by just using form data or URL encoding. No. Not at all. This will not save you from SQL injection attacks or whatever. Only server-side cleaning will do that.
I think this post was promoted only because it mentions Copilot, to be honest. It's not good security advice.
The user agent is... well, the user's agent. It runs code fetched from a trustworthy origin (your server), so it is not enemy territory.
It holds the session keys. It decides what can or cannot happen after the user clicks on a link with some funny URL in an e-mail. It displays and processes data entered by less trustworthy users. If anyone can just make it insert random HTTP headers, this could be a problem.
Yes, the server must assume that enemy agents also exist. But it should better not deliver one to all users.
A 'user' can do all of the things you mentioned, e.g. "insert random HTTP headers", given that they have access to all of the stuff your code does too. So any code of yours that runs outside of _your_ systems _is_ in "enemy territory": none of the code _inside_ your systems can trust anything from 'outside', even if it possibly came from your code.
You need to explain this better, not just be patronizing towards people who disagree with or misunderstand you. So far, your only explanation is that is possible to bypass "sentiment analysis", which doesn't make sense.
Is not a "narrow view" of security. Your example is not secure, period, which is fine, because that's not where security needs to happen.
> It runs code fetched from a trustworthy origin (your server), so it is not enemy territory.
We should define terms before arguing. Enemy territory is anything you do not directly control. So, as a developer, you do not know if the user's agent is running your code from your server or something compromised. Assume the worst. Anything exiting the user's agent must be cleaned.
> Executing an action against the user's will is a security issue
Non-sequitur. Unless you're saying the `text` parameter could somehow execute code? It can't.
Considering the worst reasonable scenario, that this `text` parameter is sent directly from user input: so what? It may not be great practice, but it's not a security issue. Clean it server-side, which is what should be happening anyway, which the article fails to mention.
Considering the worst unreasonable scenario: the `text` parameter is compromised by a hacker somehow. Well, you're dealing with a far worse situation than could be handled by cleaning input client-side. Better to ensure input is secure... on the server side.
But, maybe I and others here are wrong. Assume many of us do have a worrying misunderstanding of the fundamentals. For the sake of the health of the internet, step us all through this scenario where a secure server side does not save the day, but these methods do.
> Anything exiting the user's agent must be cleaned.
We all agree on that part. What's worrying here is the mentality that, once the server-side has been secured, the client can do whatever. It can be manipulated anyway, so it does not matter for security if it does validation or not.
This is wrong. As a user, I don't care if an attacker has manipulated my data on the client or on the server. As a site owner you are responsible for delivering a secure client.
Yes we should define our terms. The first term we need to define is "security". From the comments here, I'm starting to think people define it as "RCE on the server". That's a rather narrow view.
> step us all through this scenario where a secure server side does not save the day
Back to the example: maybe you're building a chat tool, and it has this sentiment-feature to help with moderation. At the very least, this bug could hide offensive content from a moderator. But you are calling POST to "/api/sentiment/" with untrusted text that can leak into other parts of the request. I haven't done the analysis, but maybe an additional form field could be set, say "learnAsPositive=true"? Or maybe you have some questionable "not a security problem" API design that re-uses the same POST endpoint for multiple things, and you could set "blockUser=true" and control the user name, or moderators can edit the message text. Or maybe it wasn't a sentiment endpoint but something more important, and the untrusted text could be the name of another user.
You say a lot of true things that aren't related to what I said and asked. No one said "the client can do whatever". No one reasonable says that.
> you are calling POST to "/api/sentiment/" with untrusted text that can leak into other parts of the request
Could you expand on this? What does 'leak into other parts of the request' mean?
> Or maybe you have some questionable "not a security problem" API design that re-uses the same POST endpoint for multiple things, and you could set "blockUser=true" and control the user name, or moderators can edit the message text.
Again, this sounds like a server-side problem and bad coding practices. If someone is allowing such things to happen, then the user has worse problems than the dev not formatting their data properly.
The OP's contention was that (pseudocode)
fetch({body:stringFromOutsideTheFunction})
is inherently a security risk and that the methods outlined would protect against the security risk.
I don't need to flog this horse. OP probably got a lot more attention for this article than he's used to. I wish he would at least mention that security must be done on the server-side, but whatever.
I'd like to point out that the specific concern you do have, that bad client-side code could lead to the user being fooled in some way, is definitely valid. I don't see how the example can naturally lead to that, but as I said, I'll let this go. However, one thing you can do to protect your users is to make sure that the server that serves the JavaScript sends a Content-Security-Policy response header, which helps prevent XSS attacks by stopping altered code from loading in the browser. Read more about it here: https://content-security-policy.com/
Even more basic, everyone should be serving over https://. Those certificates used to be expensive, but now they are free or low-cost. This will encrypt the request/response traffic between your server and the user. More details here: https://letsencrypt.org/
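For what it's worth, a minimal sketch of sending such a header, shown with plain Node's http module rather than any particular framework; the policy value is just an example:

    const http = require('http');

    http.createServer((req, res) => {
      // Only allow resources (scripts, styles, etc.) from our own origin.
      res.setHeader('Content-Security-Policy', "default-src 'self'");
      res.setHeader('Content-Type', 'text/html');
      res.end('<h1>Hello</h1>');
    }).listen(8080);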
The original code was attempting to set a single form field named "text", while actually allowing that text to control the whole form being submitted.
This is a security risk, because form requests can mix trusted with untrusted inputs. (Trusted: e.g. the action selected from a drop-down. Untrusted: e.g. another user's text or name, or an entity name decoded from the initial URL.)
So, sticking with the "moderation tool" example, you could make a moderator execute an unintended moderation action when they interact with your carefully-crafted username.
The article showed that a piece of vulnerable frontend code was generated. Most commenters instantly dismissed that as an irrelevant concern and instead talked about securing the backend. Yes, you need to secure the backend. You need to protect against XSS. Neither of those fixes the problem that was shown in the code.
> The article showed that a piece of vulnerable frontend code was generated.
The article didn't show anything. It just asserted it.
All right. This is over.
I'd be delighted if at least one of you two could present a coding example that, when run, demonstrates an actual problem that needs to be dealt with.
Declare victory if you must, but if your next response is anything other than example code that, when run, demonstrates the problem, it will be clear to everyone that you guys don't know what you're talking about.
Sorry to be so blunt but there's only so many ways to ask how it's a problem before it's clear there is no real answer.
I'm guessing we have different 'threat models' in mind.
From my perspective, I know _I_ am a moral and ethical person and therefore won't "execute an action against the user's will".
But, also from my perspective, even if "that action is allowed according to the user's credentials", I can't tell, and thus my server-side code can't tell, that a 'user' is a real person or even a legitimate user of my site or app.
The comment I was replying to claimed that "The user agent is ... is not enemy territory.".
But what came to my mind on reading that was that user agents also (commonly) perform 'card testing' and 'credential stuffing', and even if I trust that I can securely give them access to my front-end/client-side code, I have no way to know whether they're running that code. And even if they're running my code, there's _still_ room for malicious or nefarious action on their part.
I was NOT disagreeing with this (in the comment to which I was replying):
> Yes, the server must assume that enemy agents also exist. But it should better not deliver one to all users.
I tried to get either of these two to be clear about how precisely this attack works but they reply only with word salad and non-sequiturs (true but irrelevant statements like "one should not deliver enemy agents to users"). I think neither can offer actual code that demonstrates the problem. Given their assertion that "the user agent is not enemy territory because it runs code from your server" I think they're maybe very young or new coders.
Recent ML-based machine translation produces very convincing text for most sane input. It looks so natural and pleasant to read... but it sometimes fails catastrophically, like flipping a negative sentence into a positive one, or losing a sentence entirely.
The same thing will happen with this tech. You may argue that's where the human programmer comes in to check the code, but can you spot the almost-sound code where a single character's difference causes complete failure?
But I fear this is the way we're heading. If it works most of the time, the majority of people will accept the risk, because at a certain point it's indistinguishable from the human error rate.
So this is ironic. I've been meaning to port URLSearchParams to Python for ages, but I've been putting it off for about 8 months as I knew it would take several hours of thinking, testing, etc.
After reading this blog article, I just used this AI https://6b.eleuther.ai/ to do it in about an hour, hardly even trying.
I guess it's not going to replace programmers then. Maybe even increase demand as more time has to be spent debugging and fixing problems due to lax standards and growing complexity.
Analogous to Gresham's law: if two standards are accepted in an organization, then bad standards (those with lesser intrinsic value expressed in time and effort) will replace good standards.
Rather than a tool like Copilot, which does look very interesting, I'd be curious to see the effect of using evergreen (self-updating) off-the-shelf free open-source components.
Imagine if you could get an app that uses `http-rest-api` and `http-www-crud` with `http-www-multifactor` and `http-rest-api-license-key`, and you could automatically be using the latest REST API, the latest CRUD framework, the latest multifactor/whatever framework (which you'd probably pin to some standard like YubiKey so user tokens don't get invalidated regularly). The actual reference implementation could be done in pseudocode even, and ported to many languages (ok, this is getting too meta). You could throw together a professional-quality application in a few lines of package management. And if someone finds a bug in the code, it gets updated, and the next time a release is cut you'll pull it down automatically and it deploys itself.