Well, technically it is a MITM proxy for a locally hosted LLM, but people these days prefer simple, catchy names...
But it might be useful if, say, you have a local GPU-powered machine on your LAN. I just wish they weren't using the advanced settings in the Copilot extension and were instead using, say, one of the many OpenAI-powered alternatives (like Genie) -- it would feel less like a hack and more like a genuine alternative.
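For what it's worth, the "MITM proxy" part mentioned above is conceptually tiny: sit between the editor and the model, take the completion request, and forward it to whatever box on the LAN is running the model. A minimal sketch of the idea (not the project's actual code; it assumes an OpenAI-style /v1/completions endpoint on a llama.cpp-type server, and the backend address is made up):

```python
# Minimal sketch of the proxy idea: accept completion requests from the
# editor plugin and forward them to a locally hosted model server.
# Assumes an OpenAI-compatible /v1/completions endpoint on the backend
# (e.g. a llama.cpp / llama-cpp-python server); the address is hypothetical.
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

BACKEND = "http://192.168.1.50:8080/v1/completions"  # GPU box on the LAN

class ProxyHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read the request body the editor plugin sends...
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        # ...and pass it through to the local model server unchanged.
        req = urllib.request.Request(
            BACKEND, data=body, headers={"Content-Type": "application/json"}
        )
        with urllib.request.urlopen(req) as resp:
            payload = resp.read()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(payload)

if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 8000), ProxyHandler).serve_forever()
```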
Yeah. My main problem with this is that the Copilot extension is proprietary (or at least it was – maybe that has changed?) and you can't install it on VSCodium, for example.
I tried pretty much all of them with Continue in VSCode, and it's a bit hit and miss, but the main difference is the workflow (Copilot is mostly line completion, Continue is mostly chat or patches). So the main value add here for me would be a more Copilot-like workflow (which seems to align better with the day-to-day experience I've had so far).
This is interesting, because I started using Continue specifically because it doesn't just do tab completion. I have regular GitHub Copilot turned on again as well now, after having it off for a while, and the tab completion does seem to be better than it was a few months ago. Still, explicitly asking Continue to write a diff has a higher 'this is correct' rate for me, because I can describe what I want in detail.
Tab completion seems fine for cases where I add one new line and the next two lines are just incredibly obvious. I am going to experiment with writing a little comment first to see if it primes the tab completion to do something non-obvious.
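Something along these lines, say (a made-up example -- the body is just the kind of completion I'd hope the comment nudges it toward):

```python
# Parse "key=value;key2=value2" pairs into a dict, ignoring empty segments.
def parse_pairs(s: str) -> dict[str, str]:
    # A completion roughly like this is what the comment above is meant to prime.
    return {k: v for k, _, v in (seg.partition("=") for seg in s.split(";") if seg)}
```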
Maybe it depends on the work people are doing. I do a lot of data engineering, where I'm often tabbing through suggestions while adjusting the bits that are off, or writing Python portion by portion, and depending on the complexity it ranges from "finish this string I'm writing" to a few lines' worth of code.
I'm often surprised by how quickly the model picks up on reasonable patterns (even across files, which is often necessary to be correct).
With the diffs, I find that by the time I've described things in a way that has a chance of working, I might as well have just written the whole thing myself. Especially as the diffs often need correcting themselves. With tabs that correction is part of a fast feedback loop; with diffs it's so far a slower loop and just more awkward.
Of course, with changes to workflows one or the other can shine, and if there's an interface that's faster than typing the instructions out, that might just supercharge things for the diff/chat type too.
You should check out https://github.com/smallcloudai/refact. It has both autocomplete and chat. It's in active development, with lots of new features coming soon (context search, fine-tuning for larger models, etc.).
Have you tried Continue with Phind-CodeLlama-34B-v2? I've been waiting to explore local alternatives to my current go-to tools (Cursor + GPT-4), but it seems that local LLMs' quality isn't quite there yet...?
I haven't tried that particular one yet. I might not have enough RAM to run it, but I'm gonna give it a try. It seems like one of the larger code-focused local models. Cheers for the suggestion!
Be careful with Continue. It has opt-out telemetry and by default it seems to send all prompts to the developers. Have a look at ~/.continue/telemetry.log to see what is being sent. Don't enter any passwords, credentials or other sensitive data in prompts unless telemetry is disabled.
I'd really love someone to explain how it's confusing: As the parent says, it's dead obvious that it's not going to be GPT-4 running on your Macbook, the title starts by naming it something else, it aims for the very specific style of completion of Github Copilot...
Normally these projects hijack the word 'local' by sending all of your data to a 3rd party API and that is confusing... but we finally get one that runs the model locally, does what it says on the tin, and some people still find a way to paint it as deception?
It’s confusing because it says “GitHub copilot running locally on your mac” when it is actually Llama running locally on your mac, but with the copilot interface.
So the confusion is that what it says it is, is explicitly different from what it actually is, which is understandably confusing.
Seems to be running on llama.cpp, so it's going to be a question of performance. I don't have an M-series CPU, but on my 13th-gen i5 I can run Mistral at about 6.5 tokens per second, which seems comparable to what this is doing.
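If you want to sanity-check your own numbers, this is roughly how I'd measure it against a local llama.cpp server (the /completion endpoint is the stock server API; the port is the default, and the figure is approximate since generation can stop before n_predict):

```python
# Back-of-the-envelope tokens/sec check against a local llama.cpp server.
# Assumes the stock /completion endpoint on the default port; the estimate
# is rough because generation may stop before reaching n_predict.
import json
import time
import urllib.request

N_PREDICT = 128
payload = {"prompt": "Write a short function that reverses a string.", "n_predict": N_PREDICT}
req = urllib.request.Request(
    "http://localhost:8080/completion",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
start = time.time()
with urllib.request.urlopen(req) as resp:
    out = json.loads(resp.read())
elapsed = time.time() - start
print(f"~{N_PREDICT / elapsed:.1f} tokens/sec (rough)")
print(out["content"][:200])
```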
It's because of a quirk of the hardware. The Unified Memory setup that Apple built into the M2 (and M1? not sure) systems means that you have high bandwidth and large amounts of memory available to the GPU inside them, which lets you run LLMs very efficiently and very easily. Combined with the fact that that part of the hardware is the same across all their machines, it makes for a very easy target: you can get a lot built with minimal effort.
The matrix multiplication hardware and data types aren’t standardized yet.
Also, the M1 Max has more bandwidth than an Epyc Milan to actually feed all of that. It's about the same bandwidth as a PS5, but in a mobile package, with none of the latency of GDDR6. Much more powerful than a standard dual-channel consumer CPU.
Hm… the q4 34B Code Llama (which is used here) performs quite poorly in my experience.
Using a heavily quantised larger model gives you the unrealistic impression that smaller and larger models are roughly equally capable… but it's a trade-off. The larger Code Llama model is categorically better, if you don't lobotomise it.
It’d be better if instead of making opinionated choices (which aren’t great) it guided you on how to select an appropriate model…
I find the fact that Copilot is closed source -- not just the model, but even the plugins -- very worrying. Good to see efforts on the alternatives.
It's worrying because it feels like MS will add more and more closed-source "features" to VSCode and undermine its open-source nature.
It also prevents users of other editors from building Copilot plugins. For example, there won't be a Copilot plugin for Emacs that could be accepted into Emacs's official repository.
> Visual Studio being closed source
If VSCode weren't the de facto universal editor accepted by every programming language's community (notice that even this particular thread is about a VSCode plugin!), I wouldn't be so worried.
I tried to use Cody and the UI was confusing and the completions were unbearably slow. I was using the Rider plugin.
I want Cody to work for me, but right now it doesn't. I really, really want whole project awareness (maybe even to leverage a concurrently running language server?) for my completions.
Do you guys have a usage tutorial or a video somewhere? Are you flexible with how your UI is being implemented (ie, can I pitch you ideas)?
I'm sorry to hear that. We have made a lot of improvements to Cody recently. We had a big release on Oct 4 that significantly decreased latency while improving completion quality. You can read all about it here: https://about.sourcegraph.com/blog/feature-release-october-2...
We love feedback and ideas as well, and like I said are constantly iterating on the UI to improve it. I'm actually wrapping up a blog post on how to better leverage Cody w/ VS Studio, that'll be out either later today or sometime tomorrow. As far as feedback though: https://github.com/sourcegraph/cody/discussions/new?category... would be the place to share ideas :)
We have agreements with the LLMs we work with to not store your prompt data or AI generated responses, or use it for training purposes. Data is only used to generate the response and then deleted.
On Sourcegraph side, we do collect some telemetry to improve our products, and for enterprise use cases we can def work with you on what data Sourcegraph collects/stores and how it interfaces. For example, we recently added support for AWS Bedrock so you can run your own instance of the LLM and connect it. So we def have options we can explore with you.
Tried using this; it has great potential, but it is too rough right now, an alpha at best:
- The native Cody app makes me sign in every day for some reason
- The PyCharm plugin says I don't have embeddings, but the native app claims otherwise
- When indexing, it complains it cannot find the repo (I assume it is trying to fetch from the remote, which is a private GitHub repo, rather than the local disk). I worked around this by removing the remote entirely from git, but that is only a temporary solution
- I cannot choose the branch to index (I work on feature branches)
Thanks, I don't know why I didn't think of that.
Just tried it and it works at least with the llama-cpp-python server, but I suppose some other servers and very likely hosted services (SNI, certificate subjects, ...) would have problems with that.
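For reference, the reason it just works with the llama-cpp-python server is that it speaks the OpenAI API, so a client only needs its base URL repointed. Roughly like this (port and model name are placeholders for whatever you started the server with):

```python
# Point the standard OpenAI client at a local llama-cpp-python server.
# Port and model name are placeholders for whatever the server was started with.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed-locally")
resp = client.completions.create(
    model="local-model",            # placeholder; matches the loaded GGUF
    prompt="def quicksort(arr):",
    max_tokens=64,
)
print(resp.choices[0].text)
```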
How is context-from-following-tokens implemented in the pure OAI API? I assumed there must be specialized models that have two separate context windows.
Looks cool! Always like to see these local alternatives. I'm a Sublime Text user (it is still amazing!) so there aren't many options for LLM assistants. The only one I found that works for me on Sublime is https://codeium.com/ and it is also free for the basic usage.
They have a great list of supported editors:
- Android Studio
- Chrome (Colab, Jupyter, Databricks and Deepnote, JSFiddle, Codepen, Codeshare, and StackBlitz)
- CLion
- Databricks
- Deepnote
- Eclipse
- Emacs
- GoLand
- Google Colab
- IntelliJ
- JetBrains
- Jupyter Notebook
- Neovim
- PhpStorm
- PyCharm
- Sublime Text
- Vim
- Visual Studio
- Visual Studio Code
- WebStorm
- Xcode
I have found that the completions are decent enough. I do find that sometimes the completion suggestions are too aggressive and try to complete more than I want, so I end up leaving it off until I feel like I could use it.
It isn't a code completion assistant like the one you mentioned above, and it probably never will be. I see it more as a perfect coding companion that is always at your fingertips and relieves you of googling most of the time.
For now it's tied to OpenAI, and you have to pay for it yourself, but the former should change sooner rather than later.
Bonus: in the develop branch there is a sort-of release candidate that is way more robust than the current release.
3) HuggingFace inference framework (https://github.com/huggingface/text-generation-inference). At least when I tested, you couldn't use something like llama.cpp or exllama with llm-ls, so you need to break out the heavy-duty badboy HuggingFace inference server. Just configure and run it. Then configure and run llm-ls. (There's a rough example request after item 4 below.)
4) Okay, I mean, you need an editor. I just tried nvim, and this was a few weeks ago, so there may be better support now. My experience was that it was full, honest-to-god Copilot. The CodeLlama models are known to be quite good for their size. The FIM part is great: boilerplate gets so much easier with the surrounding context. I'd like to see more models released that can work this way.
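For anyone curious what the FIM part looks like on the wire, it's roughly this: a plain /generate call to the TGI server where the prompt carries fill-in-the-middle sentinel tokens (StarCoder-style tokens shown here; Code Llama uses its own <PRE>/<SUF>/<MID> scheme, and llm-ls assembles the prompt for you anyway):

```python
# Rough illustration of a fill-in-the-middle request against a
# text-generation-inference server. The /generate payload shape is TGI's
# standard REST API; the port is whatever you mapped when launching TGI.
# Sentinel tokens below are StarCoder's FIM format (Code Llama differs);
# llm-ls normally builds this prompt for you.
import json
import urllib.request

prefix = "def read_config(path):\n    with open(path) as f:\n        "
suffix = "\n    return cfg\n"
prompt = f"<fim_prefix>{prefix}<fim_suffix>{suffix}<fim_middle>"

payload = {"inputs": prompt, "parameters": {"max_new_tokens": 64, "temperature": 0.2}}
req = urllib.request.Request(
    "http://localhost:8080/generate",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    # The generated text is the code that belongs at the cursor position.
    print(json.loads(resp.read())["generated_text"])
```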
I would love to be able to take a base model and fine-tune it on a handful of hand-picked repositories that are A) in a specific language I want to use and B) stylistically similar to how I want to write code.
I’m not sure how possible that is to do, but I hope we can get there at some point.
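Not a tested recipe, but the rough shape of it, using a LoRA on top of a base code model, would be something like this (the model name, paths and hyperparameters are placeholders):

```python
# Rough sketch of the idea, not a tested recipe: LoRA-tune a base code model
# on source files gathered from a handful of hand-picked repositories.
# The model name, paths and hyperparameters are placeholders.
import glob

from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

BASE_MODEL = "codellama/CodeLlama-7b-hf"  # assumed base model

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
tokenizer.pad_token = tokenizer.eos_token  # Llama tokenizers have no pad token by default
model = AutoModelForCausalLM.from_pretrained(BASE_MODEL)
model = get_peft_model(model, LoraConfig(r=16, lora_alpha=32, task_type="CAUSAL_LM"))

# Plain-text dataset built from the hand-picked repos (language filter = the glob).
files = glob.glob("picked_repos/**/*.py", recursive=True)
dataset = load_dataset("text", data_files={"train": files})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=1024)

dataset = dataset.map(tokenize, batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="style-lora",
                           per_device_train_batch_size=1,
                           num_train_epochs=1),
    train_dataset=dataset["train"],
    # Causal-LM collator: pads batches and copies input_ids into labels.
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```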