
Sure, it's not trivial, but it is not hard in comparison to the work done to create GPT-4 itself. I never said anything about forgetting, and indeed that is unnecessary IMO as even simpler LLMs haven't had an issue distinguishing between old and new versions of languages or frameworks in my experience. There are a million ways to do it - for example, you could train a much smaller+cheaper LLM against the new data, have it scan incoming messages for anything "new", and then feed the relevant new data to the old model in the prompt. You could make the new data available to the old model as an API.
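For concreteness, a rough sketch of that second option in Python against the pre-1.0 openai client (the model names, the NEW_DOCS store, and the prompts are all placeholders, not anything OpenAI actually ships):

```python
import openai  # pre-1.0 client; reads OPENAI_API_KEY from the environment

# Placeholder store mapping post-cutoff topics to short snippets of the new docs.
NEW_DOCS = {
    "codemirror 6": "CodeMirror 6 reorganizes the module layout: ... (snippet of new docs)",
}

def detect_new_topics(user_message):
    """Ask a small, cheap model which post-cutoff topics the message touches."""
    resp = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system",
             "content": "List which of these topics the message mentions, comma-separated, "
                        "or reply 'none': " + ", ".join(NEW_DOCS)},
            {"role": "user", "content": user_message},
        ],
    )
    answer = resp["choices"][0]["message"]["content"].lower()
    return [topic for topic in NEW_DOCS if topic in answer]

def answer_with_fresh_context(user_message):
    """Feed only the relevant new material to the older-cutoff model in the prompt."""
    context = "\n\n".join(NEW_DOCS[t] for t in detect_new_topics(user_message))
    resp = openai.ChatCompletion.create(
        model="gpt-4",
        messages=[
            {"role": "system",
             "content": "Reference material newer than your training data:\n" + context},
            {"role": "user", "content": user_message},
        ],
    )
    return resp["choices"][0]["message"]["content"]
```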

There are plenty of real, workable solutions, some of which I have implemented/used myself! - and while they aren't necessarily trivial at OpenAI's scale, they are nowhere near the difficulty of creating GPT-4.




The idea is that someone would first write a prompt for GPT-4 that outputs GPT-5-ready prompts. You would initialize GPT-4 with it, and then speak to GPT-4 to compile prompts into GPT-5 context, which then gets fed to GPT-5.
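A minimal sketch of that two-stage "prompt compiler" flow (GPT-5 doesn't exist, so the downstream model name is purely a placeholder; the calls assume the pre-1.0 openai Python client):

```python
import openai  # pre-1.0 client

COMPILER_SYSTEM_PROMPT = (
    "You turn a user's rough request into a precise, self-contained prompt "
    "for a more capable downstream model. Output only the compiled prompt."
)

def compile_prompt(rough_request):
    # Stage 1: GPT-4 acts as the "compiler", rewriting the request.
    resp = openai.ChatCompletion.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": COMPILER_SYSTEM_PROMPT},
            {"role": "user", "content": rough_request},
        ],
    )
    return resp["choices"][0]["message"]["content"]

def run_compiled(rough_request, downstream_model="gpt-5"):  # "gpt-5" is a placeholder
    # Stage 2: the compiled prompt becomes the downstream model's context.
    compiled = compile_prompt(rough_request)
    resp = openai.ChatCompletion.create(
        model=downstream_model,
        messages=[{"role": "user", "content": compiled}],
    )
    return resp["choices"][0]["message"]["content"]
```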

Beyond just knowing about LLMs in general, you might specialize in speaking to specific models and know how to get optimal results based on their nuances.


I understand your frustration, but I suspect you are interfacing with some of the older-style systems and not one built on top of GPT-4.

I think in cases of new LLM implementations you can and should.


Maybe? That assumes you can transition from one version of the code to another. And if GPT is going to do that for you, it seems like it's going to need a lot more context than it can currently hold on to. Like, a few orders of magnitude more? I guess that would depend on the problem size.

A lot of people here haven't integrated GPT into a customer-facing production system, and it shows.

gpt-4, gpt-4-turbo, and gpt-4o are not the same models. They are mostly close enough when you have a human in the loop and loose constraints. But if you are building systems off of the (already fragile) prompt-based output, you will have to go through a very manual process of tuning your prompts to get the same or similar output out of the new model. It will break in weird ways that make you feel like you are trying to nail Jello to a tree.
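One way to make that tuning less painful is to pin the behavior you depend on in a small regression suite and run it against both the old and the new model name. A rough sketch (the test cases and checks are purely illustrative, using the pre-1.0 openai Python client):

```python
import openai  # pre-1.0 client

# Illustrative cases: each pairs a prompt with a cheap check on the output.
CASES = [
    ("Reply with only the ISO date in 'Invoice due March 3, 2024'.",
     lambda out: "2024-03-03" in out),
    ("Reply with only the word YES or NO: is 7 prime?",
     lambda out: out.strip().upper().startswith("YES")),
]

def run_suite(model):
    failures = []
    for prompt, check in CASES:
        resp = openai.ChatCompletion.create(
            model=model,
            temperature=0,  # keep runs as repeatable as possible
            messages=[{"role": "user", "content": prompt}],
        )
        out = resp["choices"][0]["message"]["content"]
        if not check(out):
            failures.append((prompt, out))
    return failures

# Compare the model you built against with the ones you're migrating to.
for model in ("gpt-4", "gpt-4-turbo", "gpt-4o"):
    print(model, "failures:", run_suite(model))
```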

There are software tools/services that help with this, and a ton more that merely promise to, but most of the tooling around LLMs these days gives the illusion of a reliable tool rather than the results of one. It's still the early days of the gold rush, and everyone wants to be seen as one of the first.


This is an interesting article, and a bit of a mishmash of UI conventions, application ideas for GPT, and actual patterns for LLMs. I really do miss Martin Fowler's actual take on these things, but using his name as some sort of gestalt brain for Thoughtworks works too.

It still feels like a bit of a Wild West for patterns in this area, with a lot of people trying lots of things, and it might be too soon to be defining terms. A useful resource is still the OpenAI Cookbook, which is a decent collection of a lot of the things in this article but with a more implementation-oriented bent.[1]

The area that seems to get a lot of idea duplication currently is in providing either a 'session' or a longer-term context for GPT, be it with embeddings or rolling prompts for these apps. Vector search over embedded chunks is something that's still missing from vendors like OpenAI, and you can't help but wonder whether they'll eventually move it behind their API with a 'session id'. I think that was mentioned as being on their roadmap for this year too. The lack of GPT-4 fine-tuning options just pushes people more toward the Pinecone, Weaviate, etc. stores and chaining up their own sequences to achieve some sort of memory.
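The chaining itself isn't much code. A minimal sketch of the embed-chunk-retrieve loop, using numpy in place of a dedicated vector store (Pinecone, Weaviate, etc. would slot into the same place; the example chunks are made up):

```python
import numpy as np
import openai  # pre-1.0 client

def embed(texts):
    resp = openai.Embedding.create(model="text-embedding-ada-002", input=texts)
    return np.array([d["embedding"] for d in resp["data"]])

# "Memory": earlier conversation turns / document chunks, embedded once.
chunks = ["user prefers TypeScript", "project uses CodeMirror 6", "deploys run on k8s"]
chunk_vecs = embed(chunks)

def recall(query, k=2):
    """Return the k chunks most similar to the query, to fold back into the prompt."""
    q = embed([query])[0]
    sims = chunk_vecs @ q / (np.linalg.norm(chunk_vecs, axis=1) * np.linalg.norm(q))
    return [chunks[i] for i in np.argsort(-sims)[:k]]

def chat_with_memory(user_message):
    context = "\n".join(recall(user_message))
    resp = openai.ChatCompletion.create(
        model="gpt-4",
        messages=[
            {"role": "system",
             "content": "Relevant context from earlier in the session:\n" + context},
            {"role": "user", "content": user_message},
        ],
    )
    return resp["choices"][0]["message"]["content"]
```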

I've implemented features with GPT-4 and functions, and so far it feels useful for 'data model'-like use (where you're bringing JSON about a domain noun, e.g. 'Tasks', into the prompt), but it's pretty hairy when it comes to pure functions - the tuning they've done to get it to pick which function and which parameters to use is still hard going to get right, which means there isn't a lot of trust that it's going to be usable. It feels like there needs to be a set of patterns or categories for 'business apps' that are heavily siloed into just a subset of available functions they can work with, making them task-specific rather than the general chat agent we see a lot of. The difference in approach between LangChain's chain-of-thought pattern and just using OpenAI functions is sort of up in the air as well. Like I said, it still feels like we're in Wild West times, at least as an app developer.
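A rough sketch of what that 'siloed subset of functions' could look like - a deliberately tiny, domain-scoped tool list rather than a general agent. The create_task function and its schema are hypothetical; the call uses the 2023-era openai Python client:

```python
import json
import openai  # pre-1.0 client

# Hypothetical, deliberately narrow tool list for a 'Tasks' domain.
FUNCTIONS = [
    {
        "name": "create_task",
        "description": "Create a task in the user's task list",
        "parameters": {
            "type": "object",
            "properties": {
                "title": {"type": "string"},
                "due_date": {"type": "string", "description": "ISO 8601 date"},
            },
            "required": ["title"],
        },
    },
]

def handle(user_message):
    resp = openai.ChatCompletion.create(
        model="gpt-4-0613",
        messages=[{"role": "user", "content": user_message}],
        functions=FUNCTIONS,
        function_call="auto",
    )
    msg = resp["choices"][0]["message"]
    if msg.get("function_call"):
        name = msg["function_call"]["name"]
        args = json.loads(msg["function_call"]["arguments"])  # arguments arrive as a JSON string
        return name, args  # dispatch to your own implementation here
    return None, msg["content"]  # plain text reply, no function chosen
```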

[1] https://github.com/openai/openai-cookbook


This kind of second-order (and higher-order) usage of LLMs is where things actually start to get much more interesting. The other thing you can do is just train a better model.

I use GPT-4 for debugging a lot now, because it's excellent at taking nothing other than an error message from the console and giving me back what's wrong and how to fix it. It's not perfect, but it's good enough that I reach for it by default now. I don't have API access to GPT-4 yet, so I compared how well GPT-3.5 performed at this same task, and for the example I tried it just didn't get close enough for me to find it truly useful, so unlike GPT-4 I wouldn't rely on it in my daily workflow.
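The workflow is basically just "paste the trace, ask for a diagnosis". Sketched against the API for concreteness (the prompt wording and the example trace are illustrative; swapping the model name is how you'd compare 3.5 and 4):

```python
import openai  # pre-1.0 client

def explain_error(error_text, model="gpt-4"):
    resp = openai.ChatCompletion.create(
        model=model,
        messages=[
            {"role": "system",
             "content": "You are a debugging assistant. Given a console error, explain "
                        "the most likely cause and suggest a concrete fix."},
            {"role": "user", "content": error_text},
        ],
    )
    return resp["choices"][0]["message"]["content"]

# Same task, two models, to see how close GPT-3.5 gets (the trace is made up):
trace = "TypeError: Cannot read properties of undefined (reading 'map') at render (App.jsx:42)"
for m in ("gpt-4", "gpt-3.5-turbo"):
    print(f"--- {m} ---\n{explain_error(trace, model=m)}\n")
```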

But... what I am actually quite interested in, and what I'm seeing a lot of, is exactly how far you can push a less capable model through prompt engineering. I think it's actually surprisingly further than you might have initially thought.


I don't, unfortunately. I haven't had the need to try it yet, but I would experiment with it if I was exploring a new codebase.

There are some proprietary services that interface with GPT-3/4[1], though if privacy is a concern, it's getting increasingly easy to self-host LLMs. It seems like every day there's a new open source development that makes these tools more accessible. Just in the last few weeks we've seen llama.cpp, Alpaca, Dalai... New projects are appearing so frequently that it's hard to keep up. Their context limits and quality are not yet at the GPT-4 level, so you can't analyze entire codebases yet, but they'd still be useful for smaller chunks of code. And at this pace we can expect great improvements soon.

[1]: https://news.ycombinator.com/item?id=35313506
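Working within those smaller context limits mostly means chunking the code yourself before handing it to the model. A rough sketch (the ~4 chars/token estimate is crude, and run_local_model is a placeholder for whatever local backend you wire up - llama.cpp, Alpaca, etc.):

```python
def chunk_source(path, max_tokens=1500):
    """Split a source file into pieces that fit a small context window.
    Tokens are crudely estimated at ~4 characters each."""
    max_chars = max_tokens * 4
    chunks, current = [], ""
    for line in open(path):
        if current and len(current) + len(line) > max_chars:
            chunks.append(current)
            current = ""
        current += line
    if current:
        chunks.append(current)
    return chunks

def review_file(path, run_local_model):
    # run_local_model is a placeholder: it takes a prompt string and returns the
    # local model's text reply.
    notes = []
    for chunk in chunk_source(path):
        notes.append(run_local_model(
            "Summarize what this code does and flag anything suspicious:\n\n" + chunk))
    return "\n\n".join(notes)
```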


I feel it was worse than GPT-4 at low-level coding (getting syntax wrong, forgetting small details, etc.).

But I can write code, so those can be fixed. At a higher level it's OK, but the most valuable thing is being able to have my codebase in its context. As far as I know, no other public LLM can currently do that.


I'm working on an LLM app which extends GPT-4's code generation abilities to modern libraries :)

Damn right. I tried using GPT-4 to make a text editor enhanced with CodeMirror 6, but the model kept confusing it with version 5, and the thing is, they moved stuff around and deleted some modules, so it was a total mess.

When you refactor the whole framework you should change its name to keep the docs straight; a mere version change is not a strong enough signal for LLMs.


Exactly - GPT-3 is technically obsolete, but it's what you get with ChatGPT unless you are paying for it and know how to ask for access to GPT-4.

A lot of the criticism (almost all of it) you read on the quality of various language models comes from people trying out ChatGPT without doing that. And while it's alright, it does indeed have many flaws, which a lot of people are quick to point out before they give up.

Relative to that, GPT-4 is trained on more data and more languages, is less inclined to hallucinate (it still does sometimes, but a lot less), and, if you know how to access this, it can use tools via plugins.

After you figure out how to access GPT-4 properly, it mostly boils down to knowing what to ask and actually thinking of asking it to begin with. And then you need to follow up with more questions to refine it, etc. The shit in, shit out principle, basically. Using it properly is actually a lot of work, and it only makes sense if your need is big enough. It's like every other tool really. Having the tool doesn't make you magically better until you learn how to use it properly. Asking naive questions to which you already know the answers is not a productive way to use it, or to learn how to use it. It gets better when you step outside your comfort zone and ask it the things you don't know that you need to know. The more specific your request, the more helpful it gets.

A big limitation is actually the UX. Chat is very accessible but not necessarily very user friendly. In a way it was a happy accident for OpenAI that it works so well. But there are tools and extensions that provide a better experience. For example, with Code GPT (configured to use GPT-4), you can get it to critique code selections, suggest improvements, write documentation, etc. All you need to do is ask. With GPT for Docs, you can select bits of text and ask it to improve them, critique them, expand on them, suggest counterarguments, additional arguments, or supporting facts, translate them, simplify them, etc. A good writer will be able to ask better questions and get better results. The more text you select, the slower it gets. Use it to brainstorm, explore topics, refine, etc.


I see that you got some responses from people who may not even have used GPT-4 as a coding assistant, but I absolutely agree with you. What's needed is a larger context window, a framework like Aider, slightly better tooling so the AI can do renames and other high-level actions without having to provide the entire changeset as patches, and tests. Lots of tests. Then you can just run the migration 15 times, pick the one which passes all the tests, run another integration pass to merge ideas from the other runs, and rinse and repeat. Of course the outer loops will themselves be automated.

The trick to this is continuous iteration and feedback. It's remarkable how far I've gotten with GPT using these simple primitives and I know I'm not the only one.
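The skeleton of that loop is small. A sketch, where generate_candidate and apply_candidate are placeholders for the model call and for applying its changeset to a scratch checkout, and pytest stands in for whatever test suite the project uses:

```python
import subprocess

def tests_pass(workdir):
    """Run the project's test suite; exit code 0 means the candidate survives."""
    return subprocess.run(["pytest", "-q"], cwd=workdir).returncode == 0

def best_of_n_migration(generate_candidate, apply_candidate, n=15):
    # generate_candidate(): asks the model for one complete migration attempt.
    # apply_candidate(c): applies it to a scratch checkout, returns that directory.
    survivors = []
    for _ in range(n):
        candidate = generate_candidate()
        workdir = apply_candidate(candidate)
        if tests_pass(workdir):
            survivors.append(candidate)
    # A further pass could ask the model to merge ideas from the survivors,
    # and then the whole outer loop repeats - rinse and repeat.
    return survivors
```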


Memory and finetuning. If it were easy to insert a framework or its documentation into GPT-4 (the only model capable of complex software development so far, in my experience), it would be easy to create big, complex software. The problem is that currently all the memory/context management needs to be done on the caller's side of the LLM interaction (RAG). If it were easy to offload part of this context management on each interaction to a global state/memory, it would be trivial to create quality software with tens of thousands of LoC.
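Purely as a sketch of what that offload could look like: a persistent global state file that gets folded back into every interaction. Everything here - the file layout, the prompts, the crude update rule - is an assumption, not an existing API:

```python
import json
import openai  # pre-1.0 client

STATE_FILE = "project_state.json"  # persistent "global memory" between interactions

def load_state():
    try:
        with open(STATE_FILE) as f:
            return json.load(f)
    except FileNotFoundError:
        return {"decisions": [], "modules": {}}

def save_state(state):
    with open(STATE_FILE, "w") as f:
        json.dump(state, f, indent=2)

def interact(user_message):
    state = load_state()
    resp = openai.ChatCompletion.create(
        model="gpt-4",
        messages=[
            {"role": "system",
             "content": "Project memory (architecture decisions, module summaries):\n"
                        + json.dumps(state)},
            {"role": "user", "content": user_message},
        ],
    )
    answer = resp["choices"][0]["message"]["content"]
    # A fuller version would have the model itself propose updates to the memory;
    # here we just append a crude record of the exchange.
    state["decisions"].append({"asked": user_message, "answered": answer[:200]})
    save_state(state)
    return answer
```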

I wonder how it works under the hood.

I'm sure there's an LLM somewhere, but is it as simple as a (very specific, elaborate) prompt for each service run through GPT-4, or something more involved... like breaking it up with actual code and running the reconstructed bits through a fine-tuned LLM?


https://flowch.ai is live and is a hit with early users. Since users supply their own prompts, regressions in LLMs aren't really an issue. We're currently using both GPT-3.5 and GPT-4; both have their place.

We've rapidly moved from demos to people actively using it for their day-to-day work and automation :)

We're taking advantage of new features as they become available, such as OpenAI's functions and larger context windows. Things are evolving quickly!

(and no, we're not using LangChain!)

Simple example of a GPT4 generated report using custom data: https://flowch.ai/shared/73523ec6-4d1d-48a4-bb16-4e9cc01adf1...

A summary of the conversation so far, adding the text of this comment thread to the system: https://flowch.ai/shared/95dd82d1-39d4-4750-b9df-ab2b83cf7ed...


What are you on about? This is exactly what LLMs like GPT-3 or GPT-4 can and will solve. It just takes some time. But the capability to understand such simple instructions, reason about them, and execute them via API calls has absolutely been demonstrated. Getting to a shipped product takes longer, of course.

I recently switched roles to a "data engineer" and had to pick up many new tools I had no experience with (k8s, helm, Victoria Metrics, grafana, and a few others). In the past I would probably have spent 1+ year using these tools in inefficient or outright incorrect ways while I struggled to get a practitioner's understanding of how everything works.

Now I've developed a prompt that I think gives very good results for pair programming and iterative debugging. I discuss almost everything I learn related to these tools with GPT-4 to confirm my understanding is correct, and I also use it for generating YAML or templating other programs.

In some ways I am a little wary of how much I use the tool, since OpenAI can theoretically take it away at any time. I am heartened by the rapid development of other open models (Phind Code Llama seems very interesting), but I will continue to use GPT-4 for now as it's indisputably the best model out there.


No, your test was great, very well-conceived to trip up an LLM (or me), and it'll be the first thing I try when ChatGPT5 comes out.

You can't throw GPT4 off-balance just by changing the object names or roles -- and I agree that would have been sufficient in earlier versions -- but it has no idea how to recognize a cycle that renders the problem unsolvable. That's an interesting limitation.


> To test GPT-4's ability to generate a complex program, ...

I wonder how much the complexity of the various ecosystems we find ourselves in contributes to the lack of effectiveness of the language model. The task at hand really shouldn't be considered complex. Making and registering a command to do this in Emacs is essentially (defun inc-heading () (interactive) (save-excursion (search-backward-regexp "^\\# ") (insert "#"))). No project structure, no dependencies, no tooling: something an LLM should have no problem doing.

