It's not a lot faster for input, but it's something like 10x faster for output (Mixtral vs GPT-3.5). This could enable completely new modes of interaction with LLMs, e.g. agents.
When used for business logic, they also execute about 20x faster than the same logic encoded in a client, and in far fewer LOC. Getting rid of all those round-trips has a huge effect!
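A back-of-envelope sketch of the round-trip effect, with purely illustrative numbers (the RTT and statement count are assumptions, not measurements): logic that issues one query per statement from a client pays the network on every statement, while the same logic running server-side pays it once.

```python
# Hypothetical numbers: latency of client-side logic (one round trip per
# statement) vs. the same logic running server-side (one round trip total).
RTT_MS = 5.0        # assumed network round-trip time
STATEMENTS = 100    # assumed number of statements the logic issues

client_side_ms = STATEMENTS * RTT_MS  # every statement pays a round trip
server_side_ms = 1 * RTT_MS           # the whole batch pays one round trip

print(f"client-side: {client_side_ms:.0f} ms")  # 500 ms
print(f"server-side: {server_side_ms:.0f} ms")  # 5 ms
```

With these assumed numbers the gap is 100x; the 20x figure in the comment just implies the logic does some real compute between round trips.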
It's definitely not objectively better. For normal workflows, the startup latency is a real hindrance, even if the community has collectively built norms that mitigate it a bit (keeping a long-running REPL session open, etc.). But in terms of expressiveness plus attainable performance it really is hard to beat. I think it's nice that the path from quick draft to really performant code is continuous, not a big gap like switching languages.
If you can submit what you want and have it back within moments, it beats a fancy typing pool. That kind of speed alone lets a highly trained someone retry different variations and filter more on quality of result.
Scalability. If you need to do it ten times faster, more hardware = done. More people could get it done too, but that requires hiring, training, etc.
I think the most important gain is that client requests can fit in fewer packets -> fewer RTTs (critical on poor networks) -> lower latency to first paint -> better UX.
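The packets-to-RTTs step can be made concrete with a rough model of TCP slow start on a fresh connection. The constants are common defaults, not guarantees (1460-byte MSS, initial congestion window of 10 segments, a 200 ms RTT standing in for a poor mobile network):

```python
import math

MSS = 1460       # typical TCP max segment size, bytes
INITCWND = 10    # common initial congestion window, segments
RTT_MS = 200     # assumed RTT on a poor network

def rtts_to_send(size_bytes):
    """RTTs spent sending size_bytes, with the congestion window
    doubling each RTT (slow start) from INITCWND segments."""
    segments = math.ceil(size_bytes / MSS)
    rtts, cwnd = 0, INITCWND
    while segments > 0:
        segments -= cwnd  # send a full window this RTT
        cwnd *= 2         # window doubles in slow start
        rtts += 1
    return rtts

# A payload that fits in the initial window vs. one that doesn't:
print(rtts_to_send(4_000) * RTT_MS)   # 200 ms (3 segments, 1 RTT)
print(rtts_to_send(60_000) * RTT_MS)  # 600 ms (42 segments, 3 RTTs)
```

Under these assumptions, shrinking a request from ~60 KB to a few KB cuts transfer from 3 RTTs to 1, which is exactly the first-paint win the comment describes.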