Google has used LMs in search for years (just not trendy LLMs), and search is famously optimized to the millisecond. Visa uses LMs to perform fraud detection every time someone makes a transaction, which is also quite latency sensitive. I'm guessing "informed folks" aren't so informed about the broader market.
OpenAI and Anthropic's APIs are obviously not latency-driven. Same with comparable LLM API resellers like Azure. Most people are likely not expecting tight latency SLOs there. That said, chat experiences (esp. voice ones) would probably be even more valuable if they could react in "human time" instead of with a few seconds' delay.
Integrating specialized hardware that can shave inference to fractions of a second seems like something that could be useful in a variety of latency-sensitive opportunities, especially if it allows larger language models to be used where they were traditionally too slow.
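For what it's worth, "human time" here is mostly a question of time-to-first-token. A minimal sketch for measuring it, assuming the official openai Python client (v1+); the model name and prompt are just placeholders:

```python
# Minimal sketch: measure time-to-first-token on a streaming chat call.
# Assumes the official openai Python client (v1+) and that OPENAI_API_KEY
# is set in the environment; model and prompt are placeholders.
import time
from openai import OpenAI

client = OpenAI()

start = time.perf_counter()
stream = client.chat.completions.create(
    model="gpt-4",  # placeholder model
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
    stream=True,
)

first_token = None
for chunk in stream:
    if first_token is None and chunk.choices and chunk.choices[0].delta.content:
        first_token = time.perf_counter()

print(f"time to first token: {first_token - start:.2f}s")
print(f"total generation:    {time.perf_counter() - start:.2f}s")
```

For a voice experience, it's the first number that has to land in "human time"; the rest can stream in behind it.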
The latency is still too slow to build LLM products other than chatbots, where people expect a delay. The rate limits are also a non-starter. And most app ideas involving LLMs differ only in how well the UI is done; that's the differentiator in AI apps right now.
An LLM is obviously useful for something like Siri, Alexa, or Google Assistant, or so you would think.
There doesn't seem to be a rush because it makes the implementation a lot more expensive, and those things are, I suspect, not profitable products (revenue sources) for their respective companies. They are a kind of enhancement to a layer of products and services; people take them for granted now, so you can't take them away.
A smarter Google Assistant would do nothing for Google's bottom line, and in fact it would cost more money to operate.
If it's not done right, it could ruin the experience. For instance, it cannot have worse latency on common queries than the old assistant.
There are a ton of places LLMs are already providing value today. Some of the biggest are turning unstructured data and user intent into structured data, helping with writing (not replacing it), and certain tasks in software development, where it is often much faster to use ChatGPT as a reference or guide than to search Google and sift through results of ever-decreasing quality.
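To make the first of those concrete, here's a minimal sketch of the unstructured-text-to-structured-data pattern, assuming the official openai Python client; the model name, field names, and prompt are illustrative placeholders, not a recommendation:

```python
# Minimal sketch: pull structured fields out of free-form text.
# Assumes the official openai Python client; model, keys, and prompt
# are placeholders.
import json
from openai import OpenAI

client = OpenAI()

def extract_fields(text: str) -> dict:
    """Ask the model to turn a free-form message into structured JSON."""
    resp = client.chat.completions.create(
        model="gpt-4",  # placeholder model
        temperature=0,  # keep extraction as deterministic as possible
        messages=[
            {"role": "system",
             "content": "Extract the user's name, email, and intent. "
                        "Reply with JSON only, keys: name, email, intent."},
            {"role": "user", "content": text},
        ],
    )
    return json.loads(resp.choices[0].message.content)

print(extract_fields("Hi, I'm Dana (dana@example.com). I'd like to cancel my plan."))
```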
I'm paying now and want to pay more, if only they would give me API access to the most advanced models. GPT-4 is much better and Google will have a comparable model soon (tm?)
A big part of the business case lies in generalizability. LLMs are a foundational technology that, once trained, has applications in practically all sectors and businesses, because text (and data in other modalities) is everywhere. Massive scalability potential.
Just to name a very short selection of applications that are very likely to be transformed by LLMs, or already are in the process:
* chatbots and everything conversational
* QA systems, customer support
* writing and grammar assistants
* code generation (Copilot etc.)
* translation
Each of these is a billion-dollar market, and you might be able to solve them all with the very same model. That is the bet.
At least from what I've seen and how I've seen others use LLMs, the general consensus seems to be that they're useful for the basics today but are more of a promising tech than something that's already landed.
If OpenAI's features were to freeze at what we have today, I would be surprised if the company stayed around without a major pivot.
Again, I'm in no way saying this is actually the case; it's only a hypothetical, since the tech is still very new and we don't know what we don't know.
The impossibility of cost + latency analysis for LLMs
The LLM application world is moving so fast that any cost + latency analysis is bound to go outdated quickly. Matt Ross, a senior manager of applied research at Scribd, told me that the estimated API cost for his use cases has gone down two orders of magnitude over the last 6 months. Latency has significantly decreased as well. Similarly, many teams have told me they feel like they have to do the feasibility estimation and buy (using paid APIs) vs. build (using open source models) decision every week.
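The arithmetic itself is trivial; it's the inputs that go stale. A back-of-the-envelope sketch (all prices are hypothetical placeholders, not a current rate card):

```python
# Back-of-the-envelope API cost model. All prices are hypothetical
# placeholders, since the real rate card changes faster than any analysis.
PROMPT_PRICE_PER_1K = 0.03      # $/1K prompt tokens (placeholder)
COMPLETION_PRICE_PER_1K = 0.06  # $/1K completion tokens (placeholder)

def cost_per_call(prompt_tokens: int, completion_tokens: int) -> float:
    return (prompt_tokens * PROMPT_PRICE_PER_1K
            + completion_tokens * COMPLETION_PRICE_PER_1K) / 1000

# e.g. a 1,500-token prompt producing a 500-token answer:
per_call = cost_per_call(1500, 500)
print(f"${per_call:.4f} per call")
print(f"${per_call * 1_000_000:,.0f} per million calls")
```

A two-orders-of-magnitude price drop turns that last line from "no" into "rounding error", which is exactly why the buy-vs-build answer keeps flipping.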
I'm (the author) actually in agreement with you. LLMs are going to be a big part of search in the future; I alluded to that in the post. I'm less convinced about search as a chat interface. But LLMs for query understanding, ranking, etc.? Of course.
The main issue is that it's very slow and expensive to browse the internet like this. The LLM will only perform well if you have it do chain-of-thought reasoning, and that comes with a latency hit because the generation is longer.
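To put a rough number on that hit: decode time grows roughly linearly with output tokens, so a chain-of-thought answer that is 8x longer takes roughly 8x longer to generate. A toy model, with hypothetical throughput and overhead numbers (measure your own stack):

```python
# Rough latency model: decode time grows ~linearly with output tokens,
# so chain-of-thought (longer generations) costs real wall-clock seconds.
# Both constants are hypothetical placeholders.
TOKENS_PER_SECOND = 30.0   # placeholder decode throughput
TIME_TO_FIRST_TOKEN = 0.5  # placeholder fixed overhead, seconds

def gen_latency(output_tokens: int) -> float:
    return TIME_TO_FIRST_TOKEN + output_tokens / TOKENS_PER_SECOND

print(f"terse answer (50 tokens):      {gen_latency(50):.1f}s")
print(f"chain-of-thought (400 tokens): {gen_latency(400):.1f}s")
```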
Well, for LLM services that do what they currently do, Google may have an advantage, but all this stuff is still only experimentation, with the goal hopefully being much more advanced things, like almost-AGI agents. If that happens, no one will care about the way we currently use LLMs anymore.
I don't really agree with the comparison to Web3. LLMs have real-world use cases as an interface. They might not replace most current software/systems/processes, but they can act as an interface to them, especially via voice. This alone is an excellent value proposition and could improve productivity.
Use cases I envision:
- Customer service automation (much better than the shit we have today).
- Tutoring services (won't replace tutors but as an aid).
- Conversational assistant.
- Marketing/SEO.
- Search enhancement.
- Office productivity assistance (debugging, idea generation, search, etc).
All of these are use cases that can generate money, unlike Web3.
It sometimes feels like I've taken crazy pills, watching what was effectively a tech demo go viral and become the use case now dictating billions of dollars of development and optimization.
It's a crappy use case, and much better ones are typically being overlooked outside of a few smart enterprise integrations.
To put it mildly - if someone wants to use LLMs to build a factual chatbot, they should probably just start mining crypto instead, as they'll waste less money on jumping on a trend. But if they think a bit about how LLMs can be used in nearly any other situation, they'll be miles ahead of the majority chasing this gold rush.
Yeah it's hard to predict where the market will go.
It's possible that those forces are enough, but LLM adoption at major institutions is slow. Everyone is interested in using ChatGPT, but there isn't a clear best use case yet, or an established paradigm for how it should be used.