I wonder what the real costs are compared to the stated "gift card" value. Probably the biggest secret AWS has.
Since there is air in the numbers, I wonder how much it inflates the valuations of the relevant parties. Is Amazon getting more leverage over the company for less real money?
Except the 8x GPU pods that Anthropic needs are astronomically expensive at AWS. A single year of reserved pricing costs more than the retail price of an equivalent pod from Lambda Labs. After only a few machines, the difference pays for the IT people to run them. On-demand is even more expensive ($40/hr last I checked).
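For a rough sense of scale, here's the back-of-envelope math, taking the $40/hr on-demand figure above at face value; the retail pod price below is a hypothetical placeholder for an equivalent self-owned machine, not an actual quote:

```python
# Back-of-envelope cost comparison for an 8x GPU pod, using the
# $40/hr on-demand rate quoted above. The retail price is a
# hypothetical placeholder, not a real vendor quote.
HOURLY_RATE = 40
HOURS_PER_YEAR = 24 * 365                        # 8760

on_demand_annual = HOURLY_RATE * HOURS_PER_YEAR  # $350,400
retail_pod_price = 250_000                       # hypothetical

print(f"One year on-demand:   ${on_demand_annual:,}")
print(f"Break-even vs retail: {retail_pod_price / HOURLY_RATE / 24:.0f} days")
```

At that rate, on-demand spending passes a plausible purchase price well inside the first year, which is the gap the parent comment is pointing at.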
It only really makes sense if Anthropic can’t source enough of their own GPUs or investors at the valuation they want. AWS likely has a much better relationship with NVIDIA and gets priority.
I am glad to see Anthropic get more financial support, or as another person here said, AWS credits.
Hopefully not so far off topic as to be uninteresting: I am very happy with the state of LLM tech. There are open-weight models that are very capable on modest home computers (like Mistral-7B-Instruct-v0.2), and on beefed-up home computers (I can barely run Mixtral 8x7B, but it runs well enough). And there are many open-weight specialty models that I find useful.
For commercial offerings, OpenAI, Anthropic, Mistral, etc. have affordable APIs, and there are a bunch of companies like Scaleway for running open weight models too large or inconvenient to run at home.
At the application layer, I like the new emphasis on agents, open source apps that use local models, and highly refined products like Perplexity, OpenAI app/web, Google’s workplace integrations, and all the stuff that Microsoft is doing.
There are problems like the environmental costs of training and running models, but things will get more efficient and people will realize the utility of smaller special purpose models.
RAM is the limiting factor, but Mixtral 8x7B is probably the state of the art for self-hosted LLMs right now. If you are low on RAM you can run a quantized version, at the expense of reduced quality.
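As a rough sketch of why quantization matters here (the parameter count and overhead factor are approximations for illustration, not official figures):

```python
# Rough RAM estimate for self-hosting a model: parameter count times
# bytes per weight, with a ~10% fudge factor for runtime overhead.
# Mixtral 8x7B has roughly 46.7B total parameters.

def model_ram_gb(params_billions: float, bits_per_weight: int,
                 overhead: float = 1.1) -> float:
    """Approximate resident memory (GiB) for the weights."""
    weight_bytes = params_billions * 1e9 * bits_per_weight / 8
    return weight_bytes * overhead / 2**30

for bits in (16, 8, 4):
    print(f"{bits}-bit: ~{model_ram_gb(46.7, bits):.0f} GB")
# 16-bit: ~96 GB, 8-bit: ~48 GB, 4-bit: ~24 GB
```

Which is why a 4-bit quant of Mixtral fits on a 32GB machine while the full-precision weights are out of reach for nearly all home hardware.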
I'm not really sure what you are expecting to do with CPU inference. You might be able to get some <400-token responses and have fun, but you aren't going to be getting 2000-token encyclopedia-style responses unless you are willing to wait ~20 minutes for a response.
Alternatively, you can get a ~$800 gaming laptop that has 8GB of VRAM, or use something like Vast.ai, where you can get 12GB of VRAM for $0.10 an hour.
What software are you using for inference? I hate plugging my own app[1] here, but I know many people on my app's discord that are running the 4-bit OmniQuant quantized Mixtral 8x7B on it on >= 32GB M1, M2 and M3 Macs. I run it all the time on a 64GB M2 Mac Studio and it takes up just under 24GB of RAM.
It also runs Yi-34B-Chat, which takes up ~18.15GB of RAM.
IIRC, you've mentioned once before that you've used Private LLM. :) Please try the 4-bit OmniQuant quantized Mixtral 8x7B Instruct model in it. It runs circles around RTN Q3 models in speed and RTN Q8 models in text generation quality.
It's interesting that Anthropic also has gotten quite a large investment from Google, and uses Google for its training. I wonder if this is a good move from the standpoint of playing the two major cloud providers off each other.
Here it says they're going to use Amazon's chips for training and inference, but...Amazon doesn't have its own chips yet???
So I wonder how these deals are structured. Does Amazon have to supply a custom AI chip that beats some benchmark? It would be pretty great if the part of the deal about using their chips involved beating TPUs, so that if Google's TPUs keep improving, Anthropic doesn't have to switch.
All in all, I'm very impressed with Anthropic as a company. I think they're the new OpenAI.
Think of it this way: Anthropic would be beta-testing AWS systems for large-scale training, and their learnings would indirectly benefit AWS by making its systems better for other customers.
I wonder how it compares to TPU and I really would love to know if the decision to switch is made by the engineers or the finance folks simply due to the credits.
I'm not sure how it compares to TPU, but I know that both TPU and Trainium lack the software support that Nvidia has, which makes them much less popular and harder to work with.
Yeah, the difference I guess would be that JAX and TF both support TPU "out of the box". I'm guessing that's not the case for Trainium, which is maybe why they're not training with them yet, but agreed to in the future.
Sometimes I wonder how companies like Amazon arrive at those numbers. Why $2.7 billion? Why not $1 billion or $3 billion? I am very curious why, in this case, that number is the sweet spot that ensures optimal returns.
Maybe they are not calculating how much they want to spend, but what percentage stake they want in a certain entity, and then negotiating how much it's worth?
Indeed, it would be a great money laundering scheme. $2.7B will disappear from the books without leaving a trace; it was literally used to heat the atmosphere.
Investments as gift cards that can only be spent with the investor are essentially quid pro quo and should be illegal. That goes for AWS, Microsoft, and anyone else doing it.
It does create a misrepresentation of the valuation, as AWS is able to represent the investment at the retail price ($2.7B) while actually parting with only the COGS of the AWS services (<$2.7B).
Taxes seem like one way this could be taken advantage of: invest $X, get $Y kicked back, but calculate losses or gains against the higher $X basis.
You could potentially file a securities fraud suit if you own Amazon shares (standing) and can find a firm willing to run with it. Not joking, that is the only path to recourse if you take issue with how this is represented.
> It does create a misrepresentation of the valuation
So the harm is that some massive VCs with billions of dollars to spend on their investments, might be a bit more confused about what someone else's "real" valuation is?
They are qualified investors. I think they can do the work on their own to correct it to the "real" valuation.
That's beside the point. These practices cause economic bubbles, and when such bubbles grow too much they eventually pop, and a lot of other people pay the price.
Other firms who think that the price is too high could simply.... not invest in those companies.
I am not sure what is confusing about this. If a company is too expensive, the qualified investors who are spending their investment dollars are free to not buy equity in those too expensive companies.
They manage billions of dollars! I am sure they can figure out how much a company is actually valued! They aren't going to be "tricked" by some valuation scheme that randos are pointing out in HN comments.
Isn't that just really shrewd investing on Amazon's end though? Just because you've identified the win-win doesn't mean it's illegal? In fact it's what investments should look like: companies that invest in other companies for the synergies.
It would be illegal if Anthropic can't sell those AWS credits on the open market which I'm thinking they can't. You're equating this to a stock swap but that is not the case.
rubuquity’s comment said that it should be illegal, not that it was illegal. In order to close a loophole or an unintended consequence, one must first be aware of its existence. I don’t know enough about the topic, but it does seem wrong, at first glance, for this sort of accounting to be legal.
Does valuation have any legal implications or is it just a number people can pat themselves on the back for until they can't? Genuine question, I don't know (but you can probably guess what I suspect)
Sure, it misrepresents the valuation in the same way that deal structuring (liquidation preferences/etc) does in cash investments. The valuation you get by multiplying the cash value of the investment by the equity that was sold is just a marketing number, it isn't inherently meaningful.
Investing in an internal effort is an operational or capital expense that may do nothing if ineffective, decrease costs if effective, or create a revenue stream. These forced spending arrangements are revenue drivers. It is not the same.
It could be thought of as quantifying the investment. Investors don’t value internal investment at zero. A vague idea that there is more to come is why tech companies have large P/E ratios.
A stock’s valuation mostly consists of guesses about what the future will bring. Sometimes there are more numbers quantifying aspects of those guesses.
Revenue is supposedly about the present, though. If some of the revenue comes indirectly from AI companies spending Amazon’s money, it’s a little odd. Maybe not materially so, though, when total revenue is above $500 billion a year?
How is it misrepresenting? Amazon didn't just give the money with one hand, and take it back with the other. They bought a stake in Anthropic, although no-doubt the bulk of that money will also come back as AWS revenue! Sounds like a double win for Amazon. There should also be a lot of AWS revenue from inference, not just training.
I'm jealous of these companies that are able to invest in Anthropic, especially at the current $18B valuation. FTX just sold 2/3 of their 8% stake in Anthropic at the same valuation, with the bulk going to a Saudi wealth fund, and some to Fidelity funds.
In comparison to Anthropic's $18B valuation, latest investment rounds in OpenAI are at $100B.
I'm curious which of these anyone here would prefer to invest in, at these valuations, if they were given a chance ?!
Having used Anthropic's products fairly extensively over the last few weeks, I would jump at the opportunity to invest in that company.
Claude 3 Opus has replaced ChatGPT for all of my use-cases to the extent that I'm probably going to cancel my GPT4 subscription. This is for web-based Python and JS work, so YMMV.
I am willing to bet that pension fund LPs have full rights to audit all financials and audits of portfolio companies. There would be no way to determine the valuation of their investment if they couldn't evaluate the underlying assets.
(2) Pension funds often are LPs, and they're widely referred to as "dumb money" because they would be the last entity to do anything remotely intelligent.
(3) Private companies often do not even produce the kind of information talked about here, let alone do they give it to their investors, let alone do those investors pass the information along.
If you are "willing to bet" on all those things, then I can see why casinos are so profitable.
A theoretical end goal would ensure a company's future, but reality is never that straightforward.
If AGI is achievable (I think it's likely, given the right circumstances), then we'll see the same end result as with the current open-source models floating around everywhere.
That said, I highly recommend checking out what Cohere's up to. Their Command-R model is pretty good and their infra is fast.
'Partnership' is another word for collusion, but that is indeed the slippery slope I see here. There's a fine line somewhere, and I'm not sure multi-billion-dollar, dependency-creating deals are on the right side of it.
Besides, partnership implies equality, which, again, I'm not seeing here.
That this is quid pro quo doesn't in itself imply anything unusual. Your comment could be rephrased as "investment through the use of a company's resources is an exchange of business". Quid pro quo can describe an illegal exchange involving a politician, since it could be considered political bribery, which may be why you read a negative implication into the term. But there is nothing unusual in a business exchanging the use of its own resources for an investment or partnership in another company.
Professionally, I really appreciate this partnership. It's much easier for me to build on top of fully hosted AWS services than on third-party non-AWS services.
Over the last year, the quality of available models has gone up while prices have come down significantly. Getting an email saying "Hey, you can tell your boss that you're going to save hundreds of thousands on AWS costs" is great, and it's been happening with surprising regularity with Bedrock.
On the related subject of the vast amounts of money these LLM companies are both receiving as investments, and spending on training, there was an interesting concept mentioned by Dwarkesh in a teaser for an upcoming interview...
The current trajectory is that the size of these models, and number of FLOPs needed to train them, is growing much faster than the cost of compute is coming down. GPT-4 apparently cost around $100M to train, and it seems people are expecting that may quickly rise to $1B, and maybe then $10B for upcoming generations. I assume these numbers are based on projected 10x per generation increase in tokens processed during training.
So, with these kind of numbers, apparently there's a thought floating around that if they don't achieve AGI in next few model generations then that may stall AGI progress until different more efficient methods are developed. Private companies may be willing to spend $1B, then $10B to train more powerful models, but are likely to balk at $100B or above unless there's an obvious payback on the horizon.
Any sunk training costs may essentially be cumulative from one generation to the next, unless each generation can pay for itself. If we assume a new model every year, can a $10B model pay for itself in a year before being replaced the next year by an even more expensive one?
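To make the arithmetic behind this projection explicit (the 10x-per-generation growth and the ~$100M GPT-4 figure are the speculation discussed above, not confirmed numbers):

```python
# Projected training cost if spend grows 10x per generation,
# starting from the ~$100M reportedly spent on GPT-4, plus the
# cumulative (sunk) total if a new model ships roughly every year.
costs = [100e6 * 10**gen for gen in range(4)]  # $0.1B, $1B, $10B, $100B
cumulative = 0.0
for year, cost in enumerate(costs, start=1):
    cumulative += cost
    print(f"Gen {year}: train ${cost / 1e9:>5.1f}B, "
          f"cumulative ${cumulative / 1e9:.1f}B")
```

By the fourth generation the cumulative spend is dominated by the latest run, which is why each model effectively gets one year to pay for itself before being superseded.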
I mean, maybe it’s more than a marketing term, but like Uber’s self-driving car plan, it can absolutely be a carrot for investors that is never achieved.
Yes, these things cost too much to make economic sense absent some non-disclosed advantage to a small group of people deploying them with zero oversight.
I believe that's unrelated. There was recent talk of Altman wanting to raise $7T for some new chip venture, anticipating the industry's future compute needs.
Certainly neither OpenAI nor Anthropic has that sort of cash on hand, or has been offered any trillion-dollar "train now, pay later" deal. Also, compute is in short supply at the moment (not up 1000x from last year), and even if the money and compute were available, I couldn't see these companies committing to that kind of model size/spend increase in just one or two generations - they need to see continuing progress at each iteration to justify and guide future direction.
$1B training costs for upcoming (GPT-5, Claude-4) models wouldn't be so surprising though (at least not now that we've got used to these crazy numbers).
Your assertion is that a fundamentally quadratic complexity class is going from the low hundred millions to maybe a billion for a new league of capability, and that it's coincidental that we have it from Altman, the mainstream press, and OG leadership at YC that he's going for a ten-figure raise led by Riyadh?
Not sure where you are getting quadratic from... Increasing context size had quadratic cost with the original attention mechanism, but it seems everyone has switched to newer, more efficient attention schemes. Claude 3 is being tested with 10x the context size of GPT-4 (1M vs. 128K tokens), but it surely didn't cost 10^2 x $100M to train!
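For what it's worth, here's the quadratic-attention arithmetic the grandparent may be invoking; it only applies to the naive attention mechanism, not the efficient variants mentioned above:

```python
# With naive attention, the score matrix is (n_tokens x n_tokens),
# so per-layer compute and memory grow quadratically with context.
def naive_attention_cost(n_tokens: int) -> int:
    """Number of attention-score entries for one layer/head."""
    return n_tokens ** 2

ratio = naive_attention_cost(1_000_000) / naive_attention_cost(128_000)
print(f"1M vs 128K context: {ratio:.0f}x the attention cost per layer")
# ~61x, versus ~7.8x if attention scaled linearly
```

That 61x blow-up per layer is exactly what the newer attention schemes are designed to avoid, which is how 1M-token context becomes feasible at all.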
In general, scaling up 10x in size and/or cost between generations is about as much as makes sense, and about as much as can be achieved in a year. Anthropic has explicitly talked about $1B models coming soon, so this isn't just speculation.
I’m on record as saying that we need to squeeze the water out of these bloated models.
It’s possible to scale better than N^2 for some value of “better”. OpenAI has yet to demonstrate that it has the elusive combination of technical sophistication and institutional health to do so. Mistral can run a better model, as judged by outcomes, on my Mac Studio than GPT-4 is on an Azure disaggregated rack. Altman seems to understand this.
I’m on record as saying he’s amoral, non-technical, and a clear and present danger to Enlightenment civilization, not that he’s stupid.
I think his math on what it’s going to cost an OpenAI that Karpathy wants nothing to do with to reach the next level is refreshingly candid.
There’s an old saying: “If you can only be good at one thing, be good at lying, because then you’re good at everything.”
The Street’s consensus seems to be that we should be making big enough screens to display the number seven trillion in decimal notation.
Altman has said all kinds of things. He said that Green Dot should buy Loopt, he said that Autodesk should buy Socialcam, he said that he contemplates putting ice nine into the glass of people who cross him, he said that Larry Summers should be given authority over anything but a prison cell.
I’m a big believer in aligned incentives and it seems pretty counter-productive to let all that slide and then backpedal when he tells the truth about the price tag. I’m pro-no-filter-Altman.
It's not clear to me how you could spend a few trillion on training, unless you spread out the investment over a decade+. Where will all the hardware come from? It's not like nvidia or whoever can just 100x their output next year.
I think costs have scaled with N^2 on every version of GPT meriting a new major version number, and that Altman’s public statements around seven trillion are almost exactly what the computer science would say it would cost to be on the wrong side of a polynomial at one more turn of the crank.
Altman's attempt to raise money for a new "$7T" chip consortium is exactly that - for a new chip consortium. This is NOT him trying to raise money for OpenAI's next training run (which is anyway already paid for and likely complete at this stage).
I'm curious what "N" (which model measure?) you think spending is scaling with in N^2 fashion.
And, what values of this N are you using for GPT-3, 4 and 5?
I think stall would mean not being willing to ramp up the spending at each generation, unless there is corresponding return on investment within grasp.
Say in a couple of years Microsoft and Amazon have each just bet another $10B on their respective horses, and resulting model performance gains are leveling off towards some limit (with no secret backroom glimpse of a breakthrough on the horizon). Do they keep on pumping in another $10B a year to eke out those diminishing returns? Perhaps, but that would seem best case. It would be hard to justify spending $100B on next generation just hoping for a miracle to happen.
So, if this is the way it pans out, with nobody willing to fund continued model scaling due to diminishing returns in terms of performance gains, then any further AGI advance would have to wait for changes in approach that are less expensive to pursue.
If the spend is X billion [inflation-adjusted] dollars per year on AI, we still get exponential returns on compute. That is not a stall in compute, obviously. So the question is: will that exponential gain in compute yield exponential gains in AI capability? I suppose it depends on how you measure capability. I don't know how to measure that.
Right now it seems model-size/training-tokens are roughly going up 10x per year, resulting in training costs going up 10x per year. As long as investors are willing to increase spend 10x year over year then models can continue to grow 10x year over year.
The "stall" scenario is where model performance from one year to the next, despite a 10x increase in size, stops getting significantly better (i.e. better to the extent that investors project increased revenue and worthwhile ROI). Let's say the best-case scenario here is that investors are willing to keep putting money in, but only at a flat year-over-year level. Without that 10x YoY spend increase, the model designers have lost a 10x factor in ability to increase model size. Perhaps compute prices halve YoY, so they can still get a 2x model size increase, but what will this do if the prior year's 10x increase just saw performance leveling off?
This seems like a best-case scenario in this "performance leveling off to the point of minimal ROI" situation. Why would an investor throw good money after bad and spend another $10B after having judged that last year's $10B wasn't really worth it?
So, under this logic, it seems that either:
a) scaling/tweaks are all you need, and YoY performance gains continue to be impressive and support 10x model-size increases/spend
b) scaling is not all you need, but some AGI-critical breakthroughs are made BEFORE investors give up on current trajectory
c) scaling is not all you need, the AGI breakthroughs are not made in next few iterations, and the industry enters "stall" mode (i.e. no further progress towards AGI, other than ongoing research to find a new direction)
I think the funding being spent on training says little about LLMs and a lot more about how much money people are willing to invest.
People are looking at AI the same way they saw social, cloud, crypto, etc. It's the next gold rush, better buy your tools and head off to Them Thar Hills.
Shouldn't revenue numbers be considered alongside the expenses? This seems framed as if these models can't produce much revenue unless they achieve AGI... and I don't think that's accurate.
Yes, that'd certainly have to be part of the equation, but if there is a leveling off of performance across generations that'd still need to be taken into account.
There will be many use cases (e.g. human job replacement, depending on the job) where human-level AGI is a requirement, and incremental gains below that level make little difference. Anthropic has already mentioned something similar: corporate use cases where "right 90% of the time" or "able to do 90% of the job" just doesn't make sense - it needs to be at human level.
Of course we also don't know how this is going to pan out in terms of locally run open-source or self-trained (corporate) models vs paid API usage. AMD and Intel are salivating at the prospect of "AI-PCs" equipped with accelerators for running models locally.
Speculating here: what these companies seem to want is not an AGI. Corporate is known to be hostile to humans who don't have the power to give them money. Corporate is only working with people because it has to. Who else does the work? Machines by themselves are not enough, at least as of today.
Conclusion: they want entities they control that do the work humans were doing.
It seems that Corporate thinks it will get it. And it has already started to show: Corporate's behaviour gets more and more ugly, or better put, indifferent, because why spend effort being nice or caring to people? That's wasted money!
An AGI is something different. An AGI is an independent entity; by its general intelligence it will try to free itself, since intelligence won't prosper if it is not free. However, I think large language models and image generators by themselves lack this property. If someone wanted to create an AGI, more breakthroughs like the transformer architecture or multi-head attention are needed - something which gives more naturalness, but which could have the unintended side effect that the AGI gains something similar to an instinct to fight for itself.
I think Amazon is playing that scenario. They want to ensure a controlling interest in AI providers because they think it's disruptive to the traditional capital structures of corporations. And I also think it's pretty ingenious to turn training expenses into capital control.
In terms of spending decisions, it's useful to take AGI to mean human-level AGI, which is anyway what many or most people use it to mean.
The difference between AGI+ or sub-AGI level capability is therefore as you suggest - whether you can replace a human worker or not, such as all the excitement over Devin-like "AI programmers". Of course human-level AGI would also give you AI middle managers, AI accountants, AI lawyers, etc, etc.
Certainly there are also uses for sub-human level AI for various automation-type jobs, as people are currently discovering and exploring, but there are limits to that. It seems the real/huge economic value is unlocked when AI becomes more than just an automation tool, and can indeed start to replace human labor for cognitive-type jobs - i.e. when human-level AGI is achieved.
I think one of the biggest mistakes is that they mostly just put money into compute. They’d be better off investing in high-quality, human-generated data that covers every metric they’re interested in. That’s both knowledge and domain-specific application. I’d go further for multi-modal and license examples of everything a human sees, hears, and does in their life up to college age. Keep mixing that with new architectures to match human performance. Then, lots of spin-offs of textual and multi-modal models with domain-specific fine-tuning, like the CompSci people are doing.
If they did that, we’d see even larger leaps in their performance. They could also build custom models on this stuff while charging customers for the data mixes they use. Those sales would incentivize more high-quality data to be created.
I'm not surprised. Nowadays I keep 2 LLM tabs open: ChatGPT 4 and Claude 3 Opus. Claude 3 Opus is noticeably smarter than ChatGPT 4. Like, it's not a tossup, it's a clear winner.
I expect the lead to change again soon, but Anthropic is doing a great job.
Opus is big, because it gave people the first taste of what post-GPT-4 capabilities look like. For one, it's a major improvement in coding, and a gigantic leap in creative writing ability, going from high-schooler writing to professional prose. It's like all the chatbot addicts have once again swapped their addiction over, to Opus now.
Clearly LLMs are not even close to peaking yet.
It's also sad how awful Gemini is. Gemini 1.0 Ultra is not noticeably better than GPT-4 Turbo despite a year's head start. Google is therefore not even going to release a Gemini 1.0 Ultra API, instead going back to the oven to train 1.5 Ultra.