We need a lot more memory on desktop GPUs given the need to run AI locally for productivity and gaming. 48 GB+ would be the ideal minimum for running 70B+ models. If Nvidia, AMD, and Intel fail to deliver more to desktop users, I can see a lot more people switching to Apple's M3 or later chips with 80-128 GB of shared memory.
Similar to rewind.ai, I want a 100% offline AI with Vision to run on every file and image I touch. Every file I use can be categorized and available to prompt against.
I really hope that one day I will run into someone involved in Apple Silicon design who will tell me whether UMA was chosen with post-graphics GPU needs like AI in mind.
I don't think you'll find anything in the desktop segment cracking 48 GB in the next 5 years. That said, the M3s with that much shared memory are really in the workstation segment, and priced like it. Macs are actually a great value for this use case all the way up through maxing them out. The inference performance itself is a bit lacking, but at least you can do it.
There's also the huge caveat of CUDA project compatibility. AMD and Intel seem more promising in that regard, and that's not saying much.
Not that it isn't appealing... If the price is right.
Also, AMD and Intel are both reportedly coming out with M Pro-like SoCs. AMD's "Strix Halo" is rumored for 2025. Less is known about Intel's Arrow Lake SKU at the moment.
How many people are writing their own CUDA kernels? I think PyTorch compatibility is a bigger win for AMD and Intel when solving for the vast majority of AI practitioners, who live in Jupyter notebooks.
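For what it's worth, a minimal sketch of why framework-level support matters so much to that crowd (nothing project-specific assumed here): code that sticks to stock PyTorch ops can pick up whichever backend happens to be installed, while anything that ships a hand-written CUDA kernel is Nvidia-only by construction.

    import torch

    def pick_device() -> torch.device:
        # ROCm builds of PyTorch expose AMD GPUs through the regular
        # "cuda" device string, so this branch covers both vendors.
        if torch.cuda.is_available():
            return torch.device("cuda")
        # Apple Silicon via the Metal (MPS) backend, if this build has it.
        if getattr(torch.backends, "mps", None) and torch.backends.mps.is_available():
            return torch.device("mps")
        # Intel GPUs show up as "xpu" on recent builds; guard with getattr
        # since older versions don't have the attribute at all.
        if getattr(torch, "xpu", None) is not None and torch.xpu.is_available():
            return torch.device("xpu")
        return torch.device("cpu")

    device = pick_device()
    x = torch.randn(4096, 4096, device=device)
    print(device, (x @ x.T).shape)   # plain framework ops run anywhere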
I don't see the distinction. Tons of popular PyTorch projects run hand-written CUDA kernels, whether through a library or ones they wrote themselves.
In some cases one can replace ROCm/OpenVINO-incompatible libraries with pure PyTorch, but the performance hit is often severe, and not many inference projects are using torch.compile or Triton kernels to compensate.
The point really is that while deep learning libraries are amazing, at the end of the day they are DSLs and really pull you towards one specific way of computing and parallelization. It turns out that way of parallelizing is good for deep learning, but not for everything you may want to accelerate. Sometimes (i.e. in cases that aren't dominated by large linear algebra) building problem-specific kernels is a major win, and it's over-extrapolating to see ML frameworks do well with GPUs and conclude that's the only thing that's required. There are many ways to parallelize code; ML libraries hardcode a very specific one, and it's good for what they are used for but not for every problem that can arise.
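A toy illustration of that pull, assuming nothing beyond stock PyTorch: a first-order linear recurrence has no large-matmul formulation, so the idiomatic framework version degenerates into a host-side Python loop of tiny kernels, which is exactly the kind of workload where a problem-specific kernel (a hand-written parallel scan) wins big.

    import torch

    # x[t] = a[t] * x[t-1] + b[t]  -- no big dense matmul hiding in here.
    def recurrence_eager(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
        x = torch.zeros_like(b[0])
        out = []
        for t in range(a.shape[0]):   # one tiny kernel launch per time step
            x = a[t] * x + b[t]
            out.append(x)
        return torch.stack(out)

    a = torch.rand(10_000, 1024)
    b = torch.rand(10_000, 1024)
    y = recurrence_eager(a, b)
    # A custom parallel-scan kernel covers the same sequence in O(log T)
    # sweeps; nothing in the standard op set expresses that directly.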
I think, in general, hardware markets are permanently splitting, and the high-end side will not stop scaling for at least a decade. From storage to network to compute, the needs of the large-scale customers grow ever farther from the needs of the consumer. The hyperscaler class is rushing in 400G NICs with hundreds of CPU cores, TBs of RAM, and hundreds of TBs of high-speed flash per rack unit, asking when they'll be able to make that 800G or 1.6T and so on. Meanwhile consumers are just warming up to the idea that wired and wireless connections can be slightly faster than a gig and that their wildest gaming desktop can have 16 cores with 64 GB of RAM and 4 TB of flash storage. GPUs aren't really any different - AMD has even had a completely separate compute architecture for years, and Nvidia has entire custom heterogeneous scale-out designs. If you had asked me in 2013 "do you think it's possible x86 will scale to over 500 cores per rack unit by 2023", I would have said "doubt it, Xeon Phi doesn't even look to be on track, and nobody seems to want to even half-ass their way to that number". If you were to ask me now how much we'll have scaled in 10 years this time, I'd rather say "not sure, but we probably aren't going to have stopped cold at the 5-year mark because the hype for more scaling died out".
As a result of this split I think consumer hardware will continue to grow, just more slowly. Not because of a lack of capability but a lack of market interest. People don't have a use case for hundreds of cores or the like; they'd rather have smaller, quieter, and more efficient machines at ever-decreasing price points. I'm much more worried that investment in consumer GPUs and CPUs for traditional desktops will actually die out and start to stagnate simply because the market is getting squeezed ever smaller from both the low-end and high-end sides.
Interesting. I guess I agree on the idea, just not the specific number of 48GB. AMD and Nvidia both offer prosumer GPUs with 48GB, the W7900 and RTX A6000 respectively. I don't see how they would not increase that in 5 years?
A max-specced M3 would give you 128GB of RAM vs 48GB total from the two RTX cards, and you'll still need the rest of a computer (CPU/MB/RAM/case/fans/etc). They might even out in price, but you get more RAM with the M3 and way lower power usage if you care about that (I know some people don't), not to mention it's portable.
If all you want to do is AI/ML then yes, an M3 is not your best option, but if you want an extremely powerful laptop that can run AI models locally fairly easily (and is well supported), then the M3 is a great choice. I love being able to download and test out AI models on my M3.
Apple is well supported in that it's "better than everyone except Nvidia", and that's really only because a bunch of well known boutique developers seem to like the high quality of apple laptops and refuse to budge about mainly developing on them.
Still, trying to do significantly fun or interesting things in any major AI ecosystem, such as Stable Diffusion through Automatic1111 or LLMs through Oobabooga (which supports nearly all LLM backends, e.g. llama.cpp), will be mostly crippled, and stuff will break in ways that simply don't happen on Nvidia hardware. I'll take the two 4090s all day, because I know that the quantization techniques which shouldn't even be possible (the fabled 1-bit quantization seems inevitable at this point), the ones that will let me stuff even the most bloated LLM into my measly 48 GB, will be available on Nvidia first and Apple (maybe) second.
I bring this up all the time and get downvoted by Apple defenders (usually I'm one). Apple's policy of locking out 3rd-party GPUs might be the cause of a serious decline in macOS over the next decade.
That or devs will start building Linux hosts for their gpus at home.
That doesn't really make sense; if anything, the importance of having third-party cards is diminishing. Apple Silicon is already massively competitive at running things like Blender and video compositors, the trend is towards Apple investing in improving gaming and ML workloads (where they've made large strides this year), and the relative attractiveness of other GPUs is going down. Almost all of the signs point to Apple making their hardware the place to run on-device ML.
Mac Studios with up to 192GB of RAM are an alternative here too if you don't need the portability, though they're still on the M2 Ultra chips. Not sure how relevant the generation bump is for running the models.
Might be a good hold-over for X years until the consumer hardware catches up and/or the model optimizations make the same hardware perform up to today's DC hardware.
There is nothing stopping AMD from allowing the sale of an affordable 48GB 7900 (or 32GB 7800) right now... In fact they already do in the form of the W7900, for $4500. They just don't sell those at competitive consumer prices, as they are playing the anticompetitive artificial segmentation game right alongside Nvidia, because they somehow think their tiny workstation market is more valuable than seeding AI development on their cards.
Technically speaking, AMD could sell "cheap" 96GB+ GPUs by simply swapping the 7000 series memory controller die. This would take some time and money, but not an eternity + fortune like Nvidia would need to tape out a whole new GPU die.
Intel has no workstation market to lose. I suspect the Arc A770 was kinda late and skimpy for a 32GB "AI" SKU, but the next generation could be very interesting.
I understood "seeding AI development on their cards" as in "making sure AI development in the future will happen on their cards". "ceding" sounds like the opposite of that (i.e. giving up).
The anti competitiveness is from the pricing in the face of competition.
W7900 margins are enormous. In a competitive market, AMD would massively undercut the 48GB RTX A6000 and the 24GB 4090, which Nvidia is price gouging even more. They absolutely can... But they don't.
Separately, it's also (IMO) a boneheaded business decision.
You have no idea how deep the Nvidia lock-in goes.
AMD only supports small fractions of the deep custom ecosystems built around models such as Stable Diffusion. Good luck getting half of the shit built for ComfyUI or Automatic1111 to run at all, and frequently those features are the killer ones that motivate folks to want to do AI in the first place. There's a reason Nvidia is a first-class citizen among all of the AI clouds. AMD is even behind Intel in this regard, because at least Intel had an existing, robust ecosystem of accelerators (e.g. Intel-optimized scikit-learn/PyTorch, OpenVINO, etc.)
AMD can be 4x better than Nvidia in every way hardware-wise, but if they can't get developer communities to build around their equivalent to CUDA and its ecosystem, then it's not looking rosy for AMD's future.
Which makes it even more stupid that AMD does not at least try to compensate for this shortcoming by selling large-memory GPUs; there would finally be a reason to develop for team red.
If there was a 40GB-48GB 7900 SKU, I would have bought it over a 3090 in a heartbeat, even at a big premium over the 24GB card. And I'd be debugging ROCm support in ML projects left and right.
But I am not. The 24GB SKU is just not worth the trouble over a 3090, which is more VRAM-efficient out of the box. And I can't pay $5K for a W7900.
> but if they can't get developer communities to build around their equivalent to CUDA and its ecosystem, then it's not looking rosy for AMD's future.
On Dec 6th, Lisa Su announced the MI300X. The announcement also included a full-on commitment to ROCm, as well as to other projects like PyTorch. They are clearly aware of what you're talking about and working on it. It isn't something that can magically be fixed overnight, but it is something that people can complain about on HN over and over and over again...
Some of us have been complaining for close to a decade, and there have been multiple promises that have gone nowhere. It's better to keep complaining until the problem is actually fixed. CUDA is 16 years old and AMD has been "aware" of the issue for at least 10.
Nobody is arguing that AMD didn't get caught with their pants around their ankles or that they have a great reputation with developers. What matters is their statements and more importantly, actions, moving forward.
Right now, they are cranking out ROCm releases on schedule, contributing to a bunch of open-source AI projects, listening to crazy people like George Hotz, and have released a GPU product that leapfrogs the H100, is built on OAM-UBB, and comes in at roughly a third of the cost per FLOP, while also promoting fast Ethernet standards.
OpenCL did exist 10 years ago! ROCm has been around for 7 years. PyTorch is 7 years old and had CUDA support on day one, while ROCm support took years to arrive and still isn't on equal footing. Before the LLM craze people still wanted an alternative to Nvidia and could see the writing on the wall. There was plenty of complaining back then too; ROCm and OpenCL just didn't deliver. George Hotz tried to get a ROCm backend working and the examples just didn't work.
You can't build a software platform by just willing it into existence. It means sitting down with developers, writing documentation, and supporting users. It means tackling use cases other than llama so that users feel confident in building the next great thing on your platform. As it is today, AMD will always be playing catch-up, and people will continue to push Nvidia because the latest and greatest will be on CUDA.
I guess for those of us spending money on real GPUs for the last 10 years, Nvidia was incentivized to take our money but AMD wasn't.
Before the current craze of AI, what else generated large sums of money for GPUs?
Games, rendering and crypto.
If I was a hardware company, that's all that I would have focused on. Certainly AMD could have done better on the software/developer front, nobody is arguing that.
I tend to live in the present and what matters is what they are saying and doing today. You can choose to be pissed off about the past, or you can work towards the future.
This is a pet peeve of mine, but A1111 is not a great example of an ML project. The backend codebase is a mess compared to diffusers, it's slow even on Nvidia, its history is filled with controversy (including license breaking and outright piracy in the codebase), and the frontend is buggy and unstable. I appreciate how it was a pioneer in 2022 and the first half of 2023, but I am somewhat annoyed it's still the de facto SD backend.
Better maintained inference projects are picking up considerable AMD/Intel support. Sometimes even Apple Silicon support.
The only real competitor is ComfyUI, because without the huge ecosystem around the other front end you lose MASSIVE capabilities. For example, losing this is a big mistake: https://github.com/ljleb/prompt-fusion-extension
At least on Nvidia, A1111 has had near identical performance to ComfyUI on SDXL and SD1.5 for me. Besides that, there were typical minor problems that happen when some guy's hobby project goes viral and suddenly is under massive scrutiny. It was the first "good enough" interface and now there is massive community inertia with extensions, plugins, etc.
ComfyUI has found a place in the image pipeline design niche, but A1111 still seems like the best choice for new users who just want to spin up Stable Diffusion and generate some art for a D&D campaign or whatever.
AMD plays that artificial segmentation game with SR-IOV as well, unfortunately.
Intel is supporting SR-IOV in Linux with the latest Xe GPU driver they are developing, but it isn't clear to me that they'll allow it in the consumer cards. If they do, and if I can run a Linux host and Windows guest simultaneously on one card, I'll be on it like stink on a monkey.
The thing about ML is that memory helps everything.
Take almost any given model that fits in X memory pool for some task. Double that, and everything gets better. Training quality goes up, inference quality potentially goes up, you gain the capacity for more batching which dramatically increases performance. Maybe more caching, depending on the app. You can add more models to the pipeline without much fuss.
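A back-of-the-envelope sketch of the batching point (every number below is an illustrative assumption, not a measurement): for transformer inference, whatever is left after the weights goes to KV cache, so doubling the memory pool can far more than double how many requests you serve concurrently.

    def max_batch_size(vram_gb: float, weights_gb: float,
                       kv_gb_per_seq: float, overhead_gb: float = 2.0) -> int:
        """Rough upper bound on concurrent sequences that fit in memory."""
        free = vram_gb - weights_gb - overhead_gb
        return max(int(free / kv_gb_per_seq), 0)

    # Hypothetical 70B-class model: ~40 GB of 4-bit weights, ~1.3 GB of
    # KV cache per 4k-token sequence.
    for vram in (24, 48, 96):
        print(vram, "GB ->", max_batch_size(vram, 40.0, 1.3), "sequences")
    # 24 GB -> 0, 48 GB -> 4, 96 GB -> 41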
Why is this? I would have thought that more memory would help up to some point (wherever the entire model can fit in RAM) and then offer no gains after that. I'd also have thought that things like accuracy wouldn't be affected.
Agreed, and VRAM (GDDR6 at least) is _not_ that expensive. The going rate for an 8 Gbit chip is around $3 [0][1], so around $150 in raw chips for 48 GB.
Unfortunately I suspect VRAM capacities in consumer cards will continue to be limited in order to differentiate enterprise products.
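The arithmetic behind that estimate, spelled out (the ~$3 spot price is the assumption):

    gb_per_chip = 8 / 8            # an 8 Gbit GDDR6 chip holds 1 GB
    price_per_chip = 3.0           # assumed spot price
    chips = 48 / gb_per_chip       # 48 chips for a 48 GB card
    print(f"raw DRAM for 48 GB: ${chips * price_per_chip:.0f}")   # ~$144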
Yes. Realistically you'd just go with higher-density RAM chips, which I'm guessing are going to be more costly per GB as well...
But the upside for a manufacturer of going higher density: if the 'failure rate' of fresh parts is the same, it's better to have fewer parts at a fixed failure rate, of course.
It wasn't that long ago that the price for GDDR6 was up to $16 per chip; your 48GB would have cost $800 - or roughly $2,000 more in retail price.
What a lot of people don't understand is that RSPs don't change with commodity ups and downs. If you price your product $1,500 cheaper now due to a DRAM cost reduction, you will have a hard time getting it back up $1,500 if things ever return to normal. So generally speaking, RSPs come down very slowly.
The Apple M3 chips are expensive, not so great at gaming, and have significantly slower memory bandwidth than Nvidia chips (a 3080 has 912 GB/s vs the M3 Max's 400 GB/s, with most M3 models being much slower).
There are significant compromises in Apple's offerings.
Even so, bandwidth to the CPU, GPU, and NPU is segmented such that none of them has access to the full 800GB/s. The CPU gets more like 400GB/s in that SKU.
Correct me if I'm wrong but 400GB/s is ludicrous for CPU memory isn't it?
> DDR5 octuples the maximum DIMM capacity from 64 GB to 512 GB.[8][3] DDR5 also has higher frequencies than DDR4, up to 8000 MT/s which translates into 64 GB/s (8000 MT/s * 64-bit width / 8 bits/byte = 64 GB/s) of bandwidth per DIMM.
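That same formula covers the other numbers in this thread; the bus widths and data rates below are the commonly quoted ones, so treat them as assumptions:

    def bandwidth_gb_s(mt_per_s: float, bus_bits: int) -> float:
        """Peak bandwidth = transfer rate x bus width / 8 bits per byte."""
        return mt_per_s * bus_bits / 8 / 1000

    print(bandwidth_gb_s(8000, 64))     #  64 GB/s  -- one DDR5-8000 DIMM
    print(bandwidth_gb_s(6400, 512))    # ~410 GB/s -- M3 Max-style 512-bit LPDDR5
    print(bandwidth_gb_s(19000, 384))   # ~912 GB/s -- RTX 3080-style 384-bit GDDR6X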
Depends. DDR5 provides less bandwidth than Apple's on-package DRAM precisely because it allows larger sizes and user upgradeability. Intel has produced similarly packaged CPUs since https://en.wikipedia.org/wiki/Xeon_Phi in 2010, at least. A small amount of very fast memory is great so long as your workload fits within it; the moment it doesn't, everything goes awry. The best memory architecture is always the one which works for your workload. AMD produced a desktop part - the 7800X3D - which provides 2.5TB/s of bandwidth to 96 MB of L3, and an Epyc part with 768 MB of L3 in a similar configuration across 8 chiplets.
Then there are issues of prefetching algorithms, cache eviction and concurrency strategies, as well as speed and associativity, TLB size, branch prediction (especially as it relates to pipeline length and flushes), decoder width and complexity, and the execution units available for dispatch.
There's just a lot more that goes into fast than memory bandwidth go brrrrr. Also, x86 (and increasingly ARM and RISC-V) have many SKUs above those Apple offers in sleek packaging. In particular, silicon fabbed for high-end desktop, workstation, or datacenter duty is a cut above what's available via other channels. For the cost of a Mac Pro you can easily afford an HP Z series or a whitebox from Supermicro with dual sockets, 16 channels of DRAM, and enough PCIe lanes to make starving children cry.
Apple's memory configurations for the M3 are confusing and expensive. Charging $400 for an additional 18GB of RAM is ridiculous. If they halved their memory upgrade prices maybe I would consider it, but as-is it's very hard to justify getting memory upgrades when you could just run LLMs on a dedicated Linux box.
Why should a top-of-the-line GPU cost similarly? GPUs in the late 90s were designed by tiny teams, with hardly anything spent on drivers, using fabrication processes which are like paper and glue compared to today’s TSMC processes. Just because an RTX 4090 is also top-of-the-line doesn’t mean there’s an equivalency between it and a Voodoo2, because the effort required is exponentially higher.
In 1910, the most expensive commercially available passenger automobile in the US was arguably produced by Lozier, and would be around $250k today inflation-adjusted. But these days, super cars and hyper cars can go well into the millions of dollars. There is no sense comparing the top 1910 automobile market segment to the top segment in 2023. Likewise, the 1998 GPU market just does not map to that of 2023.
Actually, those million-dollar cars aren't really cars, they're collectibles. The real top-of-the-line cars today are probably either a Porsche GT2 RS, an Aventador SVJ, or a Cullinan if you only want comfort. And all of those only run about $300k-$500k.
PC parts were generally quite expensive in the 90s. Most prices have fallen over time for all components when taking inflation into account (as well as in absolute dollars; see storage/RAM), except GPUs.
> Intel’s popular 33 Mhz 486 CPU cost PC makers $1,056 in 1990 in quantities of 1,000.[0]
Let's not make this into a car comparison thread -- they're never valid comparisons.
People throw "adjusted for inflation" around as if it makes any sense to adjust the price of concrete or GPUs for the price development of bread flour.
I find it strange that the price shouldn't have gone down. They are making a lot more units now, so shouldn't there be significant gains from scale? And production efficiency should also be increasing.
Of the latest generation Intel has a model you can grab new for $99, AMD $270, and Nvidia $280. I think that covers low and mid range pretty well, even filtering to only the absolute latest generation.
The big pinch on low/mid range for me was during the shortage period but that's long gone.
The cards at those prices are quite scrawny, and arguably poor value for the performance. Even accounting for inflation, ~$275 bought you physically beefy GPUs in previous generations.
I've heard this a lot but I've never seen convincing reasoning, outside the GPU shortage era. It's also a bit laughable to call the 4060 scrawny by any means, but hey - it's not a 4090 if that's what we're marking expectations at. I'll use launch MSRP just to make comparison a bit quicker and this will raise the 4060 to $300 instead of the current $280.
Taking that down to 2010 dollars, that'd give you ~$200, exactly what you needed for the low-end sub-model of the shiny new GTX 460. The beefy pairing of that generation would be the 480 at $500, i.e. 2.5x the price for roughly 1.6x the performance. The 4060 vs the 4080 is a somewhat larger performance gap, around 2.4x (roughly 1.5x the gap), but the price difference is also larger, $300 vs $1200, i.e. 4x the price (roughly 1.6x the price gap). In all, the gaps themselves aren't dramatically different; it's the price of the highest-tier card that has changed. Does that make the lower-tier SKU inherently scrawny and poor value for the performance? I don't think it has any relation really, but if it did, I'd say it would make more sense to conclude the opposite.
Looking at absolute numbers one sees, just as expected, the 4060 right in the top 3 for performance per dollar: https://www.videocardbenchmark.net/gpu_value.html. This makes it hard to argue that performance per dollar took a huge dip either. It's not much faster than the same card in the previous generation, but it's also not much more expensive, and over many generations it has advanced significantly in both regards. Really, the only card that was significantly faster than its direct predecessor this generation was the 4090, and its pricing gives even more credence to the idea that it's the price of the top SKU rising, not the value of the low-tier SKUs falling, that leads to the increased laddering. The other big change is that you don't see cards like the 1030 coming out at launch. It's not so much that that market has stopped being served as that integrated GPUs have come up to that level, and anybody getting a dedicated card wants something more, not something the same.
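Putting the two comparisons side by side (launch MSRPs and the rough relative-performance figures from above; both are assumptions, not fresh benchmarks):

    # (price_usd, performance relative to the low-end card of its generation)
    gen_2010 = {"GTX 460": (200, 1.0), "GTX 480": (500, 1.6)}
    gen_2023 = {"RTX 4060": (300, 1.0), "RTX 4080": (1200, 2.4)}

    for gen in (gen_2010, gen_2023):
        (lo, (lo_p, lo_x)), (hi, (hi_p, hi_x)) = gen.items()
        print(f"{hi} vs {lo}: {hi_p / lo_p:.1f}x the price "
              f"for {hi_x / lo_x:.1f}x the performance")
    # 2010: 2.5x the price for 1.6x perf; 2023: 4.0x the price for 2.4x perf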
The RTX 4060 is less than half the size with half the bus width.
I know there are other factors, like silicon getting more expensive in general and L2 cache getting bigger, but still, the 4060 corresponds to a much "lower end" GPU in Nvidia's lineup of GPU dies.
If what mattered was physical metal area per dollar, I'd be glad to sell you a hand-wound 1 kB of RAM for $20,000. It's absolutely massive in comparison to a 64 GB DDR5 stick, so it seems to deserve a premium as well, eh? Heck, if someone offered you a CPU that performed twice as well at the same price and used half the die, are you going to tell them it's scrawny and poor value for the performance simply because that CPU isn't as "beefy"?
The real metric for sizing silicon is transistors per dollar, not area used. Area only matters in how the fab prices a specific process and cannot be compared between process generations; it's one factor in the overall price within the current generation. If a billion transistors comes out to 70 mm^2 or 700 mm^2 for the same price, what difference does it make to the consumer, especially when the denser process will deliver the same performance at less power? The hard part about making good chips is, after all, not using the largest amount of metal in them but rather the amount of logic and the relative performance of that logic vs other implementations. Ultimately this leads to perf/dollar.
Even the lower end of "high-end" is starting to look awfully pricey, and those cards don't have a reasonably future-proof amount of memory. I might be interested in a reasonable GPU for gaming in the 600-700€ range, already up from the previous 500, but it should have 16GB of RAM, as 8GB is seemingly low already in some games...
I sincerely think consumer GPUs are going the way of the sound card: Integrated into the CPU for most price brackets, with discrete options remaining only for those with specific needs (professional/enterprise) and people who don't need to ask how much something costs.
If LLMs drive GPUs manufacturers to add more memory, will that mean game developers will create ever-larger textures to fill up that memory? Will we see multi-terabyte game downloads or streaming textures from a CDN?
Alternatively: What would a AAA game be able to do differently if it had access to a GPU with 48GB or 96GB of fast memory?
NPC and world simulations would become far more dynamic. Imagine what The Sims 2025 could be like using that kind of hardware. I could imagine that gamedev AIs will be able to add weather and seasonal effects without much work from the game developers.
Intel would be fucking crazy to give up on GPUs again, given the rise of AI processing workloads. I know a lot of companies cut their own throats chasing quarterly results but apparently Intel have a real engineer leading the ship right now.
Or maybe the whole AI bubble is about to deflate as the inevitable copyright battles begin, and as more people turn against a technology that threatens to automate away interesting/creative jobs rather than mundane and repetitive ones...
Because the AI cycle is still in the hype phase, and VCs and private capital are getting in right now before unloading onto retail investors to hold the bags.
And also, like every major tech invention in the past 40 years (smartphones, search, operating systems, social media, e-commerce), AI will quickly be dominated by a single corporation that will hold 80%+ of the market share. Nvidia on the hardware side and Microsoft/ClosedAI on the software side.
> Because the AI cycle is still in the hype phase, and VCs and private capital are getting in right now before unloading onto retail investors to hold the bags.
I don't think so. Had this been the early SPAC days, we would have seen a lot of AI companies try that route right away. The SEC has since cracked down on SPACs, making them a lot harder to do. So I don't think we are going to see the same run-up of companies trying to go public that we did 25 years ago, since there isn't as easy an avenue to do so now.
As for the domination, I don't believe that either. I don't think there has been a single company that owns each of the categories you list. There are many large players.
The MI300X comes in at roughly a third of the H100's cost per FLOP and offers more RAM. The hardware AI companies will leapfrog each other with each new release, and the software ones will be extremely competitive with each other thanks to the ability to just rent the high-end hardware for their training/inference.
The copyright battles affect models that were trained with datasets where the copyright holder's approval is missing. Copyright law will be adapted to consider that usage, and model trainers will make deals with large copyright holders.
It is very rare for any technology to be completely abandoned. Usually it only happens if it is being obsoleted by a more capable successor. AI is here to stay, even if a minor cooldown is coming as LLMs get somewhat regulated.
> The copyright battles affect models that were trained with datasets where the copyright holder's approval is missing.
Which is likely to be most/all of the high-profile text and image generation stuff. Where do you get a vast and varied enough library of non-copyrighted text and images on a vast range of subjects? (Even if you rely on something like owning a social media platform, how do you filter out copyright-infringing content that users will inevitably post?)
More specialised datasets for specific tasks can be gathered safely, but, with the exception of self-driving vehicles, that's not really where the AI hype bubble is. It's all about ChatGPT, AI art, and the push for AGI.
I agree that high-quality datasets, usually proprietary, are key to good model performance. Microsoft's Phi-2 model, for example, is punching way above its weight thanks to being fed high-quality textbooks and encyclopedias. But that's not a total showstopper. To use those, deals can be arranged, since many textbooks come from just a few publishing houses; heavyweights like Microsoft can do that. And a comparatively small model might actually be less likely to reproduce content verbatim even if prompted.
I am more curious whether researchers and private persons will be permitted to continue uploading models on Huggingface.
While I'm not as happy as anyone that consumer cards are "stuck" at 24GB I have been pleased with all of the software work that has been done to make the most of these cards.
Quantization is the obvious one but you're also starting to see interesting approaches to further this (too many to list/name).
As is often said "necessity is the mother of invention".
Where this gets really interesting is when (hopefully not if) higher VRAM "consumer" cards become available we'll be able to do even more with the additional memory.
Artificial market segmentation still sucks but it has led to some great software work that will continue to reap benefits for years to come.
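As one concrete example of that software work (a minimal sketch using the Hugging Face transformers + bitsandbytes route; the model name is just a placeholder, and bitsandbytes itself is, fittingly, Nvidia-centric): 4-bit NF4 quantization roughly quarters the weight footprint compared to fp16, which is what makes 13B-class models comfortable on a single 24GB card.

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

    # NF4 4-bit weights: roughly 1/4 the VRAM of fp16 at a modest quality cost.
    quant_cfg = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.float16,
    )

    model_id = "meta-llama/Llama-2-13b-hf"   # illustrative; any causal LM works
    tok = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, quantization_config=quant_cfg, device_map="auto"
    )

    inputs = tok("Quantization exists because", return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=32)
    print(tok.decode(out[0], skip_special_tokens=True))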
I don't even know what people are really asking for anymore. 4070, 7800XT, 4060, and 7600 were all incredibly high-value launches, and in January we're incredibly likely to get price drops on 12GB cards as well as 16GB cards at the old 4070 Ti price point. The value is there, just a fairly large faction of people have an emotional investment in pretending like it's perpetual doom-and-gloom.
Consider the reaction when Microcenter started putting a $100 steam gift card on the already-attractive 4070 price. People still flipped a shit even though a de-facto $500 price was obviously about as good as any 4070-tier product would get for this gen (apart from the firesales at the end).
This is pretty much what super refresh is going to accomplish next month (a year later!) - 4070 will probably move down to $499-549, 4070 Super will be 95% of a 4070 Ti for $599-629, and 4070 Ti Super will offer 16GB at the same $749 price point. But people want a return to 2000s-level pricing (despite soaring wafer costs etc) and nothing short of that is going to satisfy some people.
(and a fair number of people (including reviewers!) actively propagandize about these cards, like complaining the 4060 or 4060 Ti is actually slower than its predecessor - they're absolutely not, even at 4K (tough for a 6600XT tier card!) the 4060 Ti is still faster on average. And as a general statement, AMD made the same switch to 128b memory bus last gen already without catastrophic issues, like the aforementioned 6600XT. A card can still be a mediocre step without making shit up about it, and the 3060 Ti was the absolute peak of the 30-series value so it's understandable the 40-series struggles to drastically outpace it. But it's not slower, either.)
Since the RTX launch there has been the rise of what I'd call the "socially gamer, fiscally conservative" faction, the group of people that likes to think they'd be gaming, or likes to think they would upgrade, but this spending is too gosh-darn wasteful! However good the deals get, it's much more fun to just post dismissively on the internet about it, and it costs nothing to do so, but you get all the social clout of aligning yourself with the thing your in-group is excited about and signaling your social virtue (thrift), etc.
Others are elder gamers who have lives and families now and are just drifting away in general and can't justify the money on a hobby they really no longer do, but they can't just let go of that identity/touchstone from their youth. And they're subconsciously looking for a deal that's so out-of-band good that they won't feel bad about not using it.
But again, like the 4060 and 4070, there are plenty of deals at all times and in all price brackets, including older cards if that's your jam. The 3080 and 3090 are good, the 6800XT/7800XT are good, the 4060 and 4070 are good, the 7900XT has fallen as low as $700, etc. If you need to upgrade, then upgrade, but the slowing of Moore's law has hit GPUs hardest. They are asymptotic perf-per-area machines that thrive on scaling out on cheap transistors (Jensen has explicitly talked about this for more than a decade), and now that cost increases are eating up most of the density gains, it's understandable that GPUs are the first and most affected by slowing gains and rising costs.
At some point if people don't want to bite on deals like the microcenter thing, you will just see companies pivot to the products that are actually moving. 4090 did great, AI is doing great. Midrange and high-end will continue to sell and the "culturally gamer, fiscally conservative" folks will continue to seethe about the lack of a true Radeon 7850 successor or whatever. Everyone understands why the $150 price point is dead, but people don't want to accept that the cancer is going to continue to progress upwards, because fab tech isn't getting healthier either. And the perf/$ progress that AMD/NVIDIA can make is fundamentally determined by the transistors/$ that the fabs give them. Going above that rate of trx/$ gain is long-term unsustainable for them.
If you can't be brought over by reasonable deals, like 4070 for $500 or Super Series refresh, are you really addressable market, from the perspective of NVIDIA and AMD? and in AMD's case, they benefit if you buy a PS5 anyway, that's still an AMD product, but doesn't carry the baggage of having to argue with the tech community (and big tech reviewers who increasingly view themselves as activists/participants rather than simply "calling balls and strikes") about the merits of cache vs bus width, or whether DLSS or RT or framegen is a good idea, or whatever. Sony will make a decision and you will take it or leave it, and seemingly that is more palatable than what AMD and NVIDIA are doing.
Unified APUs are great, and have huge cost advantages, but it does involve many of the same compromises that gamers already hate. You're not going to be upgrading the memory of your Steam Console, and Series X/S both have GTX 970-style "3.5GB" slow-memory partitions, and there is a huge focus on upscaling (which will probably include AI upscaling in the PS5 Pro), etc. Those things are simply ground-truths in the console world. A huge part of the "console experience" is simply removing the option to even argue about these compromises, so that people don't feel bad about not being able to use ultra settings or having to turn on upscaling to do so, or for using a CPU with less cache/slightly lower IPC, etc. "Compromise pain" is quite similar to "choice paralysis" in practice, and by taking the choice away you relieve the pain.
They want the latest and greatest (leading node, latest software development and driver-update improvements) while not being willing to pay much for it. At the same time, they refuse to learn and understand why something costs as much as it does. And the worst case of all? They are also the ones who traded crypto, the major reason Nvidia GPU prices had been above average.
I believe ETH and its offshoots (e.g. DOGE) didn't have major benefits for ASICs, or at the very least ASICs didn't "blow away" GPUs the way they did for BTC.
idk about miners being primarily the ones who complain about prices, but honestly they might be a large number of the people who complain about memory bus reductions.
RDNA2 was a relatively poor miner (worse than RDNA1) precisely because RDNA2 already did the same thing as Ada, replaced real bandwidth with cache. And generally that's fine and it performs well, because games are a pretty normal workload. But mining is deliberately completely random and uncacheable (otherwise you could cache the parts of the DAG that you need) and benefits more from the raw bandwidth, because the cache does ~nothing.
Miners have always been fucking great at astroturfing, like in 2021-2022 they had a ton of people convinced that it was just some giant production shortage and NVIDIA just wasn't making any cards etc. Or the arguments about how the mining LHR limiter was "bad for gamers" because you wouldn't be able to mine as profitably in your spare time (of course ignoring that paying double MSRP in the first place kinda offsets that and in fact necessitates mining to even make the purchase make sense). Miners are fundamentally just nerds who post on tech forums and gaming forums and know what arguments are convincing and catch traction etc.
Can I see those guys making a particular stink about the last source of super-wide memory bus cards drying up? Yes. They already bought up all the 5700XTs etc, Ampere was the last hurrah of the Big Memory Bus, and actually it would make sense if a ton of the outright agitators had some history in mining subs/etc.
It shouldn't really matter to gamers where the bandwidth comes from. Yes, cache hit rates decline at higher resolutions, but in reasonable configurations (eg not 4060 at 4K) the cache systems on RDNA2 and Ada are basically seamless/transparent to gamers. Games are cacheable workloads. It's fine. Things like the yuzu dev running 16K downscaled to 4K and complaining he had some performance loss on a 4060 is the extreme exception. They're not transparent to miners though.
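The arithmetic underneath that, with made-up but plausible hit rates: effective bandwidth is roughly hit_rate x cache bandwidth + miss_rate x DRAM bandwidth, which is why a big cache papers over a narrow bus for games and does nothing for a deliberately uncacheable DAG.

    def effective_bw(hit_rate: float, cache_bw_gb_s: float, dram_bw_gb_s: float) -> float:
        return hit_rate * cache_bw_gb_s + (1 - hit_rate) * dram_bw_gb_s

    # Illustrative: ~1 TB/s of on-die cache bandwidth sitting in front of a
    # 128-bit GDDR6 bus worth ~270 GB/s.
    print(effective_bw(0.60, 1000, 270))   # gaming-ish hit rate -> ~708 GB/s
    print(effective_bw(0.00, 1000, 270))   # random DAG reads    ->  270 GB/s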
In accordance with Hanlon's razor, however, I will admit that gamers are some extremely dumb motherfuckers (ahem, "often lack understanding of relevant technical details and frequently tend to bandwagon even when it doesn't make sense", etc.) and it could just be people getting riled up about stuff they don't fully understand from semi-trusted tech-media figures telling them to be upset. We went through this same "it shouldn't matter how the bandwidth gets there" thing with texture delta compression: there were people insisting for ages that it was bad and hurt image quality despite it being lossless, and that AMD's image quality was better despite them implementing the same lossless delta compression approaches in GCN3 (Tonga/Fiji). It's more bits delivered per bit, without even a real power usage hit (unlike GDDR6X, which was an incredibly mixed bag). Or with Fiji's HBM and Vega's HBM2/HBCC supposedly being some magic thing that let you swap across a 16 GB/s PCIe bus without any sort of framebuffer size limitation, etc... the gaming community is real bad at critical understanding and thinking about some of this deeper tech stuff.
(Reviewers, on the other hand, there's really no excuse, and a lot of them are playing silly buggers largely at the expense of their viewers. People getting surprised by Alan Wake 2 that DX12U/DX12.2 is a thing that matters and is being adopted, people getting surprised that upscalers are a thing that matters and will matter more as UE5/6 and other intensive games come to rely on them, people getting surprised that mesh shaders reduce VRAM utilization, etc. They have basically done nothing to inform their viewers about any of the broader developments in the field other than whining about VRAM. It really does feel like a lot of the last 5 years has been a bunch of the reviewers being activists about the graphics field not going the way they want it to: they don't want upscaling, they want a return to moore's law and real raster gains etc, even though that's not possible and they understand perfectly well why. The trajectory of, eg, Steve Gamersnexus has really been something else, contrast his take on DX12 with his take on the RTX launch and DX12U/DX12.2. And pretty much everyone, too, continues to drag their feet on admitting DLSS is real framerate improvement etc that actually matters to card performance and efficiency etc. Reviewers discussed visual quality/pipeline quirks as part of GPU reviews for ages and it is increasingly going to be necessary again as more of this stuff is shunted into brand-specific platform libraries like DLSS, Metal, XeSS, and FSR. Doesn't matter if they don't wanna, it's necessary. It's very aggravating, for someone who wants to see the tech discussed and move forward.)
Apples to apples, the 4060 is under $300 street price vs the 3060 Ti usually riding at $450 absolute minimum; it's almost as fast, far more efficient, supports newer DLSS features and AV1 encoding, etc. Samsung 8 was an incredibly cheap node and TSMC 4nm is incredibly expensive, so Ada was starting from a disadvantage in terms of the perf/$ step, and the 3060 Ti was also the absolute king of the "mainstream" value lineup (again, to such an extent that it essentially never sold at MSRP).
How do you cut 1/3 of the product cost in that situation, while only losing 8.7% performance? Some heavy-duty cost-downs. Obviously NVIDIA wasn't going to continue shipping 400-450mm2 of silicon in the x60 tier once they moved back to a modern node (and people have forgotten the die sizes involved with leading node products like 10-series or 600 series, they are quite small). And since PHYs don't really shrink, and since TSMC 7nm and 5nm have really good SRAM density, the obvious strategy is to replace raw memory bandwidth with cache. Which is the same thing AMD did with Infinity Cache in the previous generation. PCIe bus has the same problem, which is why AMD was first to the party with "mainstream" cards like 5500XT and 6600XT with cut-down x8 pcie bus as well.
These compromises are going to become increasingly heavy at the bottom of the stack. Remember, it's not like $150 products don't exist anymore, but they're going to look less like a Radeon 7850 and more like an RX 6500 XT. First they start having "entry-level" characteristics like the 4060/7600 currently do, then they become HTPC-tier stuff like the 6500XT. They cease being products that interest enthusiasts: smaller memory buses, VRAM that won't keep pace with the needs of gaming, poor perf/$ for gaming, or double-VRAM variants (like the 4060 Ti!) that have good VRAM but atrocious value. There is rarely a moment where things actually roll backwards, but prices won't continue to approach $0 infinitely either; it follows a pattern of stagnating performance/VRAM in the face of increasing gaming requirements, and increasing amounts of compromise to get the cost down further.
With how slow modern node progression has gotten, and how many compromises are involved with newer designs, the older nodes remain very, very favorable, especially for lower-end designs. A bigger design on an older node means you don't have to compromise the memory bus and PCIe bus, and there are definitely customers for whom a 3060 Ti is a better product. AMD leaving the 7600 (another 128b product!) on an older node is (once again) a very forward-thinking cost-reduction attempt, just like their work on reducing memory buses and PCIe buses, and frankly it's surprising that NVIDIA isn't continuing to churn out 3060 Tis for the LAN-cafe type markets etc. 1080 Ti performance in an 8GB card is very adequate at the right price, and inventory stockpiles aside, the 3060 Ti is much more attractive than the regular 3060, let alone the weird cutdowns.
If you don't want to make the compromises, you have to follow your product-design-bracket upwards in price. The price for a current-gen mainstream card in terms of the design expectations of mainstream gaming customers is $500-600 now.
And that's the part I think everyone just struggles with. $250-300 is entry-level cards now. The 4060 is perfectly fine in the context of an entry-level design that makes compromises to hit entry-level price points. $500-750 is mainstream where you are getting no-compromises on featureset/VRAM and balanced performance with good overall value. $1200-2000 is the enthusiast range where you get to check all the boxes and have the top-tier card that does RT everything with 24GB of VRAM etc. If you don't follow the design brackets upwards over time, then you are stepping downwards in design tiers from enthusiast to mainstream to entry-level, and you gradually get a more and more compromised product. 7850 becomes 6500XT.
At the end of the day it's already a low-margin, wafer-intensive business with a lot of supply risk (holy shit imagine trying to balance inventory through mining). NVIDIA has enterprise/AI, and AMD has constant supply issues in mobile and wants to take ground in datacenter. There is no real necessity for them to do this as a charity when they could be making more money elsewhere with their supply constraints, and it's actively bad for them to set up these crazy high-value products like 3060 Ti where it ruins the sales of the next 2 gens and breeds bad blood with customers. Arguably it's even bad to be doing the firesales to clear old inventory (as AMD found out with some reviewers panning RDNA3 too). If node progression and gen progression is slowing down, you don't need to firesale it to get it gone, it's not going to be e-waste in another year like in the 2000s.
AMD and NVIDIA are gonna make the products they can afford to make, sustainably, and they're not gonna cut their throat anymore with crazy deals like 970 or 3060 Ti/3080. If some segments are eroding because they're no longer viable gaming products, or because they're being eaten by APUs (and I'm including consoles as GDDR-based APUs) then fine, they won't make those products in the future. The $200-300 segment as a gaming gpu segment is not going to be forever, I expect the 8500XT/5050 and 9500/6030 will be quite boring for gamers just like 6500XT/3050 is.
> (and a fair number of people (including reviewers!) actively propagandize about these cards, like complaining the 4060 or 4060 Ti is actually slower than its predecessor - they're absolutely not, even at 4K (tough for a 6600XT tier card!) the 4060 Ti is still faster on average. And as a general statement, AMD made the same switch to 128b memory bus last gen already without catastrophic issues, like the aforementioned 6600XT. A card can still be a mediocre step without making shit up about it, and the 3060 Ti was the absolute peak of the 30-series value so it's understandable the 40-series struggles to drastically outpace it. But it's not slower, either.)
Yeah, the 3060 Ti was probably the best 30-series card, but the 4060 Ti being 128 bit is a much bigger issue than the 6600 XT or 4060 being 128 bit. You see, the 6600 XT succeeded the 192-bit 5600 XT, and the 4060 succeeded the also 192-bit 3060. But the 3060 Ti was a 256 bit card.
I'm also pretty sure that the 3070, which is only a small step up from the 3060 Ti, often outpaces the 4060 Ti, and it becomes a much worse picture when you also consider that the 3060 Ti and 4060 Ti both launched for $400. (The 6600 XT also launched at an MSRP of $380, but that was amidst the huge pricing bubble and AMD didn't want to hand all of that over to scalpers.)
To add insult to injury, the 4060 Ti is available in a 16 GB variant, which has so much memory relative to its bus width that Nvidia has to put chips on both the front and back of the board. GA104 cards, which wouldn't need such tricks, never got one.
The 4070 and 7800 XT are fine from a value perspective, but they're above the price range that most people are in, and below that the value proposition drops off quite hard. GPUs like the 7600 or 4060 Ti offer almost nothing over their predecessors in terms of gaming performance, so it makes sense why nobody buys them. But those are the segments that are supposed to sell well.
I agree, 128b is too small for the 4060 Ti's price point. Both AD106 and AD103 really needed another two memory controllers per die; that would have given you a 4060 Ti 12GB/24GB and a 4080 16GB / 4080 Super 20GB (with Quadros using clamshell). They missed the market expectations on VRAM at the price (and performance) levels they wanted to target, and it probably would only have increased the die size by 10% or so to have the extra memory controllers.
My suspicion is that these are pandemic-brain decisions. I think Ada entered mass production at the start of 2022 absolute latest (remember the "TSMC refuses to cancel 5nm wafer order" headlines?) so it would have been specced early/mid 2021 most likely. It's about a year from tapeout to launch typically, and if rumors were true they were ready to launch in june 2022 then they would have taped out in mid-2021 and been in volume late 2021/early 2022.
Everyone in early 2021 was trying to absolutely maximize the number of units shipped, and if you need 4GB more memory per card then you ship 33% fewer units for your GDDR supply, you get more than 10% fewer dies per wafer, etc. It would have seemed like a good decision at the time, especially with mesh shaders eventually coming in and reducing a bit of the VRAM pressure.
As it stands, short of taping out some new dies, NVIDIA really cannot do anything about some of the gaps in the market because their only option is insane cutdowns on AD102 and AD104 to fit the gap. Cutting AD102 down to (rumored) 4080 Super shader count would be a 40% cut. Cutting AD104 down to 4060 Ti shader count would be a 50% cut. That is not yield harvesting, that's throwing away half your die. And that's why a 4080 Super 20GB was absolutely never in the cards once AI started to really take off - NVIDIA can already sell those as Quadro A5000 or A6000 anyway, why would they throw away half of a working chip to make it cheaper for gamers? 4090D is absolutely the final nail in that coffin, it will not happen at this point, every shitty yield they have will go there.
AMD themselves have gone a different route and disambiguated memory from compute. They already put 2 memory PHYs per MCD attached to a single Infinity link (so they are "doubling up"), which relieves the pressure of GDDR density stalling out at 16 Gbit for a prolonged (I think unexpectedly long) period of time. It also provides a degree of physical fanout which relieves the routing pressure etc. (look at the footprint of an A6000 or RTX 3090; it's crazy how tightly they rammed the chips in).
It surprises me that AMD hasn't done a 4-PHY MCD. Even if you can't route that many GDDR lanes, you could also do sort of the opposite of the 7900 GRE/7900M and put a smaller chip in a larger package that commands the full memory bus width of a 7900XTX, so you'd have a Radeon Pro 7800 48GB or whatever. It's not without its downsides (RDNA3 idle and low-load power consumption remains atrocious and probably always will, due to the link power and the caches being on the other side of the link), but they're accidentally sitting on the key to big-VRAM cards at the moment of big-LLM models. What could be more of a push for ROCm than "hey, here's a Radeon Pro W7800 48GB for $1499 and we can actually ship them today"?
Yeah, I agree fully on the "pandemic brain" thing. One interesting observation I've made is that mainstream midrange cards usually seem to hover just over 200mm². That includes, for example, cards like the HD 7850, but extends all the way to the likes of the RX 580 and RX 5700 XT, and also the GTX 1060 or GTX 660. Those cards are all 256-bit if AMD and 192-bit otherwise, funnily enough.
It seems die sizes are on the rise though! The 6600 XT is a similar size to the 5700 XT despite having much less hardware (except cache), although I guess RDNA2's higher clocks also ate some die space. The RTX 3060 is also reasonably large at 276 mm², which seems more like what's usually 60 Ti territory. And the 3070 is even bigger than the 1080!
Even in that context though, the 4060 Ti looks fairly anemic at 188 mm². The 4060 is even smaller at 159 mm², which almost equals the RX 5500 XT that debuted at $170, just over half the cost of the 4060.
Intel's current GPUs are meanwhile comically large, seeing as the ~400 mm² ACM-G10 competes with N23/N33 and GA106, which are much smaller, and even with the tiny 4060. I think that's mostly because of the way too finely grained EUs, which also make the architecture very complex with slices and subslices. But at least for Battlemage they seem to be planning to grow the EUs rather than throwing on more.
It's also worth noting that if Nvidia decided to "double up" the memory on the 3070 the same way as on the 4060 Ti, they could have made a 32 GB version of it. Intel could also decide to do something like that for Battlemage, or of course AMD could do it with the 7900 XTX or similar. Despite AMD's weaker AI performance (no real matrix-multiplication acceleration, unlike Nvidia or Intel), it would at least get AI developers to care about their cards, since it would mean they have the best flagship in one important metric.
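For reference, the capacity math behind that, assuming today's common 16 Gbit GDDR6 modules and one module per 32-bit channel:

    def vram_gb(bus_bits: int, gbit_per_module: int = 16, clamshell: bool = False) -> int:
        modules = bus_bits // 32          # one module per 32-bit channel
        if clamshell:
            modules *= 2                  # chips on both the front and back of the PCB
        return modules * gbit_per_module // 8

    print(vram_gb(128))                    #  8 GB -- 4060 Ti
    print(vram_gb(128, clamshell=True))    # 16 GB -- 4060 Ti 16GB
    print(vram_gb(256, clamshell=True))    # 32 GB -- the hypothetical 3070 32GB
    print(vram_gb(384, clamshell=True))    # 48 GB -- RTX A6000 / W7900 class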
With the way wafer costs keep increasing, physically smaller dies are inevitable. 50% higher density at 30% higher cost, or whatever the exact numbers are, implies that if you keep the same die size, then cost goes up 30%, and in market terms that die has moved up a product tier. The number I was referring to is total transistor count and transistors-per-$, not die area; the cost of producing the same-sized (e.g. 200mm²) die across nodes has skyrocketed, and a 4060 Ti is far, far more expensive than a 1060 to produce.
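Worked through with those illustrative numbers (50% more density for 30% more cost per wafer):

    density_gain = 1.50      # transistors per mm^2, new node vs old
    wafer_cost_gain = 1.30   # cost per mm^2, new node vs old

    # Same die area: ~30% more expensive, i.e. it drifts up a product tier.
    print(f"same-size die cost: {wafer_cost_gain:.2f}x")
    # Cost per transistor barely improves: 1.30 / 1.50 ~ 0.87x.
    print(f"cost per transistor: {wafer_cost_gain / density_gain:.2f}x")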
Clamshell also adds PCB/assembly cost, which people never account for. And all of this has to have partner margins rolled into it too. E.g. at one point it was like $3.50 per 8 Gbit module; assuming 16 Gbit modules are comparable, that's 8 extra modules x $4 = $32 of memory, but probably there's another $10-15 in the PCB and assembly, plus partner margin, etc. $50 would be partners losing margin, at $75 they probably break even, and $100 definitely pushes margin up a bit - but that's also what people wanted, if you remember the bandwagon around EVGA. I think in the rush to bandwagon everyone just forgot who ultimately pays that margin ;)
(and of course in launch terms, once cards are manufactured it's painful to open them up and remanufacture them all, are you going to desolder the BGA to put it on a new PCB, one by one?)
I've commented the same thing before, that 3060 Ti 32GB would also be a very appealing product at the moment. I guess they have their reasons but it seems like a slam-dunk cashgrab at this moment of AI mania?
RDNA3 does have WMMA which should at least get the foot in the door for things like AI/ML(-weighted TAAU) upscaling. I expect them to take a crack at that with FSR4, I strongly discount rumors that they're not working on that, especially with the console refresh (PS5 Pro at least) having RDNA3 and some kind of AI core. Even if sony builds one in-house, AMD totally have to be working on one internally too, just there's various reasons not to talk about it externally etc (people wanted them to not announce overly early etc, this is what that looks like). They'd want something for devs who want to validate once across multiple platforms anyway, they definitely will launch something within a year if not alongside the PS5P itself. WMMA may not be the answer for pure training horsepower though, NVIDIA having full-fat tensor units (like CDNA's) in their gaming GPUs was a good move (I won't even say "lucky", they've worked for it). But RDNA3 WMMA is better than nothing.
I really think Intel's architecture is aimed at a future node (plus an attempt to get a foot in the door on game/driver support). Wave-8 is indeed way too finely grained for most workloads right now, but the pendulum is swinging back from cache to logic, and that means you have more transistors to spend on such frivolities. Growing the EUs definitely makes sense to me - it's like they were proofing the control/scheduling side at a small scale first, and now they build the rest of it. It just feels obviously wrong to do wave-8 right out of the gate, on a 3060/3070-tier product at best, so what was the real goal? Some kind of incremental/iterative development, presumably.
But yeah intel is a can of worms. The last time I checked (probably 6mo ago) they were operating the graphics division at -200% margin, which is needless to say stunning. I just also think they really have no choice, the writing is on the wall with NVIDIA Grace and MI300X and Apple Silicon Max/Ultra that big APUs are the future, both for a number of consumer segments but especially for a lot of HPC and even enterprise segments. Can't be competitive in HPC or enterprise if you don't have it, and even for laptops, you can't not have an iGPU. So if you cancel it do you go to Imagination or someone and license PowerVR like the bad old days of Atom? That's not a compelling product when AMD is turning out chips like 7840HS and soon the Strix Point/Strix Halo line.
Other people tend to read that as "-200% margin, they'll cancel it any day now" but I actually read it as the opposite, it's "-200% margin and it's so important they are barreling through anyway". They are canceling stuff left and right but they don't have a choice on Xe/Arc regardless of the cost. And specific dies being canceled based on the current stuff doesn't mean anything in that context - that's just staying agile rather than over-committing to a schedule before you know what product you're going to build.
But in general Intel can't seem to execute much of anything very well these days. Every good release feels like a fluke and they're back to the woodshed in no time. Alder Lake came with the loss of AVX-512, the 2.5GbE NICs are stuck in a state of permanent re-spins and steppings, Sapphire Rapids has some fatal power flaw that needs a 1200W PSU (and they're not kidding) due to silly 700W+ transients and is getting a respin just in time for Emerald Rapids, Meteor Lake was basically six months late, still had a messed-up BIOS, and still undershot performance targets, on and on, and they're cutting pay and firing staff. Death-spiral territory right there.
> RDNA3 does have WMMA which should at least get the foot in the door for things like AI/ML(-weighted TAAU) upscaling.
Do keep in mind that RDNA3 WMMA is very slow, running at the same theoretical TFLOPS as the shader cores (which are doubled from RDNA2 thanks to the very limited dual-issue support, which WMMA can use). Nvidia tensor cores and Intel XMX can run closer to 4:1 or 8:1 or so compared to vector workloads.
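To make the gap concrete, here's a rough sketch that just echoes the ratios above; the TFLOPS numbers are illustrative round figures, not measured specs.

```python
# Illustrative only: rough peak-throughput ratios, not measured benchmarks.
# The point is the matrix-to-vector ratio, not the absolute TFLOPS.

def matrix_to_vector_ratio(vector_tflops, matrix_tflops):
    return matrix_tflops / vector_tflops

# RDNA3 WMMA: FP16 matrix math reuses the (dual-issue) vector rate, so the
# ratio over FP32 shader throughput tops out around 2:1.
rdna3 = matrix_to_vector_ratio(vector_tflops=60, matrix_tflops=120)

# NVIDIA tensor cores / Intel XMX: dedicated matrix pipes, commonly quoted
# around 4x-8x the vector rate for FP16 (higher still with sparsity).
nv_xmx = matrix_to_vector_ratio(vector_tflops=50, matrix_tflops=200)

print(f"RDNA3 WMMA ~{rdna3:.0f}:1 over vector")
print(f"Tensor/XMX ~{nv_xmx:.0f}:1 over vector (or more)")
```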
> It just feels obviously wrong to do wave-8 right out of the gate
That's because Alchemist isn't a true first-generation product, it's a scaled-up version of Intel's (relatively mature at this point) integrated graphics architecture. This means that Alchemist has suffered large growing pains (Gen12 was not really designed to be used in products bigger than maybe 128 EUs, while the A770 is 512 EUs). It also has some form of separation anxiety, as seen in its need for ReBAR. I also recall, very early in Alchemist's life (pre-release), a driver optimization that had a huge benefit and turned out to just be the wrong memory region being used, since all memory is the same on an iGPU but not on a dGPU.
> It also has some form of separation anxiety, as seen in its need for ReBAR. I also recall, very early in Alchemist's life (pre-release), a driver optimization that had a huge benefit and turned out to just be the wrong memory region being used, since all memory is the same on an iGPU but not on a dGPU.
Yes, the iGPU is actually a client of the ringbus on Intel, so some of these bugs seem to have been overlooked (I'd guess the fix probably improved performance on the iGPUs too, lol). AMD has always interfaced the iGPU via PCIe[0], which probably helped modularity.
Plus in general AMD just seems to be better at modularity and re-use period. I think that's the biggest headwind for Intel in general. Every. single. product. is completely one-off and custom and has its own set of bugs. Just define an interface and get used to it.
But I think that's a Conway's Law situation of the hardware design resembling the org structure. Intel is a mess inside and so are their products.
[0] Infinity Fabric is de facto coherent PCIe fabric; the Intel equivalent would be putting it over DMI. And AMD explicitly offers IF as a CXL competitor too. I do love that in the modern era everything is PCIe, the many-faced god. And it all just works: plug your OCP 2.0 card or M.2 card into an adapter and away you go, or tunnel PCIe over OCuLink or MCIO, etc. The greatest tech success story of the last 30 years.
> One interesting observation I made is that usually mainstream midrange cards seem to hover just over 200mm². That includes for example cards like the HD 7850, but extends all the way to the likes of the RX 580, RX 5700 XT, but also the GTX 1060 or GTX 660. Those cards are all 256 bit if AMD and 192 bit otherwise, funnily enough.
This is indeed an interesting observation, btw, and I have mused before that it's interesting how NVIDIA leans towards narrower buses with more advanced memory tech while AMD tends to lean towards plainer, wider buses. Which is sort of the same observation but from the other direction.
By that I mean: they were first to lean into (lossless) delta compression, and continued to retain an advantage in compression ratio for most subsequent generations. They were first to lean into quad data rate with GDDR5X, first to lean into PAM4 with GDDR6X, etc. They very clearly favor narrower buses with higher "intensity" for their high-end stuff. Supposedly this reduces the power per bit transferred, but man, looking at GDDR6X I really don't know; the 3070 Ti is massively worse at efficiency than the 3070. Evidently they don't clock down the memory very well, and at full transfer rate the total power is significantly higher (even if the per-bit power is less).
I wonder when we'll see the return of consumer HBM cards. It feels like the time has to be approaching soon, especially with high-NA reticle limit hitting in a gen or two. Memory PHYs are an obvious thing to cut back on for NVIDIA in particular, since they don't have MCM yet.
The analogy also applies at the high end, where NVIDIA usually tops out at 384-bit and AMD has gone as high as 512-bit before. I think this is no longer possible with GDDR6/6X due to signal integrity/routing and the PCIe card dimensions; everyone is topping out at 384-bit now, even AMD.
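The narrow-but-fast vs wide-but-plain tradeoff really is just bus width times per-pin data rate. A quick sketch (the configs are public specs as I recall them, so treat the exact rates as approximate):

```python
# Bandwidth = bus width (bits) / 8 * per-pin data rate (Gbps).
# Configs below are public specs from memory; treat them as approximate.

def bandwidth_gbs(bus_bits, gbps_per_pin):
    return bus_bits / 8 * gbps_per_pin  # GB/s

cards = {
    "R9 290X (512-bit GDDR5 @ 5 Gbps)":       (512, 5.0),   # wide and plain
    "GTX 1080 (256-bit GDDR5X @ 10 Gbps)":    (256, 10.0),  # narrow, quad data rate
    "RTX 3070 (256-bit GDDR6 @ 14 Gbps)":     (256, 14.0),
    "RTX 3070 Ti (256-bit GDDR6X @ 19 Gbps)": (256, 19.0),  # narrow, PAM4
}

for name, (bus, rate) in cards.items():
    print(f"{name}: {bandwidth_gbs(bus, rate):.0f} GB/s")
```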
Super cards are allegedly going to be only 10% faster than non-Super cards, at similar pricing. Not significant enough to change the argument of the article, I'd suggest.
Pre-release scalping/placeholder prices have been wrong almost 100% of the time they're brought up. They're absolute staples of the Twitter leaker crowd, who have absolutely nothing better to do, but they have been wrong almost every time. There have literally been rumors of Intel MSRP adjustments for every series since 7th gen; 8th/9th/10th/11th/12th/13th have all had dumb "i7 is now $449! i9 is now $749!" type headlines every single time. Some random hole-in-the-wall computer shop in Singapore does not have secret pre-launch info; it's just a placeholder/safely-high scalping price, because they know it'll sell out on launch day anyway.
What I expect is:
* 4080 super won't be much of a performance step, but it'll probably move downwards from $1200 to $999. Could even be $899.
* 4070 Ti will be retired and replaced with 4070 Ti Super at the same MSRP. Not much faster, but it'll go from 12GB to 16GB, so NVIDIA will finally have a midrange 16GB option. Upper-midrange to be sure, but it's not $1200 either.
* 4070 will likely move down to the $499 price point and 4070 Super takes the $599 or $629 price point. This should be ~95% of a 4070 Ti 12GB at 80% of the price.
It's possible that 4070/4070 Super come in a little bit higher; the viable range is something like $499-549 and $599-649 respectively, if 4070 Ti Super comes in at (e.g.) $799 and 4080 Super at $999. But in general this whole refresh is strongly constrained by the 4070 Ti Super needing to hit the $749 price point, plus the 4070 needing to compete with the 7800XT.
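Stated as perf-per-dollar, the logic looks roughly like this; the relative-performance and price figures below are just my guesses from the reasoning above (normalized to the 4070 Ti 12GB), not announced specs or benchmarks.

```python
# Perf-per-dollar framing of the guessed Super refresh. Relative performance
# is normalized to 4070 Ti 12GB = 1.00; every number here is speculation
# carried over from the comment above, not announced pricing or benchmarks.

lineup = {
    # name:                       (relative perf, guessed price in USD)
    "4070 (cut to $499)":         (0.82, 499),
    "4070 Super ($599)":          (0.95, 599),
    "4070 Ti 12GB ($799 MSRP)":   (1.00, 799),
    "4070 Ti Super 16GB ($749)":  (1.10, 749),
    "4080 Super ($999)":          (1.30, 999),
}

for name, (perf, price) in lineup.items():
    print(f"{name}: {perf / price * 1000:.2f} relative perf per $1000")
```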
AMD actually went pretty hard on the 7800XT; $499 is a good price for that, and it gets you 16GB in a 4070-class card (5% faster than a 4070[0]). NVIDIA needs a 16GB card that is not $1200, or even $999. It really has to be $749, or at absolute most $799. That SKU can only be AD103, so they need to stop using the harvested dies for the 4070 Ti and start using them for the 4070 Ti Super. That's why the 4070 Ti is being retired - those rumors make perfect sense. They can't use AD102, because to hit 4080 Super shader count they'd have to cut >40% of the die, which is a total waste, and they're using every AD102 for AI cards right now. They can't use AD104 because it's only 12GB.
The rest of the pieces fall into place around that. The 4070 Ti has to (largely) go away, except for truly failed chips (which are also used in the Quadro 4000 Ada, which is 20GB). The 4080 gets a refresh to use the full-die AD103 chips, but it's not worth more than $999 if there is a lower-tier 16GB model, and it could even be $899. AD104 gets the full-die 4070 Super slotted underneath the 4070 Ti Super. The 4070 probably needs a small price cut ($499) to compete with the 7800XT (it still only has 12GB, but NVIDIA always positions at a small premium), so the 4070 Super has to slot somewhere between those $499 and $749 cards, which probably means $599. 4080 pricing (the only AD103 card) was always the most incredibly silly part of the whole lineup, so it has the most room to come down. They totally could do a 4080 Super for $899 if they wanted; the margins are ridiculous on that chip. $749 for a 4070 Ti 16GB with cut-down AD103 is super plausible too.
I know everyone is convinced that NVIDIA is going to just keep raising prices forever, but please consider the possibility that the people leading that pitchfork brigade are just rabble-rousing. The problem is that so many people have bought into the NVIDIA-bashing over time that it seems reasonable and common-sense, but it's really not. It's just like last summer before Ada launched, when everyone was absolutely positive that the 4090 was going to be 600W, then 800W, finally 900W TBP [1][2][3]. People unironically argued that shrinking two nodes wouldn't buy NVIDIA anything at all in terms of efficiency and that the 4090 would just be 3090 perf/W at higher perf. NVIDIA is not going to just stack the Supers on top of the existing stuff; there will be some combination of price cuts, faster SKUs, and (for the 4070 Ti Super) a VRAM increase. 4070 and 4070 Super represent a 15-20% improvement from MSRP, 4070/4070 Ti were alright to begin with, and the 16GB 4070 Ti Super SKU is as good as they can do for a value-oriented 16GB SKU with AD102 out of the picture. The 4080 Super will continue to undeniably be one of the graphics cards of all time, but there is a price where it makes sense for the performance step too.
People get absolutely silly and buy into the wildest, most nonsensical bullshit as long as it feeds the "green man bad" itch. And they eagerly believe and overhype the most nonsensical pro-AMD bullshit (Zen 2[4], Fury X, Vega, and RDNA3 being the most extreme cases, but it happens every time to some degree). And there is a whole little media ecosystem devoted to feeding and catering to this - it's not called "green gaming tech" after all. Many people are generally resentful and bitter about NVIDIA and eagerly soak it up and amplify it. A lot of people have low-key adopted pro-AMD fanboy attitudes and framing (or bashing of other brands, like the absurd amount of early denial about Apple Silicon performance/efficiency, or Tiger Lake/Ice Lake efficiency, etc) without even realizing it or really thinking of themselves as pro-AMD. It's kinda just the sea in which we swim on tech-focused social media. PCMR is completely crazy pro-AMD shit all the time, f.ex (which makes sense when you see the head mod! [5]).
Like I said, there are a lot of people who are just unhappy that a 4070 isn't $299, the "socially gamer, fiscally conservative" crowd, and they're just kinda gonna continue being unhappy. The overall cost trend is up - it's still a fairly cheap hobby overall, but a high-end GPU with x mm^2 of silicon legitimately costs quite a bit more to design + validate + build now, and that's going to continue to go up in the future. The flip-side benefit is that hardware ages slower too, and NVIDIA has provided pretty good forward support for DLSS, which will soon be baselined by the Switch 2's T239 chip. I fully expect some CUDA-like lifecycle where after 5+ years they may launch "DLSS Hyper Resolution" that uses some new accelerator, but I think they probably won't pull support for older cards, and the Switch 2 commits them to pretty significant ongoing work on Ampere-era hardware (possibly with an Ada OFA?). That's the reason they've been on a tear with DLSS 2.5, 3.0, 3.5, and 4.0 (coming soon) all boosting image quality at very low input resolutions and framerates: DLSS needs to run real good on the Switch for upscaling those 3rd-party ports in docked mode.
But 4070 and 4070 Super should receive some good pricing and 4070 Ti Super finally gets a decent card with DLSS and tensor and 16GB that isn't $1200. I think I was right. That's not an overall bad adjustment, and NVIDIA can't really do a big price cut on AD102 right now due to AI mania, so that's as good as it's gonna get. 4090 dipping below MSRP (got to ~$1400-1450 iirc) was truly the bottom for this gen, I think.
Hopefully there are some lower-end SKUs too, but I think NVIDIA is constrained by a fairly large inventory of 4060 Ti 8GB that they're managing carefully. The 16GB SKU was a last-minute thing, they had already built out a ton of 8GB and they need to let the market absorb them before nuking the price too bad (or else partners get rekt). The 4060 Ti 8GB models have already dropped a ton in street price, they are decently faster and at some point they're worth it too. Rumors are contradictory on this too but Gigabyte EEC filings just disclosed a 7600XT 16GB, which sounds like it'd be based on the Navi 33 die. Deep cuts on Navi 32 parts have always seemed problematic to me due to overall wafer usage (it gets hit hard because it's MCM, it's as much 6nm silicon as a 7600 or somesuch, and also another big piece of 5nm) at the $300-350 price point etc, but N33 7600XT as simply clamshell-memory 7600 is doable and believable and would undercut the 4060 Ti decently. They could offer a 7600XT at $300/$329 for sure. So they may have to do a 4060 super/Ti adjustment at some point regardless.
But the reason AMD isn't just massively undercutting NVIDIA is that they don't have a magic wand on cost either. The economics of a 7700 non-XT will be interesting; I think that's really a lot of silicon to use for not a lot of margin (vs selling datacenter/mobile chips), so I think the 7700 would be tough at (e.g.) $349, but who knows with AMD. The N31 design seems to be where the MCM really hits its stride in terms of the scalability being worth the area hit vs monolithic, the power consequences, etc, and they could probably drop both 7800XT and 7900XT prices if needed against the 4070 or 4070 Ti Super.
[5] https://www.youtube.com/watch?v=Qv9SLtojkTU (this whole video is really good, but 18:12 is the real reason people are upset: people want raster to keep scaling even though there are good reasons it isn't, and working smarter not harder - increasing asymptotic performance-per-transistor - is the only way forward. DLSS would have seemed like a magic "more frames button" 10 years ago, but it's so controversial now for absurd reasons. That video from Steve GN in another comment I made is so incredibly hyperbolic and absurd, and that's the thing: a ton of people just lost their shit over the RTX launch and still haven't recovered.)
Nvidia has a monopoly and for that reason has lost interest in giving value to customers. It’s busy with AI and no longer seems interested in gaming.
AMD seems to have no interest in competing with Nvidia and is content to release products that simply match Nvidia's terrible value. Despite being handed every possible chance to thrash Nvidia via competitive pricing, it simply releases slow GPUs at high prices and shrugs.
Intel competes hard since it's in distant third place, but its products are far, far behind in terms of performance.
It’s lose lose lose for the GPU consumer.
The general attitude of GPU manufacturers is very different from CPUs, where it is a knockdown, beat-up, sprint-to-the-death fight to make the fastest and cheapest CPUs possible, shipped as quickly as possible, with the goal of winning.
In GPUs it's just a slow, lazy, fat gold grab in which no player is interested in making any more effort than they need to - for AMD and Nvidia, anyway.
>Nvidia has a monopoly and for that reason has lost interest in giving value to customers. It’s busy with AI and no longer seems interested in gaming.
I look at Nvidia Reflex and the DLSS improvements, which went from what I thought was fancy to something I find more like magic. I don't see how they have lost interest in gaming.
If anything, their AI business revenue has been partly subsidising their gaming business, where they are now the first player after Apple to get leading-edge node capacity.
> Intel competes hard since it's in distant third place, but its products are far, far behind in terms of performance.
Not THAT far behind. The Arc A770 is currently $280 and is competitive with the Nvidia 3060 at $290.
This is Intel's first-generation dedicated video card. I have a lot of hope for what comes with gen 2. Assuming Intel doesn't just abandon the line (here's hoping), there's a really good chance they come up from behind.
Make sure you look into more recent Arc benchmarks. One of the issues Intel had is that their drivers sucked at release. They've since resolved a lot of those issues.
I think the entire reason Intel entered the fray here is that they saw Nvidia and AMD stagnating.
You can rebadge the same "powerful" stuff for years. I have AMD cards that are simply rebadged previous-gen ones with tweaked memory clocks, if that. I'm pretty sure this is not unique to AMD.
Jensen Huang has been saying in various interviews for years that Nvidia is just as much a software company as a hardware one. It's hard to find great sources because these comments come from old interviews, but for some time now he has increasingly talked about Nvidia's 15+ year investment in CUDA, how their engineering spend has increasingly been on software, and how and why software and hardware need to be approached strategically and holistically with a focus on functionality, end-user usability, etc.
The challenge for AMD especially is they're not a software company. In fact, after six years of trying to use ROCm and shaking my head every time (as recently as this week) I'm convinced they just don't get it and I'm not sure they ever will. It's not in their DNA and even a lot of die-hard desktop AMD gaming users frequently complain about things as basic as a Windows driver.
Nvidia certainly isn't perfect, but the fact is you have a very good chance of taking anything with Nvidia stamped on it, installing the driver, and a docker command later you're running whatever. Everything from a Pascal laptop GPU to a Hopper H200 is supported up to and including CUDA 12. Universally, without exception, even before the hardware launches, and even when it means more-or-less backporting support for it (see Hopper and CUDA 11.8). The U in CUDA stood for unified, and they've incurred the software and hardware expense to enable this across their entire product line (without exception) for over a decade. Meanwhile ROCm just added support for their flagship consumer GPU (one year after launch) in ROCm 6.0. That brings their total number of officially supported GPUs to 10. Not 10 families, literally 10 SKUs (more or less). CUDA is what, at least 100?
Of course there are hacks to "support" officially unsupported GPUs but they're just that, hacks (environment vars to pretend your card is something else). You go from already shaky ROCm to "let's see when this is going to randomly crash even if I can kind of get it to run" ROCm.
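For what it's worth, the "does this stack even work" smoke test I keep coming back to looks roughly like this. It's a minimal sketch, not a validation suite; ROCm builds of PyTorch expose the GPU through the torch.cuda namespace, so the same snippet runs on either vendor's stack.

```python
# Minimal smoke test for a CUDA or ROCm PyTorch install. ROCm builds of
# PyTorch expose the GPU through the torch.cuda API, so this runs unchanged
# on both; it says nothing about whether a given ROCm GPU is *officially*
# supported or merely limping along via override hacks.
import torch

print("torch:", torch.__version__)
print("CUDA build:", torch.version.cuda)                   # None on ROCm builds
print("HIP build:", getattr(torch.version, "hip", None))   # None on CUDA builds
print("GPU available:", torch.cuda.is_available())

if torch.cuda.is_available():
    print("device:", torch.cuda.get_device_name(0))
    # Tiny matmul to make sure kernels actually launch and come back.
    x = torch.randn(1024, 1024, device="cuda")
    y = (x @ x).sum()
    print("matmul OK:", y.item())
```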
It's truly mind-boggling. Consider their ROCm docker images[0]. They've had ROCm 5.7 images with Python 3.10, which is more or less the "standard" Python version targeted by most projects. Great, they released ROCm 6.0 for the GPU they launched a year ago! Wait, it uses Python 3.9?!? Try firing up something as basic as oobabooga/text-generation-webui. It blows up immediately because it needs Python 3.10.
What kind of organization standardizes on a Python release and then uses an older version for a subsequent base software release? The further you dig into ROCm the more you see things like this that are (frankly) clownish. If I seem frustrated it's because I'm rooting for them. I've spent money on their hardware in attempts to make reasonable use of ROCm since Vega. Every single time I come away wondering how anyone can take this seriously, power the system off, and then get back on my Nvidia/CUDA hardware so I can actually get something done. It's a combination of head shaking and actually laughing out loud.
The challenge here is CPU vs GPU. As we all know x86_64 is x86_64 in terms of software. Obviously there are additional instruction sets, etc but generally speaking anyone can take a drive out of an Intel system, put it in an AMD system, and it will more-or-less boot and run without issue. AMD CPUs are phenomenal because AMD doesn't have to do much software work (if any) to enable that.
This is how they've been able to do so well (rightfully so) in competing with Intel on CPU. GPU/GPGPU is a completely different animal and that's why after six years of ROCm they have single digit percentage market share in GPGPU.
What makes this even more perplexing is they famously beat the pants off Intel on 64-bit (haha, Itanium). I'm not sure what happened to them since or what's going on with this GPU situation.
Nvidia is a monopoly and that's never good. However, it's not as though they got there by holding guns to people's heads. They have consistently and reliably delivered products (however "abusive" and price manipulated) that get the job done. For 15 years.
AMD makes excellent and very capable hardware. As my pile of AMD GPUs from the past half-decade illustrates, that doesn't mean anything if you don't have the most basic software to actually utilize it.
If I was an AMD hardware engineer I would be screaming at the software people. I can't imagine how frustrating this must be for them.
>AMD CPUs are phenomenal because AMD doesn't have to do much software work (if any) to enable that.
Except when they must.
7900X3D and 7950X3D ended up being squarely inferior to the lower-tiered 7800X3D because they could not and would not provide software tooling to assist the scheduler in handling asymmetric cores.
Unlike Intel and their asymmetric cores which have Intel Thread Director assisting the scheduler.
I've personally also had nothing but terrible experiences with AMD GPU drivers, to the point that I've become blind to their offerings when I'm in the market for a video card. Nvidia's and even Intel's offerings Just Work(tm), and that virtue only becomes more valuable as I get older and my time becomes ever more precious.
The 4090, although definitely overpriced, is one of the most impressive cards I've ever used. It handles almost every game I have at 4K/120fps with little to no issue. It's by far the best card I've ever owned.
I think the mid-range may not be as attractive but even the 4080 boasts some pretty impressive performance. I don't really think we need yearly releases of cards if the current generations are already doing so well.
My experience of the 4090 is similarly positive. I ended up picking one up at a discount from a MicroCenter while I was traveling because someone had purchased it, realized when trying to install it into their case that it wouldn't fit, and then returned it. Probably the best purchase I've made all year as someone who plays video games, does some moderately intensive video editing for YouTube and likes playing around with local image generation and large language models.
My advice to anyone in the market for a new graphics card is to just save until you can get a 4090.