It seems like you are making general-purpose chips to run many models. Are we at a stage where we can consider taping out inference networks directly, with the weights baked in as constants in the RTL design?
Are chips and models obsoleted on roughly the same timelines?
We build out large systems where we stream the model weights into the system once and then run many inferences against them. We don't really recommend streaming model weights onto the chip repeatedly, because you'll lose the benefits of low latency.
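Roughly the pattern, as a toy Python sketch (the class and file names are made up for illustration, not our actual stack): the weights are loaded into the accelerator's memory once, and every subsequent request only moves activations.

    # Toy sketch of the "load weights once, reuse them" serving pattern.
    import numpy as np

    class ResidentModel:
        def __init__(self, weight_file: str):
            # Paid once: weights are streamed into the accelerator here.
            self.weights = np.load(weight_file)          # e.g. one dense layer
        def infer(self, x: np.ndarray) -> np.ndarray:
            # Paid per request: only activations move; the weights stay resident.
            return np.maximum(x @ self.weights, 0.0)

    # model = ResidentModel("layer0.npy")
    # for batch in request_stream:        # many inferences amortize the one-time load
    #     out = model.infer(batch)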
The "even for inference" thing has turned into a bit of a trap imo.
Data-parallel models scaled up for training and could then run on individual chips, but these massive model-parallel models require a couple of chips directly linked together even to do inference.
So the idea that a competitor could come in with a simple, cheap inference chip doesn't really work.
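To make the model-parallel point concrete, here's a toy NumPy sketch (shapes made up, not any real model): even a single layer's weight matrix gets sharded across devices, and the partial results have to be gathered over the chip-to-chip links before the next layer can run.

    # Why model-parallel inference needs linked chips: one layer's weight matrix
    # is split column-wise across "devices", each computes its slice, and the
    # partial outputs are gathered over the interconnect.
    import numpy as np

    d_model, d_ff, n_devices = 1024, 4096, 4
    x = np.random.randn(1, d_model).astype(np.float32)     # one token's activations
    W = np.random.randn(d_model, d_ff).astype(np.float32)  # pretend this is too big for one device

    shards = np.split(W, n_devices, axis=1)                # each "device" holds d_ff / n_devices columns
    partials = [x @ shard for shard in shards]             # local matmul on each device
    y = np.concatenate(partials, axis=1)                   # the "all-gather" over chip-to-chip links

    assert np.allclose(y, x @ W, atol=1e-3)                # same answer, but only with fast links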
We are already seeing chips for inference, really. It's how these models are getting into the consumer market. A lot of the big phones have an inference chip (Tensor, Neural Engine, and the like), TVs are getting them, and most GPUs have some hardware dedicated to inference (DLSS, super-resolution).
Do you know of any production Bayesian inference workload that really needs specialized chips?
The problem with NN inference / training is that it's eating up the datacenters.
At the same time, you can't achieve a 20x speedup over GPUs if you need 32-bit floats, because in that case the GPU's relative energy efficiency isn't actually that bad.
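Rough arithmetic with commonly cited ~45nm energy-per-operation figures (ballpark assumptions, not measurements from any particular chip, and ignoring data movement, which often dominates anyway):

    # Back-of-envelope on where the "20x" lives.
    fp32_mac_pj = 3.7 + 0.9    # 32-bit float multiply + add, picojoules (ballpark)
    int8_mac_pj = 0.2 + 0.03   # 8-bit integer multiply + add, picojoules (ballpark)

    print(f"fp32 MAC ~{fp32_mac_pj:.1f} pJ, int8 MAC ~{int8_mac_pj:.2f} pJ, "
          f"ratio ~{fp32_mac_pj / int8_mac_pj:.0f}x")
    # If the workload insists on fp32, that ~20x energy headroom over a GPU's
    # own fp32 units mostly disappears.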
It's true that inference is still very often done on CPU, or even on microcontrollers. In our view, this is in large part because many applications lack good options for inference accelerator hardware. This is what we aim to change!
ML inference is not magic: by and large, it's just a combination of simple operations, matrix multiplications/dot products, element-wise nonlinearities, convolutions and so on, that vector processors, GPUs and increasingly CPUs (thanks to SIMD) are very well optimized for. (In theory one could optimize a chip for some specific, well-defined ML architecture, even to the point of "wiring" the architecture into the hardware, and people did such things back in the 1980s when it was necessary just to experiment with e.g. neural network models. But given how fast ML is progressing, there's no reason to do anything like that today!)
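To illustrate just how plain the math is, here's a minimal NumPy sketch of a two-layer MLP forward pass (random weights and made-up shapes, purely for illustration):

    # Inference for a small MLP really is just matmuls plus element-wise ops.
    import numpy as np

    rng = np.random.default_rng(0)
    W1, b1 = rng.standard_normal((784, 256)), np.zeros(256)
    W2, b2 = rng.standard_normal((256, 10)), np.zeros(10)

    def forward(x):
        h = np.maximum(x @ W1 + b1, 0.0)                   # matmul + ReLU
        logits = h @ W2 + b2                               # matmul
        e = np.exp(logits - logits.max(-1, keepdims=True)) # numerically stable softmax
        return e / e.sum(-1, keepdims=True)

    probs = forward(rng.standard_normal((32, 784)))        # batch of 32 "images"
    print(probs.shape)                                     # (32, 10)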
Inference is mostly just matrix multiplications, so there's plenty of competitors.
Problem is, inference costs do not dominate training costs. Models have a very limited lifespan; they are constantly retrained or obsoleted by new generations, so training is always going on.
Training is not just matrix multiplications, and given the hundreds of experiments in model architecture, it's not even obvious what operations will dominate future training. So a more general-purpose GPU is just a way safer bet.
Also, LLM talent is in extremely short supply, and you don't want to piss them off by telling them they have to spend their time debugging some crappy FPGA because you wanted to save some hardware bucks.
Inference-only hardware like this could be a temporary cost-saving solution for scaling up AI infrastructure, but I think a chip this size should be able to support training; otherwise it's just a waste of money and energy. I predict inference will move to edge computing based on mobile chips like the ones from Qualcomm in the medium term.
For inference, models should be compiled down directly to WASM or WebGPU or whatever, right? The driving language really shouldn't matter at the end of the day.
Unless you've got massive compute bound transformers or old-school full convolutions, if you're interpreting a list of operations you're going to lose perf.
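Something like the following is what I have in mind: export the frozen graph once and let an ahead-of-time compiler handle it, rather than dispatching one op at a time. ONNX is used here purely as one example of a portable graph format; the file names are made up.

    # Sketch of the "compile the graph once" workflow: trace/export a frozen
    # model to a portable graph format, then hand it to whatever backend
    # compiler targets your device.
    import torch
    import torch.nn as nn

    model = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10)).eval()
    dummy_input = torch.randn(1, 784)

    torch.onnx.export(model, dummy_input, "classifier.onnx")
    # From here a runtime/compiler (ONNX Runtime, a WASM/WebGPU backend, a
    # vendor toolchain, ...) can fuse and code-generate ahead of time instead
    # of interpreting a list of ops at runtime.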
Chips optimized to perform the kinds of calculations used for NN inference at high parallelism. A good example would be the Google spinoff https://coral.ai/ (though their use case is highly limited by sub-par software support).
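For what it's worth, the programming model there is straightforward; here's a rough sketch of running a precompiled model through the TFLite runtime with the Edge TPU delegate (the model path is made up, and the model has to be compiled for the Edge TPU beforehand):

    # Run a precompiled TFLite model on a Coral Edge TPU via the delegate.
    import numpy as np
    import tflite_runtime.interpreter as tflite

    interpreter = tflite.Interpreter(
        model_path="mobilenet_v2_edgetpu.tflite",
        experimental_delegates=[tflite.load_delegate("libedgetpu.so.1")],
    )
    interpreter.allocate_tensors()

    inp = interpreter.get_input_details()[0]
    out = interpreter.get_output_details()[0]
    interpreter.set_tensor(inp["index"], np.zeros(inp["shape"], dtype=inp["dtype"]))
    interpreter.invoke()
    print(interpreter.get_tensor(out["index"]).shape)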
Is software that important on the inference side, assuming all the key ops are supported by the compiler? Once the model is quantized and frozen, deploying to alternative chips, while somewhat cumbersome, hasn't been too challenging, at least in my experience with Qualcomm NPU deployment (trained on NVIDIA).
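A sketch of the quantize-and-freeze step I mean, using PyTorch post-training dynamic quantization as one example (the vendor-specific conversion to the NPU format happens after this and isn't shown):

    # Post-training dynamic quantization: weights stored as int8, graph frozen.
    import torch
    import torch.nn as nn

    model = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10)).eval()

    quantized = torch.quantization.quantize_dynamic(
        model, {nn.Linear}, dtype=torch.qint8
    )
    print(quantized)
    # This frozen, quantized model is what then goes through the chip vendor's
    # converter/compiler for deployment.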
Future?
Most modern CPUs do this to some extent. Remember you can only have a very simple model here, because the inference time has to be extremely fast.
Compact, low-power chips to run ML inference on stuff like vision, image generation, high-quality voice synthesis, maybe even translation and LLM stuff, would be very welcome.
It is interesting that one can write code (for which certain computational/logic structures are automatically inferred) describing hardware to run inference models, while inferring that such a piece of computing power will be useful to infer the future.
I'm an ML engineer, but I know nothing about the inference part. Are there that many kinds of devices that optimizing inference for a device is a thing? I thought almost everyone serves from GPUs/TPUs, and hence there are only two major device types. What am I missing here?