
I'm an ML engineer, but I know nothing about the inference side. Are there really so many kinds of devices that optimizing inference for a specific device is a thing? I thought almost everyone serves from GPUs/TPUs, and hence there are only two major device types. What am I missing here?



I saw somewhere that 95% of all ML inference is still done on CPU.

It's true that inference is still very often done on CPU, or even on microcontrollers. In our view, this is in large part because many applications lack good options for inference accelerator hardware. This is what we aim to change!

So, in your opinion, why would those CPU users want to migrate to an FPGA and your software rather than to an Nvidia T4 or Tegra with CUDA?

It depends on the application. For some use cases, moving to a GPU makes total sense. However, if you have power, form factor, or performance constraints, or simply want to be in control of your own hardware, using an FPGA with Tensil may be a better option.

There are four big categories of ML inference hardware. You're already familiar with CPUs and GPUs; then there are FPGAs, which offer better performance and efficiency than either while remaining flexible. Finally there are ASICs (of which the TPU is an example), which offer the best performance and efficiency but retain very little flexibility, meaning that if your ML model doesn't work well on an ASIC, your only option is to change your model.

We chose to focus on FPGAs first because with them we can maximize the usefulness of Tensil's flexibility. For example, if you want to change your Tensil architecture, you just re-run the tools and reprogram the FPGA. This wouldn't be possible with an ASIC. That said, we'll be looking for opportunities to offer an ASIC version of our flow so that we can bring that option online for more users.
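
To make the "re-run the tools and reprogram the FPGA" step concrete, here is a rough sketch of what reloading a regenerated design can look like with Tensil's PYNQ driver on a Xilinx board. The module names, board architecture, and file paths below are assumptions drawn from Tensil's public tutorials, not a definitive API reference; the point is only that swapping architectures means loading a new bitstream and a recompiled model, not redesigning hardware by hand.

    # Sketch based on Tensil's PYNQ tutorial; exact module names,
    # architectures, and paths may differ in your setup.
    from pynq import Overlay                      # programs the FPGA with a bitstream
    from tcu_pynq.driver import Driver            # Tensil compute unit (TCU) driver
    from tcu_pynq.architecture import pynqz1      # architecture matching the bitstream

    # Reprogram the FPGA with the bitstream generated for the new architecture
    overlay = Overlay('/home/xilinx/tensil_pynqz1.bit')

    # Attach the driver to the DMA channel wired to the Tensil core
    tcu = Driver(pynqz1, overlay.axi_dma_0)

    # Load the model recompiled by the Tensil tools for that same architecture;
    # inference then runs against the new design via tcu.run(...)
    tcu.load_model('/home/xilinx/resnet20v2_cifar_onnx_pynqz1.tmodel')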

