ok... why tho? genuinely ignorant and extremely curious.
what's the TFLOPS/$ and TFLOPS/W, and how do they compare with Nvidia, AMD, and TPUs?
from quick Googling I feel like Groq has been making these sorts of claims since 2020, and yet people keep paying a huge premium for Nvidia while Groq doesn't seem to be giving them much of a run for their money.
of course if you run a much smaller model than ChatGPT on similar or more powerful hardware, it will probably run much faster, but that doesn't make it a breakthrough for most models, or for use cases where latency isn't the critical metric.
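fwiw the comparison I'm asking about is just simple arithmetic once you have spec-sheet numbers; a quick sketch (the chip names, prices, and TFLOPS figures below are placeholder values, not real specs):

```python
# Back-of-envelope TFLOPS/$ and TFLOPS/W from spec-sheet numbers.
# All numbers below are illustrative placeholders, not verified specs.

def efficiency(name, tflops, price_usd, tdp_w):
    """Return cost and power efficiency ratios for one accelerator."""
    return {
        "chip": name,
        "tflops_per_dollar": tflops / price_usd,
        "tflops_per_watt": tflops / tdp_w,
    }

chips = [
    efficiency("accelerator_a", 1000, 30000, 700),  # placeholder numbers
    efficiency("accelerator_b", 750, 20000, 300),   # placeholder numbers
]

for c in chips:
    print(f"{c['chip']}: {c['tflops_per_dollar']:.4f} TFLOPS/$, "
          f"{c['tflops_per_watt']:.2f} TFLOPS/W")
```

the hard part isn't the division, it's getting honest numbers: peak TFLOPS at the same precision (FP16 vs INT8, with or without sparsity) and real street prices rather than list prices.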