Always keep in mind that you can't go faster than the hardware allows, and sometimes not faster than the OS allows. We can do much better on some specific tasks, but for that you need to rewrite some algorithm implementations to take better advantage of everything the hardware offers. ML is a specific task: being able to exploit the hardware for faster matrix multiplication does not mean you get the same speedups overall. Mojo, for example.
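To put rough numbers on that last point, here is a minimal sketch using Amdahl's law. The function name `overall_speedup` and the workload fractions are my own hypothetical picks for illustration, not measurements of Mojo or of any real program.

```python
def overall_speedup(fraction_accelerated: float, local_speedup: float) -> float:
    """Amdahl's law: whole-program speedup when only part of the runtime gets faster.

    fraction_accelerated: share of the original runtime spent in the optimized part (0..1).
    local_speedup: how many times faster that part becomes.
    """
    if not 0.0 <= fraction_accelerated <= 1.0:
        raise ValueError("fraction_accelerated must be between 0 and 1")
    if local_speedup <= 0:
        raise ValueError("local_speedup must be positive")
    untouched = 1.0 - fraction_accelerated
    return 1.0 / (untouched + fraction_accelerated / local_speedup)


# Hypothetical workloads (assumed numbers, not benchmarks):
# an ML-style job that spends 95% of its time in matrix multiplication,
# and a general program that spends only 10% of its time there.
ml_like = overall_speedup(0.95, 100.0)
general = overall_speedup(0.10, 100.0)

print(f"ML-like workload, 100x faster matmul: {ml_like:.2f}x overall")
print(f"General workload, 100x faster matmul: {general:.2f}x overall")
```

Under these assumed fractions, the same 100x matmul gain gives a large win for the matmul-heavy job and barely moves the general one, which is the gap between a headline benchmark and an overall speedup.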