"However, the experimental setting does not seem fair. The version of Stockfish used was not the last one but, more importantly, it was run in its released version run on a normal PC, while AlphaZero was ran using considerable higher processing power. For example, in the TCEC competition engines play against each other using the same processor."
> Cerebras hasn’t released MLPerf results or any other independently verifiable apples-to-apples comparisons. Instead, the company prefers to let customers try out the CS-1 using their own neural networks and data.
I suspect that despite Cerebras being a massive technical achievement (wafer-scale computing!) it performs worse than standard GPUs, which is why they won't release benchmarks on standard models (e.g. resnet/transformers/etc)
There is also an article about how they did submit a score from one of these Chinese "exaflop" systems for a different benchmark, and it turns out it can only achieve the claimed performance at half precision:
It's hard to reconcile the performance across the two tests; perhaps they were set up or tuned differently. I wish they had published their methodology - I'd have loved to benchmark my long-in-the-tooth RX 580 and rocm-tensorflow against their numbers
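For reference, the kind of check I mean is just crude throughput on a stock model. A minimal sketch, assuming a working rocm-tensorflow install; the model, batch size, and input shape are arbitrary choices on my part, not anything from their write-up:

```python
# Crude images/sec measurement on whatever device TensorFlow picks up
# (a ROCm build exposes an RX 580 the same way a CUDA build exposes an
# NVIDIA card). Random weights are fine for a throughput number.
import time
import numpy as np
import tensorflow as tf

model = tf.keras.applications.ResNet50(weights=None)
batch = np.random.rand(32, 224, 224, 3).astype("float32")

model.predict(batch, verbose=0)  # warm-up: graph build + device allocation

runs = 20
start = time.perf_counter()
for _ in range(runs):
    model.predict(batch, verbose=0)
elapsed = time.perf_counter() - start

print(f"{runs * len(batch) / elapsed:.1f} images/sec")
```

Without their exact model, precision, and batch sizes, though, numbers like this are only loosely comparable, which is the whole complaint.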
This title seems like an exaggeration of what is claimed in the article. The article states that they benchmarked their solver in a biased way that made it look better than it actually performed, not that they faked performance data altogether.
The article mentions "See Longbottom’s extensive tests and comparisons article here." and [1]. This was already mentioned in a snapshot of 18 Jan 2024 [2] so it wasn't added after your criticism.
> Microbenchmark results don't linearly scale to everything else.
Certainly they don't. But when evaluating something like this, it is up to the reader to apply critical thinking and realistic expectations about the level of experimental design behind an admittedly alpha implementation published on a GitHub wiki, versus something published in a peer-reviewed journal.
Wasn't there a thing about the mistake of using various tricks and techniques to beat benchmarks, where in the end the product was only good at getting benchmark scores, and nothing beats raw computation for general-purpose use?
Those benchmarks don't include energy efficiency, which was a primary design goal of LZFSE. I also don't see LZFSE on that page anyway, which makes it kind of hard to compare.
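Measuring that would mean reporting something like joules per byte alongside ratio and speed. A rough sketch of the method on Linux with a readable Intel RAPL counter; zlib stands in for LZFSE here since there's no LZFSE binding in the Python standard library, so only the measurement approach carries over, not the numbers:

```python
# Energy-per-byte via the package-level RAPL counter exposed by the Linux
# powercap interface (may require root; the counter wraps, ignored here).
import os
import zlib

RAPL = "/sys/class/powercap/intel-rapl:0/energy_uj"

def read_energy_uj():
    with open(RAPL) as f:
        return int(f.read())

# Mix of incompressible and highly compressible input, ~17 MB total.
data = os.urandom(1 << 20) + b"A" * (1 << 24)

before = read_energy_uj()
compressed = zlib.compress(data, level=6)
after = read_energy_uj()

joules = (after - before) / 1e6  # counter is in microjoules
print(f"ratio {len(data) / len(compressed):.2f}, "
      f"{joules:.2f} J, {joules / len(data) * 1e9:.1f} nJ/byte")
```

RAPL counts the whole package, so a serious version would run many iterations and subtract idle draw, but it gives a feel for the dimension those benchmark pages leave out.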
Note: since Alex is a fairly gender neutral name I'm going to use they/them pronouns.
I went through and read it, and the submitter is incredibly confrontational and not at all open to feedback on the correctness of their benchmark. Also, when presented with contradictory evidence (their own benchmark, where the results show io_uring is faster than epoll on other machines), they essentially dismiss it and say the other users ran the benchmark wrong, or that they can't reproduce the results on their machine, or that the other users used Boost and their results are therefore invalid. So not an entirely reliable criticism coming from them, in my opinion.
See my above comment - the choice of metrics in table 3.2 doesn't really make any sense.
Also, they revived some ancient 1998 IBM chip and reported on that, for no clear reason. They call it a 2004 benchmark, but what actually happened is that in 2004 someone made some synthetic variants and put them into a dataset. This is not some widely used benchmark in the field. Given their highly questionable choice of metrics, I would not be surprised if there was some serious cherry-picking on the benchmark side as well.
Not a retraction, just that many of the claims about supercomputer performance were easily shown to be less than accurate, making any claims about supremacy less exciting.
Also, stuff like this makes it hard to take the results seriously:
* To make an accurate comparison between the systems with different settings of tensor parallelism, we extrapolate throughput for the MI300X by 2.
* All inference frameworks are configured to use FP16 compute paths. Enabling FP8 compute is left for future work.
They did everything they could to make sure AMD came out faster.
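To make the first point concrete: multiplying a measured throughput by 2 assumes tensor parallelism scales perfectly, which it doesn't in practice because of collective-communication overhead. A toy calculation where every number is made up (none of this comes from the article):

```python
# What "extrapolate throughput by 2" assumes vs. sub-linear scaling.
measured = 1000.0              # tokens/sec actually measured (hypothetical)
extrapolated = measured * 2    # the report's linear extrapolation

efficiency = 0.85              # assumed scaling efficiency, for illustration only
more_realistic = measured * 2 * efficiency

print(f"extrapolated: {extrapolated:.0f} tok/s")
print(f"at {efficiency:.0%} scaling efficiency: {more_realistic:.0f} tok/s")
```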
> also, this comparison would not register as a proper “benchmark” as it’s not even close to how you would perform a proper benchmark. it’s more of a data point.
I would prefer to not have to argue about that in court...
Well, given that in their benchmark Go ends up being almost an order of magnitude faster than Rust, I wouldn't trust their benchmarking methodology too much.
I agree AlphaZero had fancier hardware and so it wasn't really a fair fight.