> A much stronger rebuttal of the hype would have been based on the technical limitations of deep learning.
I'm not even sure how you'd go about doing that. You could use information theory to debunk some of the more ludicrous claims, especially ones that involve creating "missing" information.
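To make that concrete (my example, not something the parent spells out): claims that post-processing can recover detail the input never contained, as in some of the super-resolution hype, run straight into the data processing inequality.

```latex
% Data processing inequality (standard result, stated here for reference):
% if X -> Y -> Z form a Markov chain, i.e. Z is computed from Y alone,
% then no amount of processing can increase what Y tells you about X:
\[
  X \to Y \to Z \quad \Longrightarrow \quad I(X;Z) \le I(X;Y)
\]
% Any model claiming to "create" information about X from Y alone is
% really injecting a prior, not recovering missing data.
```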
One of the things that disappoints me somewhat about the field, which admittedly I've only scratched the surface of, is just how much of it is driven by headline results that fail to develop understanding. A lot of the theory seems to be retrofitted to explain a relatively narrow improvement in results, and mostly advances the art of technical bullshitting.
There are obvious exceptions to this, and they tend to be the papers that do advance the field. But with a relatively shallow ResNet it's possible to achieve 99.7% on MNIST and 93% on CIFAR-10 on a last-gen mid-range GPU with almost no understanding of what is actually happening.
There's also low-hanging fruit that seems to have been left on the tree. Take OpenAI's weight normalization paper, which reparametrizes each weight vector as a normalized direction vector times a scalar magnitude. This makes intuitive sense to anybody familiar with high-dimensional spaces, since nearly all of the volume of a hypersphere lies in a thin shell near its surface. That this works in practice is great news, but it leaves many questions unanswered.
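A minimal sketch of that reparametrization, plus a quick numerical check of the thin-shell intuition (the paper is Salimans & Kingma's "Weight Normalization"; the code and variable names below are my own illustration, not theirs):

```python
import numpy as np

rng = np.random.default_rng(0)

# Weight normalization reparametrizes each weight vector w as
#   w = g * v / ||v||
# so the direction (v / ||v||) and the scale (g) are learned separately.
def weight_norm(v: np.ndarray, g: float) -> np.ndarray:
    return g * v / np.linalg.norm(v)

v = rng.normal(size=1000)       # unconstrained direction parameters
w = weight_norm(v, g=2.5)       # effective weight vector
print(np.linalg.norm(w))        # 2.5: the norm is carried entirely by g

# Thin-shell intuition: the fraction of a d-ball's volume lying within
# radius (1 - eps) of the centre is (1 - eps)^d, which vanishes as d grows.
for d in (2, 10, 100, 1000):
    print(d, (1 - 0.01) ** d)   # at d=1000, only ~4e-5 of the volume is
                                # more than 1% of the radius from the surface
```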
I'm not even sure how many practitioners are thinking in high-dimensional spaces or are aware of their properties. It feels like we get to the universal approximation theorem, accept it as evidence that networks will work well anywhere, and then just follow whatever the currently recognised state-of-the-art model is and adapt it to our purposes.
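For reference, and this is my paraphrase of the classical result rather than anything the theorem's casual invokers usually state: the universal approximation theorem is an existence result only.

```latex
% Universal approximation (classical one-hidden-layer form, Cybenko 1989 /
% Hornik et al.): for any continuous f on a compact set K in R^n and any
% eps > 0, there EXIST a width N and parameters v_i, w_i, b_i such that
\[
  \sup_{x \in K} \Bigl| f(x) - \sum_{i=1}^{N} v_i\, \sigma\!\bigl(w_i^{\top} x + b_i\bigr) \Bigr| < \varepsilon
\]
% Existence only: N may be astronomically large, and the theorem says
% nothing about whether gradient descent finds these weights or how the
% resulting network generalises.
```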