I am 100% in agreement with the author on the thesis: deep learning is overhyped and people project too much.
But the content of the post is not in itself enough to support this position. It is guilty of the same sins: projection and following social noise.
The point about increasing compute power, however, I found rather strong: new advances have come at a high compute cost. Although it could be said that research often advances like that: new methods are found first, then made efficient and (more) economical.
A much stronger rebuttal of the hype would have been based on the technical limitations of deep learning.
> A much stronger rebuttal of the hype would have been based on the technical limitations of deep learning.
Who's to say we won't improve this though? Right now, nets add a bunch of numbers and apply arbitrarily-picked limiting functions and arbitrarily-picked structures. Is it impossible that we find a way to train that is orders of magnitude more effective?
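To make the criticism concrete, here is roughly all a single dense layer does (a minimal NumPy sketch; the sizes and the choice of ReLU are my own illustrative assumptions, not taken from any particular model):

```python
import numpy as np

def layer(x, W, b):
    # Multiply-accumulate, then an essentially arbitrary
    # "limiting" (activation) function -- here ReLU, but tanh,
    # sigmoid, etc. would serve the same role.
    return np.maximum(0.0, W @ x + b)

rng = np.random.default_rng(0)
x = rng.standard_normal(4)        # input vector
W = rng.standard_normal((3, 4))   # learned weights
b = rng.standard_normal(3)        # learned biases
print(layer(x, W, b))             # the layer's output
```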
To me, it's a bit like the question "Who's to say we won't find a way to travel faster than the speed of light?", by which I mean that in theory, many things are possible, but in practice, you need evidence to consider things likely.
Currently, people are projecting and saying that we are going to see huge AI advances soon. On what basis are these claims made? Showing the fundamental limitations of deep learning is showing that we have no idea how to get there. Sure, we have no idea how to get there *yet*, but we also have no idea how to do time travel yet.
Overhyped? There are cars driving around Arizona without safety drivers as I type this.
The end result of this advancement will be earth-shattering for our world.
On the high compute cost: there is some truth to that, but we have also seen advances in silicon to support it. Look at WaveNet, which runs 16k cycles through a DNN per second of audio; that it can be offered at scale and at a competitive price kind of proves the point.
> A much stronger rebuttal of the hype would have been based on the technical limitations of deep learning.
I'm not even sure how you'd go about doing that. You could use information theory to debunk some of the more ludicrous claims, especially ones that involve creating "missing" information.
One of the things that disappoints me somewhat with the field, which I've arguably only scratched the surface of, is just how much of it is driven by headline results that fail to develop understanding. A lot of the theory seems to be retrofitted to explain a relatively narrow improvement in results, and seems only to develop the art of technical bullshitting.
There are obvious exceptions to this, and they tend to be the papers that do advance the field. With a relatively shallow ResNet it's possible to achieve 99.7% on MNIST and 93% on CIFAR-10 on a last-gen mid-range GPU, with almost no understanding of what is actually happening.
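For context, a network of that sort fits in a few lines; here is a sketch in PyTorch, where the depth and channel counts are my own illustrative choices rather than any particular published recipe:

```python
import torch
import torch.nn as nn

class ResBlock(nn.Module):
    """A basic residual block: two convs plus a skip connection."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        out = torch.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return torch.relu(out + x)  # the skip connection

# A shallow ResNet for 32x32 inputs like CIFAR-10.
model = nn.Sequential(
    nn.Conv2d(3, 64, 3, padding=1),
    ResBlock(64),
    ResBlock(64),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(64, 10),
)
print(model(torch.randn(1, 3, 32, 32)).shape)  # torch.Size([1, 10])
```

Nothing in those few lines explains why it generalises as well as it does, which is rather the point.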
There's also low-hanging fruit that seems to have been left on the tree. Take OpenAI's paper on weight normalisation, which reparametrises each weight vector as a normalised direction vector and a scalar length. This makes intuitive sense to anybody familiar with high-dimensional spaces, since nearly all of the volume of a hypersphere lies near its surface. That this works in practice is great news, but it leaves many questions unanswered.
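Concretely, the reparametrisation in that paper (Salimans & Kingma's weight normalisation) writes each weight vector as w = g * v / ||v||; a minimal NumPy sketch of just that identity:

```python
import numpy as np

def weight_norm(v, g):
    # Reparametrise a weight vector as a direction (v) and a
    # length (g): w = g * v / ||v||. Optimising g and v
    # separately decouples the length of w from its direction.
    return g * v / np.linalg.norm(v)

rng = np.random.default_rng(0)
v = rng.standard_normal(512)   # direction parameters
g = 2.0                        # scalar length parameter
w = weight_norm(v, g)
print(np.linalg.norm(w))       # equals g up to floating point: ~2.0
```

PyTorch ships the same trick as torch.nn.utils.weight_norm.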
I'm not even sure how many practitioners are thinking in high-dimensional spaces or are aware of their properties. It feels like we get to the universal approximation theorem, accept that as evidence that nets will work well anywhere, and then just follow whatever the currently recognised state-of-the-art model is and adapt it to our purposes.