
What are "traditional nets"? What are the "other learning algorithms"? What is a universal algorithm (and for what)? Neural nets are universal function approximators. There isn't something [edit: a function] they can't learn. When stacked, they seem to produce results that are eerily human-like.

I think the "universal algorithm" in the article refers to some kind of emergent intelligence. Well, nothing that he mentions precludes it. Our brains aren't magical machines. Neural nets may not model real neurons, yet it is amazing how they can produce results that we identify as similar to the way we think. There is nothing in computational neuroscience that comes close to this. If anything, the success of deep nets bolsters my belief in connectionism rather than the opposite. I would expect it is very difficult to formulate "intelligence" mathematically, and to prove that DNs can or cannot produce it.




The issue here isn't whether there is something deep learning can't learn -- with enough data and engineering even nearest neighbor can learn everything.

The problem is: does deep learning scale efficiently enough that it is feasible to learn the "universal algorithm" with DL?


I think it's been proved that neural nets with more than 2 hidden layers can approximate any function. Is there any literature for other algorithms as well?

It is true that neural networks can approximate any function. But what are feasible ways to construct such networks? Part of the point of having "deep" networks is that it's easier to train that way. Scalability does not only refer to the scale of the model itself, but also the difficulty of building such a model.

What is that mythical "universal algorithm" ?

The universal approximator approach is another big one that people lean on from the opposite direction (I suppose you could call it the "anti-deep learning" school of thought). Most people don't take the time to consider what a universal approximator actually is.

Are humans universal approximators? Absolutely not. The mere proof that a NN with an infinite number of hidden nodes (which isn't practical, which could end the discussion there) can find a statistical mapping between any inputs and outputs is hardly a benefit. There is so much more to intelligence --- the ability to generalize well (aka not overfitting) is first and foremost.

I'd push you to ask why you perceive that stacking RBMs produces "eerily human-like" results. I'd argue that they don't: http://blog.keras.io/how-convolutional-neural-networks-see-t...


Trying to compare anything to "intelligence" is a trap, imho. Intelligence is not a natural quantity, it's a word. There was no reason for a brain to become a universal approximator, or to have any specific mathematical property for that matter. Brains evolved to adapt behavior to the environment using the senses. The universe didn't care for creating some property called "intelligence".

I do see many results in NLP, for example, that have an eerie human-likeness, even if nobody can explain why. I am not an expert in the field, but I would think it's a challenge to find similar results made using another technique.


> The universe didn't care for creating some property called "intelligence".

On the face of it, I would have to disagree. It would seem to me that's exactly what the universe selected for, among many other things of course, in our tiny little corner of it.

Otherwise, we wouldn't be here debating it.


Debating the anthropic principle will be left for another thread.

Maybe this universe did select for intelligence.

How about all the other universes that never generated intelligence?

But going back to our own universe, how widespread is intelligence? It's pretty much microscopic, isn't it? I think our universe seems to be mostly geared toward generating empty space, and the rare areas that are not empty are certainly extremely hostile to intelligence.


What do you mean hostile to intelligence? Did you mean hostile to life? And then life being a requirement for intelligence?

Yes. If life can't arise, neither can intelligence.

Our universe is massively hostile to both life and intelligence.


Predicting (function approximation) is perhaps the main use of neural nets but there is another very important use: generating, or in other words, imagination. And in generative mode we need to know about probabilities and latent variables - thus, we need a little more than neural nets to do this task.

You know, k nearest neighbors is also a universal function approximator. Universal function approximation is a red herring.

Apparently function approximation is an important property to make something useful. It seems reasonable that it should be part of an intelligent system. Is there a similar "deep learning" based on k-nearest neighbor classifiers?

With kNN, if you have enough data and a good definition of "distance", then yes, it is in a way similar to "deep learning" --- just "deep" in the sense that it's data- and human-labor-intensive.

I'm oversimplifying here, but what linear regression and logistic regression did beyond kNN is automate the "distance" function --- though you still have to construct the features manually. What DL did is one step further: don't even bother feature-engineering, the network can construct the features itself.

You see, there isn't a function that kNN can't approximate. If you have an impressive feature list and a training datum that has every feature exactly the same as your input, there is no reason not to directly use the output of said datum. It's the feasibility that matters.
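To make that concrete, here's a toy sketch (the target function, sizes, and numbers are all made up for illustration): with 1-NN, as the training data gets denser, simply copying the output of the closest training point approximates a continuous function arbitrarily well.

```python
import numpy as np

def knn_predict(x_query, X_train, y_train):
    # 1-NN: return the output of the single closest training point
    idx = np.argmin(np.abs(X_train - x_query))
    return y_train[idx]

f = lambda x: np.sin(3 * x)              # the "unknown" target function

errs = []
for n in (10, 100, 10_000):              # increasing data density
    X_train = np.linspace(0, 1, n)
    y_train = f(X_train)
    queries = np.linspace(0, 1, 500)
    preds = np.array([knn_predict(q, X_train, y_train) for q in queries])
    errs.append(np.max(np.abs(preds - f(queries))))
print(errs)                              # worst-case error shrinks as n grows
```

No learning in any interesting sense happens here --- the "model" is the data --- which is the point: universal approximation by itself is cheap.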

Of course, DL is a huge step forward. It has made "impossible things possible, and hard things easy". The author also acknowledged DL's importance. However, that doesn't mean we should stop at DL.


> Apparently function approximation is an important property

'important' sure but not always a good idea. When you are learning from finite (but potentially very large) number of noisy examples, the universal function approximator will try to approximate the noise also.

Learning is a delicate dance between the capacity of your learner and the complexity of the thing that you are trying to learn. The key notion is that it takes two to tango. If you have few examples, you are better of not using deep nets.

Stated another way, a universal approximator has infinite potential for distraction. "Hey shiny" and it could veer off in a direction you don't want it to go. You want to pick out army tanks from pictures and it might learn to distinguish pictures taken on a cloudy day from those taken on sunny days.
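The noise-chasing failure mode is easy to reproduce; here's a sketch with a made-up dataset (degrees and sizes chosen only for illustration):

```python
import numpy as np

rng = np.random.default_rng(42)
x_train = np.linspace(0, 1, 20)
y_train = np.sin(2 * np.pi * x_train) + rng.normal(0, 0.2, 20)  # noisy samples
x_test = np.linspace(0.025, 0.975, 20)                          # held-out points
y_test = np.sin(2 * np.pi * x_test)

def fit_eval(degree):
    # Least-squares polynomial fit; higher degree = higher capacity
    coeffs = np.polyfit(x_train, y_train, degree)
    train_err = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_err = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    return train_err, test_err

low_train, low_test = fit_eval(3)        # modest capacity
high_train, high_test = fit_eval(15)     # near-"universal" capacity
print(low_train, low_test)
print(high_train, high_test)
```

The high-degree fit always achieves the lower training error --- it happily absorbs the noise --- while its held-out error is typically far worse. That's the "hey shiny" in miniature.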


You missed my point entirely. My point is that showing something is a universal function approximator indicates nothing about whether that method is a good method in general. Necessary but not sufficient. kNN is one of the dumbest/simplest methods out there.

a good method for what?

For machine learning. For solving problems. What else?

A Fourier series can model pretty much any function. Doesn't mean it's a good model for your problem.
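Exactly. A quick sketch of the point: a truncated Fourier series closes in on a square wave as you add terms, yet nobody concludes that sinusoids are the right model for every problem. (The series below is the standard odd-harmonic expansion of a square wave; the numbers are illustrative.)

```python
import numpy as np

t = np.linspace(0, 1, 1000, endpoint=False)
square = np.sign(np.sin(2 * np.pi * t))            # target: square wave

def fourier_approx(n_terms):
    # Square-wave series: (4/pi) * sum over odd n of sin(2*pi*n*t)/n
    approx = np.zeros_like(t)
    for k in range(n_terms):
        n = 2 * k + 1
        approx += (4 / np.pi) * np.sin(2 * np.pi * n * t) / n
    return approx

errs = [np.mean((fourier_approx(m) - square) ** 2) for m in (1, 5, 50)]
print(errs)                                        # MSE shrinks as terms are added
```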


>Neural nets are universal function approximators. There isn't something they can't learn.

Teach one to sort an arbitrary list. A universal function approximator is not a Turing machine.


OK, I meant some function

sort is a function (in the mathematical sense)

then if it's a continuous function it can be approximated

and if it's not a continuous function?

https://en.wikipedia.org/wiki/Universal_approximation_theore...

It's proved for a subset of functions. Maybe you can prove it for more.


A Neural Turing Machine can learn programs.

There is new research going on in Neural Turing Machines which, in theory, can do exactly that. Maybe we have to wait some time to see NNs sort numbers.

On a high level, sort() is a seq2seq function; I think throwing an LSTM/RNN at this problem will yield some interesting results.

Sibling comments have mentioned Neural Turing Machines, but even just regular recurrent neural networks are technically Turing complete[1].

[1] - http://lipas.uwasa.fi/stes/step96/step96/hyotyniemi1/


>What are the "other learning algorithms"?

SVMs, AdaBoost & co, random forests - just to name a few popular ones.

>Neural nets are universal function approximators. There isn't something they can't learn.

Which types of functions you can approximate has nothing to do with being able to "learn everything". Simple neural nets are a great illustration of this point. Networks that can theoretically approximate the target function can easily overfit or oscillate if not set up properly.

In other words, the ability to approximate is the property of your model, while the ability to learn things is the property of your training algorithm.
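A minimal (and deliberately silly) sketch of that distinction: the model below can trivially represent the target (w = 0 minimizes the loss), but whether training finds it depends entirely on the training algorithm's settings --- here, just the learning rate.

```python
def gradient_descent(lr, steps=50, w0=1.0):
    # Minimize loss(w) = w**2 by plain gradient descent
    w = w0
    for _ in range(steps):
        w -= lr * 2 * w          # gradient of w**2 is 2*w
    return w

good = gradient_descent(lr=0.4)  # |1 - 2*0.4| < 1: converges toward 0
bad = gradient_descent(lr=1.1)   # |1 - 2*1.1| > 1: oscillates and blows up
print(good, bad)
```

Same model class, same representational power; only the training settings differ.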


Absolutely right. By the same qualifications, it's possible, but also completely infeasible, to create a flat map that approximates any function. Because it is obvious that it won't be useful, nobody will write home about it.

I read the "universal algorithm" as a reference to the book 'The Master Algorithm'

> There isn't something [edit: a function] they can't learn.

This is way too sloppy. There isn't something neural nets can't be made to represent - but possibility of representation isn't the same as learning. Learning is not a property of neural nets, it is a property of parameter-adjusting algorithms like backpropagation which don't exactly match up with the representational universality of neural nets.

"Neural nets" have nothing to do with brains or "the way we think." This is marketing crapola.


The universal approximation theorem is not sloppy. "Learning" and "Representation" are sloppy terms.

Neural nets have a little bit of something "to do with brains". They were inspired by connectionist ideas. I am all for avoiding marketing speak, but I don't find extremes useful.


> Neural nets are universal function approximators.

True, but that's kind of like saying Rule 110 is capable of universal computation — possible in theory, but not really useful in practice.


> What are "traditional nets"?

I'm assuming the old feed-forward error back-propagation neural networks, before they got "deep". They had three layers (input, hidden and output) and were trained using straightforward error back-propagation (the gradient of the error with respect to the weights).

This was a relatively popular technique in--I think?--the 90s (this is a guess; I started my master's in ML in 2003, and at that moment neural networks were seen as a thing of the past). So at some point these traditional neural networks hit a wall. We wanted to use more layers, but the error backprop wasn't quite up to it. And as it later turned out, computing power and the size of datasets were also lacking, but IIRC we didn't quite realize those aspects back then and mostly saw it as a limitation of the neural network training algorithm.
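For anyone who never saw one of these, here's a minimal sketch of such a three-layer net trained with plain error back-propagation (toy target function, made-up sizes and hyperparameters):

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.linspace(0, 1, 200).reshape(-1, 1)
y = X ** 2                                # toy target function

H = 16                                    # hidden units
W1 = rng.normal(0, 1, (1, H)); b1 = np.zeros(H)
W2 = rng.normal(0, 1, (H, 1)); b2 = np.zeros(1)

def forward(X):
    h = np.tanh(X @ W1 + b1)              # hidden layer
    return h, h @ W2 + b2                 # linear output layer

_, pred = forward(X)
mse_before = float(np.mean((pred - y) ** 2))

lr = 0.05
for _ in range(5000):                     # full-batch gradient descent
    h, pred = forward(X)
    g = 2 * (pred - y) / len(X)           # d(MSE)/d(pred)
    gW2, gb2 = h.T @ g, g.sum(0)
    gh = (g @ W2.T) * (1 - h ** 2)        # backprop through tanh
    gW1, gb1 = X.T @ gh, gh.sum(0)
    W1 -= lr * gW1; b1 -= lr * gb1
    W2 -= lr * gW2; b2 -= lr * gb2

_, pred = forward(X)
mse_after = float(np.mean((pred - y) ** 2))
print(mse_before, mse_after)              # the error drops substantially
```

That's the whole recipe; the "deep" era mostly changed how many layers you stack and how you keep the gradients usable through them.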

