>> Again, I share the aesthetic distaste for how this progress looks. But one has to get over such personal tastes.
You're the one who brought aesthetics into this, not me. I just want to know how things work, and why. That's what science means, to me.
And the process we both, I think, understand is happening in machine learning research doesn't help anyone understand anything. One team publishes a paper claiming this one neat architectural trick beats such-and-such systems on such-and-such benchmarks. Three weeks later another team beats the former team on the same benchmark with another neat trick.
All we learn is that some people came up with some neat tricks to beat benchmarks, and that you can't reuse the same trick twice. So what's the point of the trick, then, if you have to do it all from scratch, and you never know how to make something work?
Neural nets work only when they work. When they don't work, nobody will tell you. Teams who beat the SOTA will gloat about it and keep schtum about all their many failures along the way. Their one neat trick can't beat _this_ benchmark? Nobody will ever know.
The sum of knowledge generated by this process is exactly zero. That is why I say that progress is not being made. Not because it looks ugly. I don't care about elegant or ugly. I don't know why you assumed I do. I care about knowledge, that's my schtick. When there are tens of thousands of people putting papers on arXiv that tell us nothing at all, that's a process that generates not knowledge, but noise.
>> DeepL today translates to and from Hungarian much much better than the best systems 7 years ago.
Based on what? BLEU scores?
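For anyone who hasn't computed one: BLEU is essentially modified n-gram overlap with reference translations plus a brevity penalty, which is exactly why it's a weak proxy for "much much better". A rough sketch of what the number measures, assuming the Python sacrebleu package and made-up example sentences:

    import sacrebleu

    # One hypothesis translation and one stream of reference translations
    # (both made up for illustration).
    hypotheses = ["the cat sat on the mat"]
    references = [["a cat was sitting on the mat"]]

    # corpus_bleu counts overlapping n-grams (up to 4-grams) between the
    # hypothesis and the references, with a brevity penalty. It says nothing
    # directly about adequacy or fluency.
    score = sacrebleu.corpus_bleu(hypotheses, references)
    print(score.score)  # a single number from 0 to 100

A high score means the output shares n-grams with whatever references were chosen, nothing more.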
>> The bitter lesson is called bitter for a reason. It leads elegance-seekers to despair and they downplay the results. They rather shift goalposts and deny that the thing that has landed really is a plane.
That's a rotten way to react to criticism: assume an emotional state in the critic and dismiss them as being dishonest. If you can easily dismiss the criticism by pointing out its flaws, do it, but if all you have to say is "nyah nyah you're butthurt" then you haven't dismissed anything.
> The improvements are usually in the 1% range, from old models to new models, and the models are complex. More often than not also lacking code, implementation / experiment procedures, etc. [...] no idea if the paper is reproducible, if the results are cherry picked from hundreds / thousands of runs, if the paper is just cleverly disguised BS with pumped up numbers to get grants, and so on.
Personally, I'm bearish about most deep learning papers for this reason.
I'm not driven by a particular task/problem, so when I'm reading ML papers, it is primarily for new insights and ideas. Correspondingly, I prefer to read papers which offer new perspectives on the problem (irrespective of whether they achieve SOTA performance). From what I've seen, most of the interesting (to me) ideas come from slightly adjacent fields. I care far more about interesting and elegant ideas, and use benchmarks just to sanity-check that the nice idea can also be made to work in practice.
As for the obsession with benchmark numbers, I can only quote Mark Twain: “Most people use statistics like a drunk man uses a lamppost; more for support than illumination.”
>> That is an example of the difference between expensive toys like Dall-E and world-changing scientific work.
There isn't really that much difference. DeepMind took some three years to max out CASP, the protein-folding benchmark that has run since 1994, on a problem known since ~1960. Protein folding is exactly the type of world-changing scientific work. And it is just the tip of the iceberg: DL is being successfully applied to PDE solving, electron density prediction, and more.
The psychedelic hamster dragon is an example of the fact that DL is capable of zero-shot generalization, which is a significant scientific observation in itself.
And with regard to your example: we are discussing whether deep learning has stalled. I am giving you mostly examples from the last year, like EfficientZero (30 Oct), a ~500x improvement in sample efficiency over the 2013 DQN, some literally not older than a month: GLIDE (20 Dec), Player of Games (6 Dec).
You are giving me an example from 2004 (2009?).
I'm not trying to troll you. I just think you are biased in your assessment of significant results and of what it means to be stalled.
>> I think that a part of the problem is that older ML PhDs are angry that deep learning is so easy (until the learning rate fails to provide convergence of course...) and would prefer that their preferred methods would still reign supreme.
I think that's definitely a part of it, and I feel that way sometimes myself (not that I'm a PhD). But there's another side of that reluctance that lies on the axis of model accountability and explicability. A lot of modern ML/deep learning doesn't -feel- like we're understanding anything any more than we did ten years ago. Yes, our black-box results are better according to the tests we've laid out for them, but there's something more slippery about the 'why', beyond the handwave of 'complexity'. Maybe this is just the way it will be going forward (in the spirit of quantum mechanics' "shut up and calculate"), but it is not easy to trade something you can wrap your head around for something that kind of just takes care of itself, especially if you're in the business of seeking knowledge instead of results.
> And I don't blame them- it could impact funding, access to journals, etc.
No, being "understanding" about this makes you part of the problem.
Part of research is trying new things, and exploring, but you should always be checking this against simpler techniques and currently accepted best performance. If it isn't an improvement, you don't publish at all unless you have something interesting to say about why it isn't improving things. Or perhaps in a methods survey.
One thing that happens when a technique like deep learning gets popular and things like python toolkits pop up making it easy to try, is that you get researchers in adjacent areas who don't really understand what they are doing trying to apply it to their favorite problem domain, and then comparing almost exclusively to other naive groups doing the same thing. It can be hard to tell if there is anything interesting there, even when there are dozens of conference papers, etc.
Basically the same thing happened with kernel methods when SVMs were hot, on a smaller scale.
Compare this to, say, AlexNet. The reason it was immediately obvious that something interesting was happening there was that lots of people who did know what they were doing with models had tried lots of other approaches, and you could make direct comparisons.
So yes, blame them. I do think negative results should be valued higher, but the fact you did some work doesn't make it publishable.
To frame it another way: if you give me a paper proposing a complex model and I grab your data and play with it a bit, and it turns out linear regression with appropriate pre-treatment works better... well, then I'm forced to believe you are either incompetent or being dishonest. Or, if you are students, your supervisors are.
This generalizes well. You should always be comparing against a known (or considered) good solution on your data under the same conditions, not just comparing against last year's conference paper and the variants you are trying to "improve". The right choice of baseline comparison will depend a bit on the domain, but not including one at all is shockingly poor scholarship.
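To make that concrete, here is a minimal sketch of the kind of check I mean, assuming scikit-learn and with stand-in synthetic data in place of whatever the paper used: same data, same cross-validation protocol, a simple baseline with sensible pre-treatment next to the fancier model.

    import numpy as np
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.linear_model import Ridge
    from sklearn.neural_network import MLPRegressor
    from sklearn.model_selection import cross_val_score

    # Stand-in data; in practice this would be the paper's dataset.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 10))
    y = X @ rng.normal(size=10) + 0.1 * rng.normal(size=500)

    baseline = make_pipeline(StandardScaler(), Ridge(alpha=1.0))
    complex_model = make_pipeline(StandardScaler(),
                                  MLPRegressor(hidden_layer_sizes=(64, 64),
                                               max_iter=2000, random_state=0))

    # Same data, same folds, same metric for both models.
    for name, model in [("ridge baseline", baseline), ("MLP", complex_model)]:
        scores = cross_val_score(model, X, y, cv=5, scoring="r2")
        print(name, scores.mean())

If the complex model can't clearly beat that kind of baseline under identical conditions, the paper needs to explain why it is still interesting.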
I've even seen paper submissions with no direct comparisons at all, because the "researchers" didn't have access to the comparators' data and were too lazy to implement the other methods. Which leads to another sloppiness: methods that get pulled into the comparison not because they are the right choice, but because there is a publicly available implementation. In the best case this forms a useful baseline. In the worst case, well, I guess it's good for your citation count if you implemented it :)
>First off right now there seems to be a drive to explain many phenomena in ML in particular why neural networks are good at what they do. A large body of them reaches a point of basically "they are good at modeling functions that they are good at modeling".
Since this is closely related to my current research, yes, ML research is kind of crappy at this right now, and can scarcely even be considered to be trying to actually explain why certain methods work. Every ML paper or thesis I read nowadays just seems to discard any notion of doing good theory in favor of beefing up their empirical evaluation section and throwing deep convnets at everything.
I'd drone on more, but that would be telling you what's in my research, and it's not done yet!
>> I was very interested to read the story a few days ago about the relationship between compute resource and results in deep learning. https://blog.openai.com/ai-and-compute/?
Aye. My reaction to that graph was that this rate of progress can't be sustainable. If industry were throwing good money after bad to find out how long you can keep using bubblesort given more and more compute before having to develop a better sorting algorithm, that's what it would look like.
>> Answer sets explode all solutions and then eliminate contradictions, we couldn't have dealt with them in the 90's (much less in 1985) as the idea of having 10G data structures in memory would have made people lol... but now so what?
ASP is interesting. There are symbolic learning techniques that learn answer set programs from data, did you know that?
>would be difficult to grasp by most working data scientists
Most of what those same people do now was difficult to grasp when they started. The next group will adopt whatever is the best tools for the problems they face.
>which implies an uphill battle for adoption
The same was true for current ML and data science. It was true for OOP. It was true for structured programming. This is always true, but tools that solve problems get adopted.
>You can do a lot with ML and deep learning without having to construct a sophisticated mathematical model of your problem space.
True, but as such simple tasks are mined out, anyone doing much work will move past that. For example, pick any task on, say, paperswithcode.com and look at the top performing models. It's hard to find any that are simple old-school networks. All of the ones I just sampled do much more sophisticated modeling of the precise problem space, and the trend is that all current edge and future work will involve modeling the problem much more carefully.
I expect the naive deep learning approach will be a tiny blip in the progress of using learning to solve problems.
Future progress looks like it relies more and more on making a good model of the problem with some untuned parameters, then using gradient descent or related methods to tune those parameters. The better the original model, the better it performs, and the less data one needs to train it, since it has the relevant structure already.
For example, learning basic physics in a net takes significantly more training, parameters, and cost than simply building in basic physics and tuning the small unknown parts. The net version also doesn't scale well since it is built from pieces (linear chunks) that simply don't match functional relations in the underlying problem.
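A toy sketch of what I mean, assuming PyTorch and synthetic data: instead of asking a generic net to learn an exponential decay law from examples, build the law in and let gradient descent tune the two unknown scalars.

    import torch

    # Synthetic observations of y = A * exp(-k * t) with a little noise.
    t = torch.linspace(0, 5, 100)
    true_A, true_k = 2.0, 0.7
    y_obs = true_A * torch.exp(-true_k * t) + 0.01 * torch.randn(100)

    # The physics (the functional form) is built in; only A and k are unknown.
    A = torch.nn.Parameter(torch.tensor(1.0))
    k = torch.nn.Parameter(torch.tensor(0.1))
    opt = torch.optim.Adam([A, k], lr=0.05)

    for step in range(2000):
        opt.zero_grad()
        loss = ((A * torch.exp(-k * t) - y_obs) ** 2).mean()
        loss.backward()
        opt.step()

    print(A.item(), k.item())  # recovers roughly 2.0 and 0.7 from 100 noisy points

Two parameters and a hundred points, versus a generic net that would need far more of both to learn the same relation, and would still only approximate it piecewise.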
> the author kindly pointed out that there are no papers published up to that point that prove deep learning (neural networks) can perform better than classical statistics.
Early in my career I moved to Silicon Valley to work for a large company. The project was a machine learning project. I was taking models defined in XML, grabbing data from a few different databases, and running it through a machine learning engine written in-house.
After a year and a half, it came out that our machine-learning-based system couldn't beat the current system that used normal statistics.
What rubbed me the wrong way was that the managers brought someone else in to run the data, manually, through the machine learning algorithm. More specifically, what bothered me was that we didn't attempt this kind of experiment early in the project. It felt like I was hired to work on a "solution in search of a problem."
Career lesson: Ask a lot of questions early in a project's life. If you're working on something that uses machine learning, ask what system it's replacing, and make sure that someone (or you) runs it manually before spending the time to automate.
>> Unfortunately for us (or at least me, because why did I spend so much time learning this stuff?), many top ML or AI researchers have only a vague understanding of fundamentals, and I vehemently disagree that you need rigorous exposure to mathematics to contribute to ML or AI.
But that depends on what you mean by "contribute". Machine learning research has turned into a circus with clowns and trained donkeys and the majority of the "contributions" suffer heavily from what John McCarthy called the "look ma, no hands disease" of AI:
Much work in AI has the "look ma, no hands" disease. Someone programs a computer to do something no computer has done before and writes a paper pointing out that the computer did it. The paper is not directed to the identification and study of intellectual mechanisms and often contains no coherent account of how the program works at all[1].
Yes, anyone can contribute to that kind of thing without any understanding of what they're doing. Which is exactly what's happening. You say that "many top ML or AI researchers have only a vague understanding of fundamentals" matter-of-factly but while it certainly is fact, it's a fact that should ring every single alarm bell.
The progress we saw in deep learning at the start of the decade certainly didn't come from researchers with "only a vague understanding of fundamentals"! People like Hinton, LeCun, Schmidhuber and Bengio have a deep background not only in computer science and AI but also in other disciplines (Hinton was trained in psychology, if memory serves). Why should we expect any progress to come from people with a "vague understanding" of the fundamentals of their very field of knowledge? In what historical circumstance was knowledge enriched by ignorance?
> I did use various deep learning methods as a black box for particular tasks they seemed good at. However, I never really liked doing it and at this point I'm starting to feel like a dinosaur not being able or willing to adapt to the new reality.
I think this is actually the new reality. Only a few people will work on (advanced) deep learning models, and the users will only adjust them to their use cases and applications.
I do understand your issue though, because I have been feeling the same about deep learning and never really had much of an application in my professional life either. It just takes too much time to get to a level where you can actually generate insights.
Besides, most people I know who do deep learning are the first to tell you to stick to traditional ML techniques until they're not enough anymore.
>> it's always 5 years away because mainstream AI researches are stuck with yak shaving their gradient descents.
A small correction: that's deep learning researchers, not AI researchers, and not even all machine learning researchers. To be charitable, it's not even all deep learning researchers. It's just that the field of deep learning research has been inundated with new entrants who are sufficiently skilled to grok the practicalities but lack understanding of AI scholarship, and who produce unfortunately shoddy work that does not advance the field (any field, any of the aforementioned ones).
As a personal example, my current PhD studies are in Inductive Logic Programming, which is, in short, machine learning of logic programs (you know, Prolog etc.). I would not be able to publish any papers without a theoretical section with actual theoretical results (i.e. theorems and their proofs - and it had better be a theorem other than "more parameters beget better accuracy", which is not really a theorem). Reviewers would just reject such a paper without a second thought, regardless of how many leaderboards I beat in my empirical results section.
And of course there are all the other fields of AI where work continues - search, classical planning, constraint satisfaction, automated theorem proving, knowledge engineering and so on and so forth.
Bottom line: the shoddy scholarship you flag up does not characterise the field of AI research as a whole; it only afflicts a majority of modern deep learning research.
> Ten+ years into the DL revolution and we are still getting shallow hit piece articles like this from IEEE.
I think the latest IEEE articles have been a little intellectually lightweight; perhaps we're simply not their audience. Still, in the style of IEEE papers, I would like to see references for the claims made.
> But all the people throwing rocks at Deep Learning have failed to propose anything that actually works better
I was thinking something similar. DL for sure has problems, but I have yet to see something better. ART doesn't just need to be better, it needs to be better in _at least_ one of: accuracy, training time, recall time, amount of training data, quality of training data.
>I think some researchers are refusing to accept the idea that machine learning is very much an experimental science today, and the (very cool) mathematics of kernels, SVMs, empirical risk minimization, bayesian statistics, etc. are simply no longer useful in the large scale regime.
The same statement could have been said of neural networks for decades, but researchers poking at corners eventually found methods to turn them into the useful tools they are now. After all, the methods you downplay were created because neural networks were not useful at that time, whereas many of those methods were.
I'd not poo poo what researchers decide to poke at. Pretty much every breakthrough is people poking at the edges of understanding. If solutions or steps were straightforward, then it would be engineering, not research.
>My prediction is that t
Mine is that neural networks as we understand them now get replaced by much more solid methods, based on the principles from scientific machine learning, where sophisticated differentiable models that are designed to mimic the problem space get tuned. After all, even current neural networks are heading that direction. Neural networks are simply too simplistic to capture lots of the complexity that problems demand (hence the current move past them in many domains).
Gluing linear functions together ad hoc is simply a low-level approximation to what can be developed using centuries of powerful mathematics to make models.
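As a toy illustration, assuming PyTorch: fit the same sine wave with a generic ReLU net and with a two-parameter model that has the right functional form built in, then look at what happens outside the training range.

    import torch

    torch.manual_seed(0)
    x = torch.linspace(-3.14, 3.14, 200).unsqueeze(1)
    y = 2.0 * torch.sin(x + 0.5)

    # Generic net: ~200 parameters gluing linear pieces together.
    net = torch.nn.Sequential(torch.nn.Linear(1, 64), torch.nn.ReLU(),
                              torch.nn.Linear(64, 1))

    # Structured model y = a * sin(x + p): two parameters, right form built in.
    a = torch.nn.Parameter(torch.tensor(1.0))
    p = torch.nn.Parameter(torch.tensor(0.0))

    opt = torch.optim.Adam(list(net.parameters()) + [a, p], lr=0.01)
    for _ in range(3000):
        opt.zero_grad()
        loss = ((net(x) - y) ** 2).mean() + ((a * torch.sin(x + p) - y) ** 2).mean()
        loss.backward()
        opt.step()

    # Outside the training range the piecewise-linear net extrapolates linearly,
    # while the structured model keeps tracking the sine.
    x_far = torch.linspace(5.0, 8.0, 100).unsqueeze(1)
    y_far = 2.0 * torch.sin(x_far + 0.5)
    print("net params:", sum(w.numel() for w in net.parameters()))
    print("net far-range MSE:", ((net(x_far) - y_far) ** 2).mean().item())
    print("2-param far-range MSE:", ((a * torch.sin(x_far + p) - y_far) ** 2).mean().item())

The structured model needs fewer parameters and keeps its meaning outside the data it was fit on, which is the point of building the mathematics of the problem into the model.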
> I'm getting kind of sick of this "deep learning is a black box" trope, because it's really not true anymore.
That's fair/probably true.
I think there are two things that drive that: one, the lack of a widely shared deep understanding of the field[0] (and not really needing a deep understanding to get good results, as both you and the author pointed out), and two, the fact that it feels like cheating compared to the old ways of doing things. :P
[0] When the advice on getting a basic understanding is "read a textbook, then read the last 5 years of papers so that you aren't hopelessly behind", there just isn't going to be widespread understanding.
> I wish machine learning research didn't respond so strongly to trends and hype,
It's really because nobody actually understands what's going on inside an ML algorithm. When you give it a ginormous dataset, what data is it really using to make its determination?
Because what I do for ML is a supervised fit, then use a held-out set to test and confirm fitness, then unleash it on unseen data and check and see. But I have no real understanding of what those numbers actually represent. I mean, does .91128756789 represent the curve around the nose, or is it skin color, or is it a facial encoding of 3D shape?
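For what it's worth, here is a minimal sketch of that workflow, assuming scikit-learn, with a small MLP on a toy dataset standing in for whatever model and data you'd actually use:

    from sklearn.datasets import load_breast_cancer
    from sklearn.model_selection import train_test_split
    from sklearn.neural_network import MLPClassifier
    from sklearn.preprocessing import StandardScaler
    from sklearn.pipeline import make_pipeline

    X, y = load_breast_cancer(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    model = make_pipeline(StandardScaler(),
                          MLPClassifier(hidden_layer_sizes=(32,), max_iter=2000,
                                        random_state=0))
    model.fit(X_train, y_train)            # supervised fit
    print(model.score(X_test, y_test))     # confirm fitness on held-out data

    # The learned weights are just numbers; nothing here tells you what any
    # one of them "represents" in terms of the original problem.
    print(model[-1].coefs_[0][0][:5])

The accuracy number tells you it works; the weights themselves don't tell you why.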
> I'm still wondering what, if anything, is going to supplant deep learning.
I think it'll be a slow climb to actual understanding. Right now, we have object identifiers in NNs. They work, after terabytes of images and petaflops of CPU/GPU time. It's only brute force with 'magic black boxes', and that provides results but no understanding. The next steps are actually deciphering what the understanding is, or making straight-up algorithms that can differentiate between things.
> I want us, as a community, to stop treating deep learning any differently than any other ML algorithm --- have a consensus, based on scientific facts, about the possibilities and limitations thus far. If we, "the experts", don't understand these things about our own algorithms, how can we expect the rest of the world to understand them?
I agree. It's interesting watching the "debate" around deep learning. All the empirical results are available for free online, yet there's so much misinformation and confusion. If you're familiar with the work, you can fill in the blanks on where things are headed. For instance, in 2011, I think it became clear that RNNs were going to become a big thing, based on work from Ilya Sutskever and James Martens. Ilya was then doing his PhD and is now running OpenAI, doing research backed by a billion dollars.
The pace of change in deep learning is accelerating. It used to be fairly easy for me to stay current with new papers that were coming out; now I have a backlog. To a certain extent, it doesn't matter what other people think; much of the debate is just noise. I don't know what AGI is. If it's passing the Turing test, we're pretty close, 3 years max, maybe by the end of the year. Anything more than that is too metaphysical and prone to interpretation. But there have been a bunch of benchmark datasets/tasks established now. ImageNet was the first one that everyone heard about, I think, but sets like COCO, 1B words, and others have come out since then and established benchmarks. Those benchmarks will keep improving, pursuing those improvements will lead to new discoveries re: "intelligence as computation", and something approximately based on "deep learning" will drive it for a while.
>What is new is the hype around deep learning that took off after 2012, and because Google and Facebook decided to champion it.
Have you seen what neural nets are now capable of? Speech synthesis/transcription, voice synthesis, image synthesis/labeling/infill, style transfer, music synthesis, and a host of other classes of optimization problems which have intractable explicit programmatic solutions.
The hype is justified, because ML has finally arrived, thanks primarily to hardware, and secondarily to the wealth of modern open research, heavily influenced by the congregations of leading researchers that funding at Google, Facebook, etc. has enabled.
The problems being solved by "curve fitting" ML were simply unsolvable by any practical, generalizable means before recently, and the revolution is just getting started.
> Deep learning systems require too much data (and often not raw data), crazy amounts of compute, and still fail under comically irrelevant tweaks of the problem (eg: adversarial noise).
Many of these criticisms generalize to 'first make it work, then make it better, then make it fast'.
Being able to solve some of these problems at all is the kicker, whether or not it is efficient is at the moment not nearly as important as being able to solve them in principle.
I suspect that in the very near future we will see something that extracts, from a DL model, the actual insights that make it work, which can then be used to power an analytical solution that performs a lot faster than the model itself, and hopefully with more resistance to noise.
I've already seen several glimpses of this and I'm hoping for some kind of breakthrough where DL is used to continue where feature engineering left off.
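One existing flavour of this, for what it's worth, is fitting a small, inspectable surrogate to the predictions of the black box and reading the rules off it. A rough sketch assuming scikit-learn, with a random forest standing in for the deep model:

    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.tree import DecisionTreeClassifier, export_text
    from sklearn.model_selection import train_test_split

    X, y = make_classification(n_samples=2000, n_features=10, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    # A black-box model (standing in for a deep net).
    black_box = RandomForestClassifier(n_estimators=200, random_state=0)
    black_box.fit(X_train, y_train)

    # Fit a shallow tree to mimic the black box's own predictions,
    # then read off its rules as a cheap analytical stand-in.
    surrogate = DecisionTreeClassifier(max_depth=3, random_state=0)
    surrogate.fit(X_train, black_box.predict(X_train))

    agreement = (surrogate.predict(X_test) == black_box.predict(X_test)).mean()
    print("agreement with black box:", agreement)
    print(export_text(surrogate))  # human-readable rules, fast to evaluate

Whether something like this scales to the models people actually care about is the open question, but the shape of the idea is already there.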
>The same goes for data. We can outperform VGGNet trained on all of imagenet (~ 15 million images) using resnet trained on a 1% random sampling of imagenet. We have papers putting forth statistics on specific model architectures outperforming with < 0.1% of the benchmark data, and just-barely-worse performance using a decent, but fixed and tiny, amount of data.
Okay, granted, but isn't this largely part of the academic field, openly accessible and transferable? People at Baidu and Tencent are likely up to date on the state of the art.
Contrast this, for example, with the American military edge or the economics of the German Mittelstand. There's a degree of ingrained knowledge that gives countries decades of advantage, but academic research can often be transferred. Understanding in ML seems to me to fall mostly into the second category, and when that category dominates, data and policy really do become relevant.
I'm not sure we have seen an ML intellectual arms race yet. For the most part techniques seem to spread fast and as a result data functions like oil of sorts and gives the biggest players the largest advantage.