> the only ones making actual breakthrough for money reasons

That also suggests those aren't breakthroughs: they're just someone getting an extra 1% because their corporate sponsor spent $250k more than the other guys.

Look at the ideas, not the results. Is there something new and is it clearly expressed? If you can't answer that in five minutes, move on. Ideas transfer, results don't.

In particular, if an ML paper abstract states a percentage improvement over SOTA and then lists five existing techniques that were combined to get the result, you can just put it directly on the trash pile.




> The improvements are usually in the 1% range, from old models to new models, and the models are complex. More often than not also lacking code, implementation / experiment procedures, etc. [...] no idea if the paper is reproducible, if the results are cherry picked from hundreds / thousands of runs, if the paper is just cleverly disguised BS with pumped up numbers to get grants, and so on.

Personally, I'm bearish about most deep learning papers for this reason.

I'm not driven by a particular task/problem, so when I'm reading ML papers it is primarily for new insights and ideas. Correspondingly, I prefer to read papers that offer new perspectives on the problem (irrespective of whether they achieve SOTA performance). From what I've seen, most of the interesting (to me) ideas come from slightly adjacent fields. I care far more about interesting and elegant ideas, and use benchmarks just as a sanity check that the nice idea can also be made to work in practice.

As for the obsession with benchmark numbers, I can only quote Mark Twain: “Most people use statistics like a drunk man uses a lamppost; more for support than illumination.”


> recently got a lot of traction

This flips causality - funding didn't really pick up until the last 5 years or so - about two years after the initial breakthrough.

Look, I agree with this comment - managers are super eager to apply ML to tasks it has no business being applied to (at least yet). But your original claim was that all the new techniques are just glorified maximum likelihood optimization. That's just false.


> And I don't blame them- it could impact funding, access to journals, etc.

No, being "understanding" about this makes you part of the problem.

Part of research is trying new things, and exploring, but you should always be checking this against simpler techniques and currently accepted best performance. If it isn't an improvement, you don't publish at all unless you have something interesting to say about why it isn't improving things. Or perhaps in a methods survey.

One thing that happens when a technique like deep learning gets popular, and Python toolkits pop up that make it easy to try, is that researchers in adjacent areas who don't really understand what they are doing start applying it to their favorite problem domain, and then compare almost exclusively against other naive groups doing the same thing. It can be hard to tell whether there is anything interesting there, even when there are dozens of conference papers.

Basically the same thing happened, on a smaller scale, with kernel methods when SVMs were hot.

Compare this to, say, AlexNet. The reason it was immediately obvious that something interesting was happening there was that lots of people who did know what they were doing with models had tried lots of other approaches, and you could make direct comparisons.

So yes, blame them. I do think negative results should be valued higher, but the fact you did some work doesn't make it publishable.

Framed another way: if you give me a paper proposing a complex model, I grab your data and play with it a bit, and it turns out that linear regression with appropriate pre-treatment works better... well, then I'm forced to believe you are either incompetent or being dishonest. Or, if you're students, your supervisors are.

This generalizes well. You should always be comparing against a known (or considered) good solution on your data under the same conditions, not just comparing against last year's conference paper and the variants you are trying to "improve". The right choice of baseline comparison will depend a bit on the domain, but not including one at all is shockingly poor scholarship.
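
For concreteness, here is a minimal sketch (my own illustration, not anything from a specific paper) of the kind of baseline check being described: run a simple, well-understood model on the same data, under the same cross-validation splits, before trusting a complex model's numbers. The dataset and the "complex" model here are just stand-ins.

    from sklearn.datasets import fetch_california_housing
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.linear_model import Ridge
    from sklearn.ensemble import GradientBoostingRegressor
    from sklearn.model_selection import KFold, cross_val_score

    X, y = fetch_california_housing(return_X_y=True)
    cv = KFold(n_splits=5, shuffle=True, random_state=0)  # identical splits for both models

    # "linear regression with appropriate pre-treatment" as the baseline
    baseline = make_pipeline(StandardScaler(), Ridge())
    # stand-in for whatever complex model the paper proposes
    complex_model = GradientBoostingRegressor(random_state=0)

    for name, model in [("ridge baseline", baseline), ("complex model", complex_model)]:
        scores = cross_val_score(model, X, y, cv=cv, scoring="r2")
        print(f"{name}: R^2 = {scores.mean():.3f} +/- {scores.std():.3f}")

If the fancy model can't clearly beat that kind of baseline under identical conditions, the burden is on the paper to explain why it is worth publishing.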

I've even seen paper submissions with no direct comparisons at all, because the "researchers" didn't have access to the comparators' data and were too lazy to implement the other methods. Which leads to another sloppiness: methods get pulled into the comparison not because they are the right choice, but because there is a publicly available implementation. In the best case this forms a useful baseline. In the worst case, well, I guess it's good for your citation count if you implemented it :)


>> Again I share the aesthetics distaste to how this progress looks. But one has to get over such personal tastes.

You're the one who brought aesthetics into this, not me. I just want to know how things work, and why. That's what science means, to me.

And the process we both, I think, understand is happening in machine learning research doesn't help anyone understand anything. One team publishes a paper claiming this one neat architectural trick beats such-and-such systems on such-and-such benchmarks. Three weeks later another team beats the former team on the same benchmark with another neat trick.

All we learn is that some people came up with some neat tricks to beat benchmarks, and that you can't reuse the same trick twice. So what's the point of the trick, then, if you have to do it all from scratch, and you never know how to make something work?

Neural nets work only when they work. When they don't work, nobody will tell you. Teams who beat the SOTA will gloat about it and keep schtum about all their many failures along the way. Their one neat trick can't beat _this_ benchmark? Nobody will ever know.

The sum of knowledge generated by this process is exactly 0. That is why I say that progress is not being made. Not because it looks ugly. I don't care about elegant or ugly. I don't know why you assumed I do. I care about knowledge, that's my schtick. When there are tens of thousands of people putting papers on arxiv that tell us nothing at all, that's a process that generates not knowledge, but noise.

>> DeepL today translates to and from Hungarian much much better than the best systems 7 years ago.

Based on what? BLEU scores?

>> The bitter lesson is called bitter for a reason. It leads elegance-seekers to despair and they downplay the results. They rather shift goalposts and deny that the thing that has landed really is a plane.

That's a rotten way to react to criticism: assume an emotional state in the critic and dismiss them as being dishonest. If you can easily dismiss the criticism by pointing out its flaws, do it, but if all you have to say is "nyah nyah you're butthurt" then you haven't dismissed anything.


> They probably are strongly motivated by spending quite a bit of their time on innovation

I would rephrase it as spending time on research. Which might or might not be innovative.

I know for a fact that there is a plethora of applied ML papers where people simply toy around with the sample set and publish their results. There is nothing innovative about it. It does require some knowledge of feature analysis, the relevant algorithms, and basic statistics, but nothing that can't be picked up from a few months of courses in applied machine learning.

So really, the only case where that extra knowledge is justified is your case #1. And even then I am not convinced that these PhDs have ever actually done anything innovative. There is too much of a chance that they simply iterated on existing research ideas in their field and can only continue to do so. Which, again, is not very innovative I think.


> Academics merely need to show a new technique with the potential to improve the SOTA.

Not sure which field you are talking about, but I've had more than one machine learning paper rejected because it didn't improve SOTA enough. Not to contradict your point directly, but current publication practices don't seem to scale very well with the progress in AI.


> Companies / researchers in general have no strong requirement to show you any artifacts to reproduce their work.

That's quite literally the opposite of what science is all about! If they don't want others to reproduce their results, they might just as well end each paper with "You can take our word for it!" and skip the details altogether...

What I'm rather interested in are points of comparison. Performance in terms of a chosen metric is one thing, but research gets more useful if it can easily be reproduced. This is the norm in all other sciences - why not in AI research?

If I can see that their approach is 20% better than SOTA, but it requires 1M LoC plus 3 weeks of total computation time on a 100-machine cluster with 8 V100s per node, I can safely say - sod it! - use the inferior commercial product instead and add 20% manual effort (since I need to add manual work anyway, as the accuracy isn't 100%).
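
To put a rough number on that trade-off: the dollar figure per run depends on a GPU price that isn't in the comment above, so the ~$2.50 per V100-hour used below is an assumed ballpark for on-demand cloud pricing, nothing more.

    # Back-of-envelope cost of the setup described above (assumed pricing).
    machines, gpus_per_machine = 100, 8
    hours = 3 * 7 * 24                               # "3 weeks of total computation time"
    gpu_hours = machines * gpus_per_machine * hours  # 403,200 GPU-hours
    assumed_rate = 2.50                              # assumed USD per V100-hour, not from the comment
    print(f"{gpu_hours:,} GPU-hours ~= ${gpu_hours * assumed_rate:,.0f} per run")

At that assumed rate it's on the order of a million dollars per training run, which is the scale that makes "use the inferior product and add manual effort" a perfectly rational call.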


> Any place doing serious ML will require the person to have a PhD or have publications and presentations at conferences like NIPS/ICML.

Stop spouting this bullshit. You don't need a PhD, and you don't need to advance the field to be doing 'serious ML'. All you need to be able to do is know how and when to apply it to solve crucial business problems.


>There has been next to ZERO progress towards genuine AGI

I mean, really? In the most pessimistic evaluation of ML research, we still know which techniques and paradigms won't work for AGI. That's not zero progress. Nobody is expecting this to happen overnight.


>"But this is yet to be shown; for some reason authors decide not to "compete" on a same ground. Instead of "promoting" their methods in scientific community via publications / comparisions with existing approaches they seem to focus on people who have little knowledge of modern machine learning."

You'd then have to provide them with a decent argument about what economic/competitive benefit they would get from what you suggest. You say they're commercial companies, so don't be surprised when their approach is driven by financial incentives. But trust me, they're probably begging for someone to show them a better alternative that will give them a competitive edge.

So either no one like you has given them that alternative/idea; or someone has already and they rejected the financial benefit; or, finally, someone already told them your idea but they discovered there was no financial benefit, only a benefit to the greater society.


> My experience in CS is that the replicability of experimental results is embarrassingly bad

I think that's true for low-tier/low-impact CS papers, but the difference is that literally nobody cares about those papers. The high-tier stuff is easily verifiable (like Tensorflow or whatever) and nobody is writing articles about 5% improvements against a benchmark in some obscure niche scheduling and planning domain.

Outside academic CS people are more scientific about experimental results because it has concrete implications on revenue or spend... but they aren't getting the results from conference papers.


>> 1. Cutting-edge academic research (do better on this test set)

It's interesting you put it this way. I think most machine learning researchers who aspire to do "cutting-edge" research would prefer to be the first one to do well on a new dataset, rather than push the needle forward by 0.005 on an old dataset that everyone else has already had a ball with. Or at the very least, they'd prefer to do significantly better than everyone else on that old dataset.

I bet you remember the names of the guys who pushed the accuracy on ImageNet up by ~11%, but not the names of the few thousand people who have since improved results by tiny little amounts.


> In Machine Learning, the standard expectation of research papers is to empirically demonstrate the increased statistical power of the model over benchmarks, and I wish the authors demonstrated that their model aligns with historical reality.

I went to graduate school in econometrics after taking several undergraduate courses in ML, and I was constantly astounded how paper after paper failed to do this. I don't think you can (or should) claim a result if you can't demonstrate that it works empirically.


> If I had a penny for every dissertation in the past few years that boiled down to...

This is very, very accurate. On the other hand, I oftentimes see field-specific papers from field experts with little ML experience using very basic and unnecessary ML techniques, which are then blown out of the water when serious DL researchers give the problem a shot.

One field that comes to mind where I have really noticed this problem is genomics.


> Does choosing boring ML tech mean sitting out on massive recent advancements?

That's the point: with new tech it is hard to tell where the hype ends and where the genuinely useful advancements begin, and the author's choice is not to take that risk.

Say you bought into the GPT hype and integrated it into your product to unlock new use cases; it could also add a lot of confusion and quality issues, which the author prefers to avoid.


> a lot of machine learning endeavors that aren’t generating much value.

I agree with this. From what I have seen, execs want something fancy, when you could give them a boring-ish tool that reduces your OODA loop cycle time to 30% of what it was using unsexy techniques. Much of it has to do with the tolerancing of answers: being within 5-10% is more than enough to drive the business, but someone somewhere said it had to be exact, and that blows out the latency budget.

When engaging in consulting gigs, it is super important to know what kind of org you're dealing with before you get involved. The myopic, penny-pinching orgs should be steered well away from, which I think was your point.


> You appear to be denying the progress made over the past 30 years by deep learning, ML frameworks, constraint solvers, and immense computing power.

Most of the progress in the last 30 years came from immense computing power; almost all the foundations of today's ML are revised old concepts. What you propose is AGI, and how do you want to achieve that? We don't even know where to start in theory. That's not just my opinion but that of the current top names in the ML world [1], which has been discussed on HN many times.

1. https://venturebeat.com/2018/12/17/geoffrey-hinton-and-demis...


>At the same time, the low barrier of entry and hype has resulted in a huge amount of people downloading Keras, copying a bunch of code, tuning a few parameters, and then putting their result on arXiv so they can put AI research on their resume. This has resulted in so much noise and low quality work that it really hurts the field.

I'm sorry, but given that many papers at NeurIPS, ICML, etc. are exactly what you described, I find your criticism a bit lacking.


>For all the data that they collect and all the AI that they pay for, these companies get very little revenue to show for it.

Not really: they make a shit ton of money from their recommendations and perform extensive A/B testing to keep track of how much money it makes them. Anecdotes aren't data, and large tech companies don't spend money for no reason (especially when they can A/B test the impact trivially). Remember that at their scale even a 0.01% increase in revenue is worth $10+ million per year, so they don't need to be perfect to make a shit ton of money. There's a reason ML engineers get paid $1+ million, and it's not corporate stupidity.
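
For scale (my own arithmetic, using an assumed revenue figure rather than one from the comment): a 0.01% uplift is only worth $10M+ per year if annual revenue is on the order of $100B, which is true of the largest tech companies.

    # Sanity check of the 0.01% claim, under an assumed $100B annual revenue.
    annual_revenue = 100e9
    uplift = 0.0001                                     # 0.01%
    print(f"${annual_revenue * uplift:,.0f} per year")  # -> $10,000,000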

