10x the parameters? Maybe not in a single model, but 10x as many expert models might have 10x the value. I'm sure there are diminishing returns eventually, but we're probably not close to that point.
I don't care how many parameters my model has per se. What I care about is how expensive it is to train in time and dollars. If this makes it cheaper to train better models despite more parameters, that's still a win.
I'm not sure whether the number of parameters serves as a reliable measure of quality. I believe that these models have a lot of redundant computation and could be a lot smaller without losing quality.
That's a complicated question to answer. What I'd say is that more parameters make the model more robust, but there are diminishing returns. Optimizations are under way.
That's not sufficient. If you build 10 different models, each with tens of thousands of parameters, until you get the results you want, it doesn't matter how accurately the model seems to predict past data. Modelling is a tricky business that easily falls prey to such "data snooping" methods.
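A toy sketch of what I mean (all the data and "models" here are made up, just to show the selection effect):

    # "Data snooping" in miniature: try many arbitrary models on the same
    # historical data, keep the best-looking one, then watch it fail on new data.
    import numpy as np

    rng = np.random.default_rng(12345)
    past = rng.normal(size=500)     # the "historical" series we pretend to predict
    future = rng.normal(size=500)   # genuinely new data the winner never saw

    def random_model(seed):
        # a "model" that is really just noise with its own seed
        return np.random.default_rng(seed).normal(size=500)

    candidates = [random_model(s) for s in range(1000)]
    scores = [abs(np.corrcoef(c, past)[0, 1]) for c in candidates]
    best = candidates[int(np.argmax(scores))]

    print("best in-sample |corr|:  ", max(scores))                             # inflated purely by selection
    print("same model out of sample:", abs(np.corrcoef(best, future)[0, 1]))   # ~0

Pick the best of enough arbitrary models and it will always look like it "predicted the past".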
Isn't there more and more research coming out showing that at a certain point (~200B), parameters have significantly decreasing returns, and it's better to then do some supervised learning on top of the base model?
It's not reasonably possible (currently?) to get the same performance from a 7-billion-parameter model as from a 175-billion-parameter model with just an additional 6,000 lines of fine-tuning data.
Doubtful, for purely information-theoretic and memory-capacity reasons. It may outperform on some synthetic metrics, but in practice, to a human, larger models just feel “smarter” because they have a lot more density in the long tail, where metrics never go.
Yes and no. We don't need an insane amount of data to make these models accurate. If you have a small set of data that includes the benchmark questions, they'll be "quite accurate" under examination.
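A crude way to see whether that's happening (the corpus and questions below are just placeholders) is to look for long n-gram overlap between the training data and the benchmark:

    # Naive benchmark-leakage check: does any training document share a long
    # n-gram with an eval question? Corpus and questions here are placeholders.
    def ngrams(text, n):
        toks = text.lower().split()
        return {" ".join(toks[i:i + n]) for i in range(len(toks) - n + 1)}

    train_docs = ["... training corpus goes here ..."]                        # placeholder
    benchmark_qs = ["What is the boiling point of water at sea level?"]       # placeholder

    for q in benchmark_qs:
        if any(ngrams(q, 6) & ngrams(doc, 6) for doc in train_docs):
            print("possible contamination:", q)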
The problem is not the amount of data; it's the quality of the data, full stop. Beyond that, there's something called the "No Free Lunch Theorem", which says that a fixed-parameter model can't be good at everything, so making a model smarter at one thing is going to make it dumber at another.
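For reference, the supervised-learning form of the theorem (Wolpert, 1996) says, roughly, that averaged uniformly over all possible target functions, every learner has the same expected off-training-set error:

    \mathbb{E}_{f \sim \mathrm{Uniform}}\big[\, \mathrm{err}_{\mathrm{OTS}}(A_1 \mid f) \,\big]
      \;=\; \mathbb{E}_{f \sim \mathrm{Uniform}}\big[\, \mathrm{err}_{\mathrm{OTS}}(A_2 \mid f) \,\big]
    \qquad \text{for any two learners } A_1, A_2

So, at least when you average over all conceivable tasks, gains in one place have to be paid for somewhere else.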
We'd be much better off training smaller models for specific domains and training an agent that can use them as tools, DeepMind-style.
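Something like this, very roughly (the router and the domain models are purely hypothetical stand-ins):

    # Hypothetical "router + small domain experts" setup: a cheap classifier
    # decides which specialist model or tool handles each query.
    from typing import Callable, Dict

    def code_model(q: str) -> str:    return f"[small code model] {q}"
    def calculator(q: str) -> str:    return f"[calculator tool] {q}"
    def general_model(q: str) -> str: return f"[small general model] {q}"

    ROUTES: Dict[str, Callable[[str], str]] = {
        "code": code_model, "math": calculator, "general": general_model,
    }

    def route(query: str) -> str:
        # naive keyword router; in practice this would be a learned classifier
        if any(k in query.lower() for k in ("def ", "compile", "stack trace")):
            return "code"
        if any(ch.isdigit() for ch in query):
            return "math"
        return "general"

    def agent(query: str) -> str:
        return ROUTES[route(query)](query)

    print(agent("What is 17 * 23?"))   # dispatched to the calculator tool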
The algorithms probably aren't that great, and more of them would likely have diminishing returns. Adding substantially more false positives could actually be a bad thing.
As someone else mentioned, the 35k parameters makes me skeptical. Taleb, Tversky, and Kahneman have presented good evidence that most algorithms do better with fewer parameters. The more parameters, the more noise.
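The classic toy version of this (numbers purely illustrative) is fitting the same noisy data with polynomials of increasing degree:

    # More parameters fitting the noise: a degree-15 polynomial vs a cubic
    # on the same 20 noisy samples of a sine curve.
    import numpy as np
    from numpy.polynomial import Polynomial

    rng = np.random.default_rng(1)
    x_train = np.linspace(0, 1, 20)
    x_test = np.linspace(0, 1, 200)
    truth = lambda x: np.sin(2 * np.pi * x)
    y_train = truth(x_train) + rng.normal(scale=0.3, size=x_train.size)

    for degree in (3, 15):
        fit = Polynomial.fit(x_train, y_train, deg=degree)   # least-squares polynomial fit
        test_mse = np.mean((fit(x_test) - truth(x_test)) ** 2)
        print(f"degree {degree:2d}: test MSE = {test_mse:.3f}")

    # On most seeds the degree-15 fit has much higher test error than the cubic:
    # the extra parameters mostly end up modelling the noise.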