The point is that the iteration happens before training. You don't iterate after the training run starts, and presumably this person doesn't want to train the model themselves. If they do, they can, but they'll need the training data and a huge number of GPUs.
There's absolutely state-space iteration in model training: layer sizes, composition, construction.
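As a toy illustration of that point (hypothetical hyperparameters, no real training involved), the design-space iteration happens on paper before any GPU time is spent:

```python
from itertools import product

# Hypothetical architecture state space, explored *before* training starts.
layer_counts = [12, 24, 48]       # depth
hidden_sizes = [768, 1024, 2048]  # width
attention_heads = [8, 12, 16]

def estimate_params(layers, hidden, heads):
    # Rough transformer-style estimate: each layer has attention (~4*h^2)
    # plus an MLP (~8*h^2). Heads don't change the parameter count here.
    return layers * (4 * hidden**2 + 8 * hidden**2)

# Iterate the design space cheaply; only the chosen candidate gets GPU time.
candidates = sorted(
    (estimate_params(l, h, a), l, h, a)
    for l, h, a in product(layer_counts, hidden_sizes, attention_heads)
)

print(candidates[0])  # smallest candidate by estimated parameter count
```

The actual comparison of candidates would involve scaling-law estimates or small pilot runs, but the iteration loop itself lives entirely outside the big training run.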
> There was no real iteration, you set it up and wait for the GPUs to do the work.

The model architecture itself is done before you start training. If you change it, you generally have to start training from scratch.
That's like saying there's no design iteration in software because you type 'make' and the executable is built.
Alternatives to training include downloading the trained model. It could be a big download, but an overnight download isn't that unreasonable to ask of users.
I am just learning, but I'm guessing the issue is that training is expensive, because you need to iterate many times and perform a massive number of matrix operations (and other operations) per iteration?
Maybe too expensive to "keep up" with new data coming in? Does a single new piece of data mean you need to start from scratch, or are you "near enough" that incorporating it needs less processing power?
And also... it's not like the program actually contains a copy of the training data, right? The training data is a tool used to build the model.
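On the "near enough" question: for simple models you can indeed keep training from where you left off rather than restarting. A toy sketch with a hand-rolled linear model and SGD (nothing like real LLM training, but it shows the idea of incorporating one new data point with a few cheap updates):

```python
# Toy sketch: update an already-trained model on one new data point
# with a few SGD steps, instead of retraining from scratch.
# Model: y = w*x + b, data follows y = 2x.

data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
w, b, lr = 0.0, 0.0, 0.05

def sgd_step(w, b, x, y, lr):
    err = (w * x + b) - y
    return w - lr * err * x, b - lr * err

# The expensive part: initial training over the full dataset.
for _ in range(200):
    for x, y in data:
        w, b = sgd_step(w, b, x, y, lr)

# A single new observation arrives: a handful of cheap updates suffices.
new_x, new_y = 4.0, 8.0
for _ in range(10):
    w, b = sgd_step(w, b, new_x, new_y, lr)

print(round(w, 2), round(b, 2))
```

Whether this "online" updating works well for giant models is a much harder question (catastrophic forgetting, data mixing, etc.), but a single new sample does not, in principle, force a restart.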
For most purposes you don’t need to train from scratch. Instead you fine-tune an existing model, on smaller amounts of data and for a fraction of the time.
This is akin to teaching an adult human about a specific domain. Better to just do that than make a whole new human from scratch!
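A minimal sketch of that fine-tuning idea, with toy numbers (a frozen one-parameter "pretrained" feature extractor and a small trainable head; real fine-tuning works on the same principle at vastly larger scale):

```python
# Fine-tuning sketch: freeze the "pretrained" weights and train only
# a small task-specific head on a small new dataset.

pretrained_w = 2.0   # frozen: learned during the expensive pretraining run
head_w, lr = 0.0, 0.01

def features(x):
    return pretrained_w * x  # frozen feature extractor

# Small fine-tuning set for a new task: targets follow y = 3 * features(x)
finetune_data = [(1.0, 6.0), (2.0, 12.0), (3.0, 18.0)]

for _ in range(500):
    for x, y in finetune_data:
        err = head_w * features(x) - y
        head_w -= lr * err * features(x)  # only the head is updated

print(round(head_w, 2))
```

Because the frozen part never gets gradients, the compute and data requirements are a fraction of pretraining, which is the whole appeal.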
The time and expense of training a model at this size doesn't lend itself to trial and error. It's simply impractical to iteratively try ~20 different learning schedules.
Hideously inefficient and hacky to have someone manually tweaking things, but not terribly different from the state of the art in scientific research. As long as they state the objectives of their manual control and produce a log of what they did, someone else could try to replicate it.
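For context on what a "learning schedule" is: it's just a function from training step to learning rate. One common shape is linear warmup followed by cosine decay (hypothetical numbers below); the expensive part is that comparing candidate schedules honestly requires a full training run per candidate:

```python
import math

# One example schedule: linear warmup, then cosine decay to a floor.
# Hyperparameters here are made up for illustration.
def lr_at(step, total_steps=10000, warmup=500, peak=3e-4, floor=3e-5):
    if step < warmup:
        return peak * step / warmup  # ramp up linearly
    progress = (step - warmup) / (total_steps - warmup)
    return floor + 0.5 * (peak - floor) * (1 + math.cos(math.pi * progress))

print(lr_at(250), lr_at(500), lr_at(10000))
```

Tweaking `warmup`, `peak`, or the decay shape gives you the "~20 different schedules" mentioned above, each costing a full run to evaluate.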
This isn't for training, is it? It's for using the results of training immediately (inferring something), without the need for a network round trip (as far as I understand it).
So you might still send the request to the network to continue training the model, but by the time you do, your answer has already been computed on the local machine for local consumption.
I want to see something made by the model when we're done training it. And if you are not doing ML with all those user inputs, I'm extremely disappointed.
>> Acceleration is needed for training -- not running the models themselves.
Sometimes training is done on the customer's site. For example, a noise-cancellation algorithm may learn the audio characteristics of the user's environment to offer better performance.
> who ever trains a popular, albeit closed model, can give it whatever bias it wishes with nearly no oversight.
That's true even if you can download the whole model. It's not like we can figure out what it's doing from looking at the weights. Training the model locally might avoid intentional bias, but that's what takes a huge GPU farm.