
The point is that iteration happens before training. You don't iterate once training has started, and presumably this person doesn't want to train the model themselves. If they do, they can, but they'll need the training data and a huge number of GPUs.



There's absolutely state-space iteration in model training: layer sizes, composition, construction.

> There was no real iteration, you set it up and wait for the GPUs to do the work. The model architecture itself is done before you start training. If you change it, you generally have to start training from scratch.

That's like saying there's no design iteration in software because you type 'make' and the executable is built.
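
To make that concrete, here is a minimal sketch of that design-time iteration: sweeping layer sizes on a toy problem before committing to a real run. The sizes and dataset are made up for illustration.

    # Sketch: iterate over the architecture "state space" (layer sizes)
    # with cheap proxy runs before committing to a long training job.
    from itertools import product
    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.neural_network import MLPClassifier

    X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
    X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)

    best = (0.0, None)
    for sizes in product([32, 64, 128], repeat=2):   # candidate layer sizes
        clf = MLPClassifier(hidden_layer_sizes=sizes, max_iter=200,
                            random_state=0).fit(X_tr, y_tr)
        acc = clf.score(X_val, y_val)
        if acc > best[0]:
            best = (acc, sizes)
    print("best layer sizes:", best[1], "val accuracy:", best[0])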


> Would be weird to ask the user to do this

Alternatives to training include downloading the trained model. It could be a big download, but an overnight download is not that weird to ask users to do.
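
For what that looks like in practice, here is a minimal sketch of a big resumable download, streamed to disk so an interrupted overnight run can pick up where it left off. The URL and filename are placeholders, and it assumes the server honors HTTP Range requests.

    # Sketch: stream large model weights to disk, resuming via a Range
    # header if a partial file already exists. URL is a placeholder.
    import os
    import requests

    url = "https://example.com/weights.bin"    # placeholder
    dest = "weights.bin"

    done = os.path.getsize(dest) if os.path.exists(dest) else 0
    headers = {"Range": f"bytes={done}-"} if done else {}

    with requests.get(url, headers=headers, stream=True, timeout=60) as r:
        r.raise_for_status()
        with open(dest, "ab") as f:
            for chunk in r.iter_content(chunk_size=1 << 20):  # 1 MiB chunks
                f.write(chunk)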


Correct, we're working on enabling training in the next iterations.

I'm just learning, but I'm guessing the issue is that training is expensive because you need to iterate many times and do a massive number of matrix operations (and other operations) per iteration?

Maybe too expensive to "keep up" with new data coming in? Does a single new piece of data mean you need to start from scratch, or are you "near enough" that incorporating it takes much less processing power?


And also... it's not like the program actually contains a copy of the training data, right? The training data is a tool used to build a model.

To me the article is missing the distinction between training a model and running a model.

This is a bad joke; at best you train a model from the output, else it's a race to the bottom.

No, it's "load a pretrained ResNet and fine-tune on a few examples". Nobody trains from scratch today except researchers with large budgets.

For most purposes you don’t need to train from scratch. Instead you fine-tune an existing model, on smaller amounts of data and for a fraction of the time.

This is akin to teaching an adult human about a specific domain. Better to just do that than make a whole new human from scratch!
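
For concreteness, a minimal sketch of that "load a pretrained ResNet and fine-tune" approach with torchvision (0.13+ weights API). The class count is assumed, and the one-batch loader is a stand-in for your real labeled examples.

    # Sketch: load a pretrained ResNet, freeze the backbone, and train
    # only a new final layer on a small labeled set.
    import torch
    import torch.nn as nn
    from torchvision import models

    model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
    for p in model.parameters():
        p.requires_grad = False               # freeze the pretrained backbone
    model.fc = nn.Linear(model.fc.in_features, 10)   # new head; 10 classes assumed

    # Stand-in for a real DataLoader over your few labeled examples:
    train_loader = [(torch.randn(8, 3, 224, 224), torch.randint(0, 10, (8,)))]

    opt = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()

    model.train()
    for images, labels in train_loader:
        opt.zero_grad()
        loss = loss_fn(model(images), labels)
        loss.backward()
        opt.step()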


The time and expense of training a model at this size leaves little room for trial and error. It's simply impractical to iteratively try ~20 different learning schedules.

Hideously inefficient and hacky to have someone manually tweaking things, but not terribly different from the state of the art for scientific research. As long as they state the objectives of their manual control and produce a log of what they did, someone else could try to replicate it.
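
That logging requirement is cheap to satisfy. Here is a sketch (all names and values hypothetical) of recording each manual intervention alongside the training step it happened at:

    # Sketch: record every manual hyperparameter tweak so the run can be
    # audited or replicated later. Names and values are hypothetical.
    import json, time

    LOG = "interventions.jsonl"

    def log_intervention(step, param, old, new, reason):
        entry = {"time": time.time(), "step": step, "param": param,
                 "old": old, "new": new, "reason": reason}
        with open(LOG, "a") as f:
            f.write(json.dumps(entry) + "\n")

    # e.g. the operator drops the learning rate after a loss plateau:
    log_intervention(step=120_000, param="lr", old=3e-4, new=1e-4,
                     reason="loss plateaued for 10k steps")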


This isn't for training, is it? It's for using the results of training immediately (inferring something), without the need for a network round trip (as far as I understand it).

So you might still send the request over the network to continue training the model, but by the time you do, your answer has already been computed on the local machine for local consumption.
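
A minimal sketch of that pattern, with the local model file and server endpoint as placeholders: answer locally first, then report the example back to the server in the background.

    # Sketch: compute the answer locally (no round trip), then send the
    # example to the server in the background, e.g. as training data.
    import threading
    import requests
    import torch

    model = torch.jit.load("local_model.pt")   # placeholder local model
    model.eval()

    def handle(features: torch.Tensor):
        with torch.no_grad():
            answer = model(features)           # local result, available now
        threading.Thread(
            target=requests.post,
            args=("https://example.com/collect",),   # placeholder endpoint
            kwargs={"json": {"features": features.tolist()}, "timeout": 10},
            daemon=True,
        ).start()
        return answer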


> How does the model know when a human has to take over?

It’s incredibly easy: you ask “did this answer solve your issue?” and add a max_tries.

> … how do you know how much training data to generate …?

You don’t: you keep doing it until the results improve enough to meet your goals, or they stop short and you switch tactics.
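
A sketch of that escalation loop, where bot_answer(), user_confirms(), and escalate_to_human() are hypothetical helpers standing in for whatever your system provides:

    # Sketch: escalate to a human after max_tries failed bot answers.
    # bot_answer(), user_confirms(), escalate_to_human(): hypothetical.
    MAX_TRIES = 3

    def handle_ticket(question: str) -> str:
        for attempt in range(MAX_TRIES):
            answer = bot_answer(question)
            if user_confirms("Did this answer solve your issue?"):
                return answer
        return escalate_to_human(question)   # hand over after MAX_TRIES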


I want to see something made by the model when we're done training it. And if you are not doing ML with all those user inputs, I'm extremely disappointed.

>> Acceleration is needed for training -- not running the models themselves.

Sometimes training is done on the customer site. For example a noise cancellation algorithm may learn audio characteristics of the user's environment to offer better performance.
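
As a toy illustration of that kind of on-site adaptation (classic spectral subtraction, not any particular product's method): estimate the room's noise spectrum from a short ambient recording, then subtract it from incoming audio frames.

    # Toy spectral subtraction: learn a noise profile from ambient audio
    # captured on site, then subtract it from later frames.
    import numpy as np

    def noise_profile(ambient: np.ndarray, frame: int = 512) -> np.ndarray:
        frames = ambient[: len(ambient) // frame * frame].reshape(-1, frame)
        return np.abs(np.fft.rfft(frames, axis=1)).mean(axis=0)  # avg magnitude

    def denoise(samples: np.ndarray, profile: np.ndarray) -> np.ndarray:
        spec = np.fft.rfft(samples)
        mag = np.maximum(np.abs(spec) - profile, 0.0)   # subtract noise floor
        return np.fft.irfft(mag * np.exp(1j * np.angle(spec)), n=len(samples))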


Well, that's why you checkpoint a lot and feed in the new training data a little at a time, rather than dumping in a massive slug all at once.

Right?
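
A minimal sketch of that checkpoint-and-trickle pattern in PyTorch. The linear model and one-batch list are stand-ins, and "last.ckpt" is assumed to exist from a previous run.

    # Sketch: resume from the last checkpoint and fold in new data a
    # batch at a time, rather than retraining on everything from scratch.
    import torch

    model = torch.nn.Linear(20, 2)                 # stand-in for the real model
    opt = torch.optim.SGD(model.parameters(), lr=1e-3)

    # Pick up where the last run left off:
    ckpt = torch.load("last.ckpt")
    model.load_state_dict(ckpt["model"])
    opt.load_state_dict(ckpt["opt"])

    # Feed in the new data a little at a time:
    new_batches = [(torch.randn(32, 20), torch.randint(0, 2, (32,)))]  # stand-in
    for inputs, targets in new_batches:
        opt.zero_grad()
        loss = torch.nn.functional.cross_entropy(model(inputs), targets)
        loss.backward()
        opt.step()

    torch.save({"model": model.state_dict(), "opt": opt.state_dict()}, "last.ckpt")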


It is a fair point though - there's no utility in training an openly available model from scratch. Finetuning is far more practical.

> whoever trains a popular, albeit closed, model can give it whatever bias it wishes with nearly no oversight.

That's true even if you can download the whole model. It's not like we can figure out what it's doing from looking at the weights. Training the model locally might avoid intentional bias, but that's what takes a huge GPU farm.


Because sometimes you don’t want to write your own training loops; you just want a working method to train a model.
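
That's what the high-level fit-style APIs are for. For example, with Keras the whole training loop collapses to a couple of calls (toy model and random data for illustration):

    # Sketch: no hand-written training loop, just compile() and fit().
    import numpy as np
    from tensorflow import keras

    x = np.random.rand(1000, 20).astype("float32")   # toy data
    y = (x.sum(axis=1) > 10).astype("int32")

    model = keras.Sequential([
        keras.layers.Dense(64, activation="relu"),
        keras.layers.Dense(2, activation="softmax"),
    ])
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    model.fit(x, y, epochs=5, batch_size=32)         # the "working method"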

>All without having been explicitly trained in those tasks.

Can you elaborate on what kind of training data was used here? I'm curious.

