The point is that the iteration happens before training. You don't iterate after the training run starts, and presumably this person doesn't want to train the model themselves. If they do, they can, but they'll need the training data and a huge number of GPUs.
There's absolutely state-space iteration in model training: layer sizes, composition, construction.
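As a toy illustration of that point (hypothetical hyperparameters, no real training involved), the design-space iteration happens on paper before any GPU time is spent:

```python
from itertools import product

# Hypothetical architecture state space, explored *before* training starts.
layer_counts = [12, 24, 48]       # depth
hidden_sizes = [768, 1024, 2048]  # width
attention_heads = [8, 12, 16]

def estimate_params(layers, hidden, heads):
    # Rough transformer-style estimate: each layer has attention (~4*h^2)
    # plus an MLP (~8*h^2). Heads don't change the parameter count here.
    return layers * (4 * hidden**2 + 8 * hidden**2)

# Iterate the design space cheaply; only the chosen candidate gets GPU time.
candidates = sorted(
    (estimate_params(l, h, a), l, h, a)
    for l, h, a in product(layer_counts, hidden_sizes, attention_heads)
)

print(candidates[0])  # smallest candidate by estimated parameter count
```

The actual comparison of candidates would involve scaling-law estimates or small pilot runs, but the iteration loop itself lives entirely outside the big training run.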
> There was no real iteration, you set it up and wait for the GPUs to do the work.

The model architecture itself is done before you start training. If you change it, you generally have to start training from scratch.
That's like saying there's no design iteration in software because you type 'make' and the executable is built.
Alternatives to training include downloading the trained model. It could be a big download, but an overnight download isn't that unreasonable to ask of users.
I am just learning, but I'm guessing the issue is that training is expensive, because you need to iterate many times and perform a massive number of matrix operations (and other operations) per iteration?
Maybe too expensive to "keep up" with new data coming in? Does a single new piece of data mean you need to start from scratch, or are you "near enough" that incorporating it needs less processing power?
And also... it's not like the program actually contains a copy of the training data, right? The training data is a tool used to build the model.
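On the "near enough" question: for simple models you can indeed keep training from where you left off rather than restarting. A toy sketch with a hand-rolled linear model and SGD (nothing like real LLM training, but it shows the idea of incorporating one new data point with a few cheap updates):

```python
# Toy sketch: update an already-trained model on one new data point
# with a few SGD steps, instead of retraining from scratch.
# Model: y = w*x + b, data follows y = 2x.

data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
w, b, lr = 0.0, 0.0, 0.05

def sgd_step(w, b, x, y, lr):
    err = (w * x + b) - y
    return w - lr * err * x, b - lr * err

# The expensive part: initial training over the full dataset.
for _ in range(200):
    for x, y in data:
        w, b = sgd_step(w, b, x, y, lr)

# A single new observation arrives: a handful of cheap updates suffices.
new_x, new_y = 4.0, 8.0
for _ in range(10):
    w, b = sgd_step(w, b, new_x, new_y, lr)

print(round(w, 2), round(b, 2))
```

Whether this "online" updating works well for giant models is a much harder question (catastrophic forgetting, data mixing, etc.), but a single new sample does not, in principle, force a restart.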
For most purposes you don’t need to train from scratch. Instead you fine-tune an existing model, on smaller amounts of data and for a fraction of the time.
This is akin to teaching an adult human about a specific domain. Better to just do that than make a whole new human from scratch!
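A minimal sketch of that fine-tuning idea, with toy numbers (a frozen one-parameter "pretrained" feature extractor and a small trainable head; real fine-tuning works on the same principle at vastly larger scale):

```python
# Fine-tuning sketch: freeze the "pretrained" weights and train only
# a small task-specific head on a small new dataset.

pretrained_w = 2.0   # frozen: learned during the expensive pretraining run
head_w, lr = 0.0, 0.01

def features(x):
    return pretrained_w * x  # frozen feature extractor

# Small fine-tuning set for a new task: targets follow y = 3 * features(x)
finetune_data = [(1.0, 6.0), (2.0, 12.0), (3.0, 18.0)]

for _ in range(500):
    for x, y in finetune_data:
        err = head_w * features(x) - y
        head_w -= lr * err * features(x)  # only the head is updated

print(round(head_w, 2))
```

Because the frozen part never gets gradients, the compute and data requirements are a fraction of pretraining, which is the whole appeal.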
The time and expense of training a model at this size doesn't lend itself to trial and error. It's simply impractical to iteratively try ~20 different learning schedules.
Hideously inefficient and hacky to have someone manually tweaking things, but not terribly different from the state of the art in scientific research. As long as they state the objectives of their manual control and produce a log of what they did, someone else could try to replicate it.
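For context on what a "learning schedule" is: it's just a function from training step to learning rate. One common shape is linear warmup followed by cosine decay (hypothetical numbers below); the expensive part is that comparing candidate schedules honestly requires a full training run per candidate:

```python
import math

# One example schedule: linear warmup, then cosine decay to a floor.
# Hyperparameters here are made up for illustration.
def lr_at(step, total_steps=10000, warmup=500, peak=3e-4, floor=3e-5):
    if step < warmup:
        return peak * step / warmup  # ramp up linearly
    progress = (step - warmup) / (total_steps - warmup)
    return floor + 0.5 * (peak - floor) * (1 + math.cos(math.pi * progress))

print(lr_at(250), lr_at(500), lr_at(10000))
```

Tweaking `warmup`, `peak`, or the decay shape gives you the "~20 different schedules" mentioned above, each costing a full run to evaluate.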
This isn't for training, is it? It's for using the results of training immediately (inferring something), without the need for a network round trip (as far as I understand it).
So you might still send the request to the network to continue training the model, but by the time you do, your answer has already been computed on the local machine for local consumption.
I want to see something made by the model when we're done training it. And if you are not doing ML with all those user inputs, I'm extremely disappointed.
>> Acceleration is needed for training -- not running the models themselves.
Sometimes training is done on the customer's site. For example, a noise-cancellation algorithm may learn the audio characteristics of the user's environment to offer better performance.
> who ever trains a popular, albeit closed model, can give it whatever bias it wishes with nearly no oversight.
That's true even if you can download the whole model. It's not like we can figure out what it's doing from looking at the weights. Training the model locally might avoid intentional bias, but that's what takes a huge GPU farm.