"In literally every domain neural networks have been utilized in you reach asymptotic level diminishing returns."
Is that true, though? I think of "grokking", where long training runs produce large, sudden shifts in generalization - but only after orders of magnitude more training than the point where training error already looked asymptotically low.

This would suggest both that the asymptotic limit you refer to isn't really there - something very important is happening much later - and that there may be important paths to generalization from smaller amounts of training data that we haven't figured out yet.
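For anyone who wants to see this firsthand, here's a minimal sketch of the kind of toy experiment where grokking shows up. The task (addition mod 97), model, and hyperparameters are my own illustrative choices, loosely in the spirit of the original grokking setup, not a faithful reproduction; the point is just to keep training far past the step where train accuracy saturates and watch whether validation accuracy jumps much later.

    # Minimal grokking-style sketch (illustrative assumptions throughout:
    # toy task, small MLP, hyperparameters chosen for demonstration only).
    import torch
    import torch.nn as nn

    P = 97  # modulus for the toy task: predict (a + b) mod P
    torch.manual_seed(0)

    # All (a, b) pairs, split roughly in half into train / validation.
    pairs = torch.cartesian_prod(torch.arange(P), torch.arange(P))
    labels = (pairs[:, 0] + pairs[:, 1]) % P
    perm = torch.randperm(len(pairs))
    split = len(pairs) // 2
    train_idx, val_idx = perm[:split], perm[split:]

    model = nn.Sequential(
        nn.Embedding(P, 64),   # shared embedding for both operands
        nn.Flatten(),          # (batch, 2, 64) -> (batch, 128)
        nn.Linear(128, 256),
        nn.ReLU(),
        nn.Linear(256, P),
    )
    # Weight decay is commonly reported as important for grokking.
    opt = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1.0)
    loss_fn = nn.CrossEntropyLoss()

    def accuracy(idx):
        with torch.no_grad():
            logits = model(pairs[idx])
            return (logits.argmax(dim=-1) == labels[idx]).float().mean().item()

    # Deliberately long run: keep going well past the point where train
    # accuracy looks "done" and watch the validation accuracy column.
    for step in range(1, 50_001):
        opt.zero_grad()
        loss = loss_fn(model(pairs[train_idx]), labels[train_idx])
        loss.backward()
        opt.step()
        if step % 1000 == 0:
            print(f"step {step:6d}  train acc {accuracy(train_idx):.3f}  "
                  f"val acc {accuracy(val_idx):.3f}")

In runs like this, train accuracy typically hits ~1.0 early while validation accuracy stays near chance for a long stretch before climbing - which is exactly the behavior that makes the "asymptotic diminishing returns" framing look incomplete.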