Note that "gradient descent" isn't AI either. It's more computational linear algebra: a heuristic numerical method for solving (usually only to a local extremum) systems of equations that have no direct analytical solution.
This is not abstract math, and the article does explain what it's doing before presenting the code snippets.
How can you explain or implement gradient descent without math? At some point I think you have to accept that this is a topic that involves math, and you're way better off understanding it on those terms rather than trying to avoid it.
I thought gradient descent was mostly calculus, not linear algebra. I was under the impression that linear algebra was used to frame the calculations so that GPUs could be utilized (since GPUs are very good at LA operations).
I'm not an AI expert either, but let me give this a try.
I assume you are vaguely familiar with gradient descent. In gradient descent, we are basically trying to find the sweet spot where the value of a function is minimized. We do this by calculating the derivative of the function at a certain point and then using it to take small steps in the direction where we believe the function will have a lower value.
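To make that concrete, here's a minimal sketch in Python (my own toy example, not the article's snippet): minimizing f(x) = (x - 3)^2 by repeatedly stepping against its derivative.

    # Minimize f(x) = (x - 3)^2; its derivative is f'(x) = 2 * (x - 3),
    # and the true minimum is at x = 3.
    def f(x):
        return (x - 3.0) ** 2

    def df(x):
        return 2.0 * (x - 3.0)

    x = 0.0              # arbitrary starting point
    learning_rate = 0.1  # how small each step is

    for _ in range(100):
        x -= learning_rate * df(x)  # step against the derivative, i.e. downhill

    print(x, f(x))  # x ends up very close to 3, f(x) very close to 0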
Gradient descent usually suffers from a problem where the algorithm gets stuck in local minima if the function is not convex.
However, when people use gradient descent to optimize functions with a very large number of parameters (as is the case in Deep Learning), another problem surfaces: saddle points. Imagine a 3-dimensional plot of the function at different values of its parameters (in reality the plot would be much higher-dimensional). On this plot there will be many regions where the partial derivatives defining the surface all become zero even though we are not at a minimum. This messes with our plan to use derivatives to find the direction in which to move, so we need to come up with strategies to escape saddle points during the gradient descent process.
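A small sketch of what that looks like (again my own toy example, nothing specific to deep learning): f(x, y) = x^2 + y^4 - y^2 has a saddle point at (0, 0), where both partial derivatives vanish, and two minima at (0, ±1/√2). Plain gradient descent started at the saddle never moves; one crude escape strategy is to add a little random noise to each step.

    import numpy as np

    def grad(p):
        x, y = p
        # partial derivatives of f(x, y) = x^2 + y^4 - y^2
        return np.array([2.0 * x, 4.0 * y ** 3 - 2.0 * y])

    def descend(start, lr=0.05, steps=500, noise=0.0, seed=0):
        rng = np.random.default_rng(seed)
        p = np.array(start, dtype=float)
        for _ in range(steps):
            p -= lr * grad(p)
            p += noise * rng.standard_normal(2)  # random kick to get off flat spots
        return p

    print(descend((0.0, 0.0)))              # stays at the saddle: [0. 0.]
    print(descend((0.0, 0.0), noise=1e-3))  # escapes, ends up near y = +/- 0.707

Momentum and the noise inherent in stochastic (mini-batch) gradients serve a similar purpose in practice.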
Gradient descent is a subset of survival of the fittest, described by Darwin in the 1800s, and has been applied in computer science since the '70s. An AGI will probably use some form of gradient descent during its training, yes, but I wouldn't argue that this has brought us even close to an AGI.
I've considered using gradient descent to optimize parameters on toy problems at university a few times. Never actually did it, though; it's a lot of hassle for the benefit of less manual interaction, at the cost of no longer building some intuition.
You are not off base at all. Thanks for clarifying, and sorry for the confusion; I did not mean to say it was using gradient descent. It's been a while. The term I was thinking of was "simulated annealing".
FYI, gradient descent is covered in one of the very first weeks of Andrew Ng's Coursera machine learning class, so perhaps just watch those lessons (free)
Gradient descent is basically the approximate solution, because getting the exact solution requires inverting large matrices, which apparently isn't practical at that scale (it's too slow).
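A simplified illustration of that trade-off (my own sketch, using least-squares linear regression as the example): the "exact" route solves the normal equations A^T A x = A^T b, which means inverting or factoring a matrix whose size grows with the number of parameters (roughly cubic cost), while gradient descent only needs matrix-vector products per step and converges to nearly the same answer.

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.standard_normal((200, 5))  # 200 data points, 5 parameters
    true_w = np.array([1.0, -2.0, 0.5, 3.0, 0.0])
    b = A @ true_w + 0.01 * rng.standard_normal(200)

    # Exact least-squares solution via the normal equations (A^T A) x = A^T b.
    # Fine for 5 parameters; prohibitive for millions of them.
    x_exact = np.linalg.solve(A.T @ A, A.T @ b)

    # Approximate solution via gradient descent on ||Ax - b||^2,
    # whose gradient is 2 A^T (Ax - b).
    x = np.zeros(5)
    lr = 1e-3
    for _ in range(5000):
        x -= lr * 2 * A.T @ (A @ x - b)

    print(np.allclose(x, x_exact, atol=1e-3))  # True: both routes agree closely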