
> If you want to introduce continuous distributions like the Gaussian one, you can just say "area under the curve" if you need to connect the density to a numerical probability.

What name do you give to this "area under the curve", or the "rate of change" of this area? They are pretty fundamental concepts with important and basic properties, which affect things like local optima and minimization, and expected value and covariance, etc. I mean, you can't cover linear models and least squares without this stuff, and if you don't then I wouldn't really call it learning.
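Concretely, the "area under the curve" up to a point is the cumulative distribution function (CDF), and its rate of change is the density (PDF). A quick numerical check of that relationship; a rough sketch assuming numpy and scipy are available:

  import numpy as np
  from scipy.stats import norm

  x = np.linspace(-4, 4, 2001)
  cdf = norm.cdf(x)                  # area under the density from -inf to x
  pdf_numeric = np.gradient(cdf, x)  # rate of change of that area
  # difference from the exact density is tiny, i.e. d/dx CDF = PDF
  print(np.max(np.abs(pdf_numeric - norm.pdf(x))))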




How does that work? It doesn't sound like a well-defined distribution since the area under the curve needs to be 1.

> We're thinking of cutting a slice through a multivariate distribution and then normalizing. If you have some distribution P(X, Y) where X and Y are both real, then its density is like a landscape (i.e. each (x, y) is associated with some height). P(X | Y=y) is a slice through that landscape at Y=y, which leaves a 1-D plot of the density along X. If we rescale it so that the area under it is 1, then it's a well-formed PDF.

How intuitive!

/s
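Snark aside, the slice-and-renormalize recipe in the quote is easy to check numerically. A rough sketch assuming a bivariate Gaussian joint density and scipy; in that case the renormalized slice matches the known conditional N(rho*y, 1 - rho^2):

  import numpy as np
  from scipy.stats import multivariate_normal, norm

  rho, y0 = 0.8, 1.0
  joint = multivariate_normal(mean=[0, 0], cov=[[1, rho], [rho, 1]])

  xs = np.linspace(-5, 5, 2001)
  dx = xs[1] - xs[0]
  slice_ = joint.pdf(np.column_stack([xs, np.full_like(xs, y0)]))  # cut at Y = y0
  cond = slice_ / (slice_.sum() * dx)                              # rescale so area = 1

  exact = norm.pdf(xs, loc=rho * y0, scale=np.sqrt(1 - rho**2))
  print(np.max(np.abs(cond - exact)))  # small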


> I watched some of the YouTube video, and in the thought experiment she proposes, she takes a continuous trait (height) and arbitrarily splits it into two buckets, and talks about how this is kinda silly.

Something can be continuous but still very clustered, and the extent to which it's "silly" depends on the uniformity of the distribution. It can still be useful to label the clusters in the distribution.

Another reason to subdivide a continuous dimension is that there could be a threshold beyond which some other dependent variable starts to change sharply (think of a hockey-stick curve). For example, there's probably some blood pressure value beyond which we rapidly begin to see serious adverse health effects; it's useful to call this "hypertension" or something. For another example, there's a threshold for the average temperature of the planet beyond which global warming "runs away". These are useful thresholds even though the dimensions are continuous.


> by determining the distribution of the process that the data was generated from.

Well, each random variable has a distribution. And there are a few distributions that are common, so those are the ones taught. Then, presto, bingo, too many students conclude that an important first step is to find a distribution. However, commonly in practice, with just samples and without more in the way of mathematical assumptions, finding a distribution ranges from not very promising to hopeless. Hopeless? Yes: consider a random variable that takes values in 50-dimensional Euclidean space.

But there is a lot of statistics that is distribution-free, where we make no assumptions about probability distributions. E.g., I published such a paper in Information Sciences. In addition, with some meager assumptions (say, that the random variables have expectations, that the squares of the random variables have finite expectations, etc.), we can do more.

For model fitting, if we can assume that the data has a Gaussian distribution, is homoscedastic, is independent and identically distributed (i.i.d.), etc., then we can get some more results, e.g., know that some of the results of the computations have Gaussian or F distributions. Then we can do a lot of classic hypothesis tests, confidence intervals, etc.

But, with just meager assumptions, we can commonly still proceed and know that we are still making a best L^2 approximation. Then we can drag out the classic results that a sequence of (such) random variables that is Cauchy convergent in L^2 does converge in L^2, that L^2 is complete (i.e., a Hilbert space), and that some subsequence converges almost surely. That's a lot -- we might be able to take that to the bank. And we made no more than meager, general assumptions about the distributions.
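For reference, the completeness result being invoked here is (as I read it) Riesz-Fischer:

  \mathbb{E}\,|X_n - X_m|^2 \to 0 \ (m, n \to \infty)
  \;\Longrightarrow\;
  \exists\, X \in L^2 :\ \mathbb{E}\,|X_n - X|^2 \to 0,
  \ \text{and}\ X_{n_k} \to X \ \text{a.s. along some subsequence.}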

Really, often we get some of the well-known distributions from theorems, not from analysis of empirical data. E.g., we get a Gaussian assumption from the central limit theorem. We can get an exponential distribution from the Poisson process (e.g., E. Cinlar's text), and we can get that from the very general, even astounding, renewal theorem (e.g., W. Feller's second volume).
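As an illustration of getting a Gaussian from the central limit theorem rather than from fitting data, a small simulation sketch (the particular numbers are just illustrative):

  import numpy as np

  rng = np.random.default_rng(0)
  sums = rng.uniform(size=(100_000, 30)).sum(axis=1)  # each row: sum of 30 i.i.d. uniforms
  z = (sums - sums.mean()) / sums.std()               # standardize

  # quantiles line up with the standard normal's (roughly -1.96, 0, 1.96)
  print(np.quantile(z, [0.025, 0.5, 0.975]))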


It's related to the geometric problems the parent described because probability distributions roughly describe geometric regions (of high probability density) where observations are likely.

The value of a continuous probability density function at a specific point is pretty meaningless, though; you have to talk about the integral between two values, and that won't go above one.
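To make that concrete: for a narrow Gaussian the density at a point can be well above 1, while any interval probability stays at or below 1. A small sketch assuming scipy:

  from scipy.stats import norm

  narrow = norm(loc=0, scale=0.1)
  print(narrow.pdf(0.0))                     # ~3.99: a density value, not a probability
  print(narrow.cdf(0.2) - narrow.cdf(-0.2))  # P(-0.2 < X < 0.2) ~ 0.954, never > 1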

It makes no sense in that it has no physical meaning. As you say, even without deeper meaning it is a fine tool for handling Gaussian distributions.

Okay, I'm stumped. Isn't the Gaussian function a probability density function, which means it should have an area of 1 by definition? Or are you taking it as f(x) = exp(-x^2), to keep f(0) equal to 1?
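For what it's worth, f(x) = exp(-x^2) does keep f(0) = 1 but its area is sqrt(pi), not 1; the probability density version divides by that constant. A quick check, assuming scipy:

  import numpy as np
  from scipy.integrate import quad

  area, _ = quad(lambda x: np.exp(-x**2), -np.inf, np.inf)
  print(area, np.sqrt(np.pi))                     # both ~1.7725, so area != 1

  pdf = lambda x: np.exp(-x**2) / np.sqrt(np.pi)  # normalized: the N(0, 1/2) density
  print(quad(pdf, -np.inf, np.inf)[0])            # ~1.0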

Thanks, I'll take a look! Seems like it's mainly about other distributions in the Gaussian domain of attraction, though, like Poisson and binomial.

I think the idea is to move away from worshiping summary statistics such as the mean or median and to start looking at differences between distributions.

>everyone uses different variables and judgment to make that prediction. Those variables and those predictions don't show up when your model is based on distributions.

Sorry, but that's simply not true. Everyone uses different variables and judgment to make a phone call, yet phone calls follow an exponential distribution! Everyone uses different variables and judgment to hit the internet, yet network traffic follows a Lavalette distribution. Everyone uses different variables and judgment to buy stocks, yet equity prices follow a lognormal distribution.

Individual behavior in aggregate will almost always follow some distribution, regardless of each individual using different variables and judgment for himself. The whole point of statistics is to uncover the underlying probability distribution given tons of (seemingly random) data. Math does the opposite: given the distribution, a mathematician can tell you how to derive nice things like the moment generating function, the first and second moments, the density function, related families of distributions, and so on, and in general can give you n sample variates that fit the distribution. By saying "everyone uses different variables and judgment" you are in essence saying it's just too complicated, but even if that were true, that is just another distribution (white noise).
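Both directions can be sketched with an exponential distribution: given the distribution, math hands you sample variates (inverse-transform sampling); given only the seemingly random data, statistics recovers the parameter. A minimal sketch, with illustrative numbers:

  import numpy as np

  rng = np.random.default_rng(1)
  rate = 2.0

  # given the distribution: variates via the inverse CDF, X = -ln(U) / rate
  samples = -np.log(rng.uniform(size=100_000)) / rate

  # given only the data: the maximum-likelihood estimate of the rate is 1 / sample mean
  print(1.0 / samples.mean())   # ~2.0, recovering the underlying rate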


Thanks. Although the author is not an expert, and he kind of lost me when he used a normal distribution to model the curve, it still looks like the right ballpark. That's a lot.

I like seeing the thought process by this person trying to explore different strategies based on their existing knowledge. There appears to be a critical point in the article where the author makes a plot of the noise outputs and says it resembles a "bell curve", but they don't attack that as the root problem.

I can immediately see myself 10 years ago following this same thought process of how can I mitigate-this-problem, instead of dismantle-this-problem... I continually read HN and keep trying to learn new ideas so that I have ammunition for future problems.

Nowadays I would have done something similar (taken a histogram) and seen that it appeared Gaussian. I might also have done a test for normality (probably Anderson-Darling). Seeing that it is normal-ish, I would have looked up how to convert a Gaussian distribution to a uniform distribution using the cumulative distribution function (CDF).
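A rough sketch of that workflow (Anderson-Darling test, then the probability integral transform via the fitted CDF), assuming scipy and using simulated stand-in noise rather than the article's actual data:

  import numpy as np
  from scipy.stats import anderson, norm

  rng = np.random.default_rng(2)
  noise = rng.normal(loc=5.0, scale=2.0, size=10_000)  # stand-in for the measured noise

  result = anderson(noise, dist='norm')
  print(result.statistic, result.critical_values)      # statistic below the critical
                                                       # values => consistent with normal

  # Gaussian -> (approximately) uniform via the fitted CDF
  uniform_ish = norm.cdf(noise, loc=noise.mean(), scale=noise.std())
  print(uniform_ish.min(), uniform_ish.max(), uniform_ish.mean())  # spans (0, 1), mean ~0.5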

Moral of the story - make sure to keep exploring while you're exploiting, future you will thank you.

Edit for readability.


Moving from point estimates to distributions is great progress.

It reminds me of this great paper that highlights how much information we lose when we only look at means or assume everything is normally distributed: https://arxiv.org/pdf/1806.02404.pdf


Yep. That's why one speaks of probability density functions and not just probability functions.

Yes, sort of. But I think he says a lot of unnecessary things without getting at the root of the issue.

I left out some detail I should have included, like what is so special about a Gaussian that makes the math easy. So I will say it.

A measurement lets us infer a probability distribution for what the measured quantity is. A second measurement, on its own, also yields some probability distribution for what the measured quantity is. If we consider both measurements together, we get yet another probability distribution for what the measured quantity is. The magic is that if the measurements have Gaussian distributions, then the distribution for the combined measurement is also Gaussian. This is not true in general. As long as we have Gaussian distributions we can do all the operations we want, and the probability distributions stay Gaussian and can be fully described by a center point and a width. (Forgive me for the liberties I am taking here.) The basic alternative to solving the problem exactly this way is to actually try to carry around the full probability distribution functions, which is not practical even with very powerful computers.
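A minimal sketch of the combining step (the fuse helper is just an illustrative name): multiplying two Gaussian likelihoods for the same quantity gives, up to normalization, another Gaussian whose precision is the sum of the precisions and whose mean is precision-weighted.

  def fuse(mu1, var1, mu2, var2):
      # combine two Gaussian estimates (mean, variance) of the same quantity
      var = 1.0 / (1.0 / var1 + 1.0 / var2)   # precisions add
      mu = var * (mu1 / var1 + mu2 / var2)    # precision-weighted mean
      return mu, var

  # two noisy measurements of the same quantity
  print(fuse(10.0, 4.0, 12.0, 1.0))   # -> (11.6, 0.8): pulled toward the tighter measurement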


So, to me, this is just calibrating against a continuum, right? I think of this as binary-searching a kinda normal-ish distribution? I'm not good at math, so I had a hard time once the OP article got into differential equations.

I agree. I am of course nitpicking, but it is exactly the same mistake I made in the first post (but I was assuming a discrete distribution instead of a continuous distribution).

Maybe I'm missing something? To me, that gif illustrates sets of samples that are, on the surface, compatible with the same Gaussian distribution, even if a Gaussian distribution is "obviously" lacking.
