Had a QuantSci Prof who was fond of asking "Who can name a data collection scenario where the x data has no error?" and then taught Deming regression as a generally preferred analysis [1]
You can think of it as: linear regression models noise only in y and not in x, whereas the PCA ellipse/eigenvector models noise in both x and y.
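A minimal numpy sketch of that contrast, using invented toy data (the true slope of 1.5 and the equal noise levels in x and y are illustrative assumptions, not anything from the article):

    import numpy as np

    rng = np.random.default_rng(0)
    t = np.linspace(0, 10, 500)                 # latent "true" values
    x = t + rng.normal(0, 1, t.size)            # noise in x
    y = 1.5 * t + rng.normal(0, 1, t.size)      # noise in y

    ols_slope = np.polyfit(x, y, 1)[0]

    # leading eigenvector of the covariance matrix = major axis of the ellipse
    eigvals, eigvecs = np.linalg.eigh(np.cov(x, y))
    major = eigvecs[:, np.argmax(eigvals)]
    pca_slope = major[1] / major[0]

    print("OLS slope:", ols_slope)   # pulled toward zero by the noise in x
    print("PCA slope:", pca_slope)   # stays near 1.5 when x and y noise are comparable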
I haven't dealt with statistics for a while, but what I don't get is: why squares specifically? Why not a power of 1, or 3, or 4, or anything else? I've seen squares come up a lot in statistics. One explanation I didn't really like is that squares are easier to work with because you don't have to use abs(), since everything is positive. OK, but why not another even power like 4? Different powers should give you different results, which seems like a big deal because statistics is used to explain important things and to guide our lives with respect to those important things. What makes squares the best? I can't recall the other places I've seen squares used, as my memory of my statistics training is quite blurry now, but they seem to pop up here and there in statistics relatively often.
Sorry for my negativity / meta comment on this thread. From what I can tell, the stackexchange discussion in the submission already provides all the relevant points to be discussed about this.
While the asymmetry of least squares will probably be a bit of a novelty/surprise to some, pretty much anything posted here is more or less a copy of one of the comments on stackexchange.
[Challenge: provide a genuinely novel on-topic take on the subject.]
Least squares and PCA minimize different loss functions. One is the sum of squared vertical (y) distances; the other is the sum of squared perpendicular distances (the closest distances to the line). That is what introduces the difference.
If you plot the regression line of y against x, and also x against y, you would get two different lines.
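A quick way to see the two lines, as a sketch with made-up numbers (true slope 0.7, unit noise):

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.normal(0, 1, 2000)
    y = 0.7 * x + rng.normal(0, 1, x.size)

    b_y_on_x = np.polyfit(x, y, 1)[0]        # minimizes vertical distances
    b_x_on_y = np.polyfit(y, x, 1)[0]        # minimizes horizontal distances

    print("y-on-x slope:          ", b_y_on_x)       # ~ r * sd(y)/sd(x)
    print("x-on-y slope, inverted:", 1 / b_x_on_y)   # ~ (1/r) * sd(y)/sd(x)
    # the two lines coincide only when |corr(x, y)| = 1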
I found it in the middle of teaching a stats class, and feel embarrassed.
I guess normalising is one way to remove the bias.
A note mostly about terminology:
The least squares model will produce unbiased predictions of y given x, i.e. predictions for which the average error is zero. This is the usual technical definition of unbiased in statistics, but it may not correspond to common usage.
Whether x is a noisy measurement or not is sort of irrelevant to this -- you make the prediction with the information you have.
Many times I've looked at the output of a regression model, seen this effect, and thought my model must be very bad. But then I remember the points made elsewhere in the thread.
One way to visually check that the fit line has the right slope is to (1) pick some x value, and then (2) ensure that the noise on top of the fit is roughly balanced on either side. I.e., that the result does look like y = prediction(x) + epsilon, with epsilon some symmetric noise.
One other point is that if you try to simulate some data as, say
y = 1.5 * x + random noise
then do a least squares fit, you will recover the 1.5 slope, and still it may look visually off to you.
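Something like this toy simulation (the numbers are arbitrary) shows both points: the fitted slope comes back near 1.5, and the residuals are balanced around zero on both halves of the x range, even if the line "looks" too shallow:

    import numpy as np

    rng = np.random.default_rng(42)
    x = rng.uniform(0, 10, 300)
    y = 1.5 * x + rng.normal(0, 2, x.size)     # noise only in y

    slope, intercept = np.polyfit(x, y, 1)
    print("fitted slope:", slope)              # close to 1.5

    # the visual check mentioned above: residuals should scatter
    # symmetrically around zero at any given x
    residuals = y - (slope * x + intercept)
    right = x > np.median(x)
    print("mean residual, left half: ", residuals[~right].mean())
    print("mean residual, right half:", residuals[right].mean())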
This problem is usually known as regression dilution, discussed here: https://en.wikipedia.org/wiki/Regression_dilution
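The standard attenuation factor is easy to check numerically; this sketch (with made-up variances) shows the fitted slope shrinking by roughly var(true x) / (var(true x) + var(x error)) as the measurement error on x grows:

    import numpy as np

    rng = np.random.default_rng(0)
    t = rng.normal(0, 2, 100_000)              # true predictor, variance 4
    y = 1.5 * t + rng.normal(0, 1, t.size)

    for sigma in (0.0, 1.0, 2.0):              # growing measurement error on x
        x_obs = t + rng.normal(0, sigma, t.size)
        fitted = np.polyfit(x_obs, y, 1)[0]
        theory = 1.5 * 4 / (4 + sigma**2)      # slope * var(t) / (var(t) + sigma^2)
        print(f"sigma={sigma}: fitted {fitted:.3f}, predicted {theory:.3f}")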
Yes, people want to mentally rotate, but that's not correct. This is not a "geometric", coordinate-system-independent operation.
IMO this is a basic risk with graphs. It is great to use imagery to engage the spatial-reasoning parts of our brain. But sometimes, as in this case, it is deceiving, because we impute geometric structure that isn't true of the mathematical construct being visualized.
You would probably get what you want with a Deming regression.
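For reference, a small self-contained sketch of Deming regression using the usual closed-form slope (delta is the assumed ratio of the y-error variance to the x-error variance; the toy data below is invented):

    import numpy as np

    def deming_slope(x, y, delta=1.0):
        # delta = var(y errors) / var(x errors); delta=1 is orthogonal regression
        sxx = np.var(x, ddof=1)
        syy = np.var(y, ddof=1)
        sxy = np.cov(x, y)[0, 1]
        term = syy - delta * sxx
        return (term + np.sqrt(term**2 + 4 * delta * sxy**2)) / (2 * sxy)

    rng = np.random.default_rng(1)
    t = np.linspace(0, 10, 500)                # latent true values
    x = t + rng.normal(0, 1, t.size)           # error in x
    y = 1.5 * t + rng.normal(0, 1, t.size)     # error in y

    print("OLS slope:   ", np.polyfit(x, y, 1)[0])   # attenuated
    print("Deming slope:", deming_slope(x, y))       # closer to the true 1.5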
I think linear least squares is like a shear, whereas the eigenvector fit is like a rotation.
My head canon:
If the true value is medium-high, any random measurements that lie even further above it are easily explained, as that is a low ratio of divergence. If the true value is medium-high, any random measurements that lie far below it are harder to explain, since their relative (i.e., ratio of) divergence is high.
Therefore, the further you go right in the graph, the more a slightly lower guess is a good fit, even if many values then lie above it.
> So, instead, I then diagonalized the covariance matrix to obtain the eigenvector that gives the direction of maximum variance.
...as one does...
This is probably obvious, but there is another form of regression that uses mean absolute error rather than squared error, as that approach is less prone to outliers. The math isn't as elegant, though.
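That's least absolute deviations regression. A rough sketch with scipy (the outlier placement and the true slope of 2 are just illustrative):

    import numpy as np
    from scipy.optimize import minimize

    rng = np.random.default_rng(0)
    x = np.linspace(0, 10, 200)
    y = 2.0 * x + 1.0 + rng.normal(0, 1, x.size)
    y[-20:] += 25                               # outliers clustered at high x

    ols = np.polyfit(x, y, 1)                   # [slope, intercept]

    def mean_abs_error(params):
        intercept, slope = params
        return np.mean(np.abs(y - (intercept + slope * x)))

    # start from the OLS solution and let a derivative-free optimizer
    # move it to the least-absolute-deviations line
    lad = minimize(mean_abs_error, x0=[ols[1], ols[0]], method="Nelder-Mead").x

    print("OLS slope:", ols[0])    # dragged up by the outliers
    print("LAD slope:", lad[1])    # stays near the true slope of 2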
This is why my favorite best fit algorithm is RANSAC.
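For anyone curious, a minimal scikit-learn sketch (same kind of invented outliers as above; RANSACRegressor's defaults are used, so the inlier threshold choice is left to the library):

    import numpy as np
    from sklearn.linear_model import LinearRegression, RANSACRegressor

    rng = np.random.default_rng(0)
    x = np.linspace(0, 10, 200)
    y = 2.0 * x + 1.0 + rng.normal(0, 0.5, x.size)
    y[-20:] += 25                               # gross outliers at high x

    X = x.reshape(-1, 1)
    ols = LinearRegression().fit(X, y)
    ransac = RANSACRegressor(random_state=0).fit(X, y)

    print("OLS slope:   ", ols.coef_[0])
    print("RANSAC slope:", ransac.estimator_.coef_[0])
    print("points kept as inliers:", ransac.inlier_mask_.sum())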
Linear Regression a.k.a. Ordinary Least Squares assumes only Y has noise, and X is correct.
Your "visual inspection" assumes both X and Y have noise. That's called Total Least Squares.