Least squares and PCA minimize different loss functions. Least squares minimizes the sum of squared vertical (y) distances, while PCA minimizes the sum of squared perpendicular distances to the line. That difference is what produces the discrepancy.
That makes sense. Why does least squares skew the line downwards, though (vs. some other direction)? Seems arbitrary.
I find it helpful to view least squares as fitting the noise to a Gaussian distribution.
"...sum of squared perpendicular distances to the line" would be a better description. But it also depends entirely on how the covariance is estimated.
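The distinction discussed above can be sketched with a small toy example (hypothetical data, NumPy assumed): fit the same noisy points once by ordinary least squares and once by taking the first principal component of the sample covariance. With noise in x as well as y, the least-squares slope is biased toward zero (attenuation), while the PCA fit stays near the true slope.

```python
import numpy as np

# Toy data (hypothetical): true relation y = 2x, with Gaussian
# noise added to BOTH coordinates.
rng = np.random.default_rng(0)
x_true = rng.normal(size=200)
x = x_true + rng.normal(scale=0.5, size=200)
y = 2.0 * x_true + rng.normal(scale=0.5, size=200)

# Least squares: slope minimizing the sum of squared vertical (y) distances.
xc, yc = x - x.mean(), y - y.mean()
ols_slope = (xc @ yc) / (xc @ xc)

# PCA: the first principal component of the sample covariance minimizes
# the sum of squared perpendicular distances to the line.
eigvals, eigvecs = np.linalg.eigh(np.cov(x, y))
v = eigvecs[:, np.argmax(eigvals)]  # principal direction
pca_slope = v[1] / v[0]

# The OLS slope is attenuated below 2; the PCA slope stays near 2.
print(f"OLS slope: {ols_slope:.3f}, PCA slope: {pca_slope:.3f}")
```

This also hints at the "skew downwards" question: noise in the x coordinate drags the least-squares slope toward zero, which for a positive true slope means downwards.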