Hacker News

alexshtf, 04/27/2025

I am the author of that post series... I see it made some noise :)

I partially agree with your claims. It is indeed an issue if you take high powers of numbers outside of the [-1, 1] interval, but I completely disagree with your assertion that this is the main issue.

Take the famous LogSumExp function as an example - it's used everywhere in machine learning. Nobody says it's a "big no no", even though a naive implementation blows up because you exponentiate large numbers. And if you want to back-prop through it - well, it's even worse. But there is also a "right" way to use it: LogSumExp(x_1, ..., x_n) = LogSumExp(x_1 - m, ..., x_n - m) + m where m=max(x_1, ..., x_n)
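To make the shift trick concrete, here is a minimal sketch in plain Python. The function names logsumexp_naive and logsumexp_stable are mine, for illustration only; they are not the library implementations.

```python
import math

def logsumexp_naive(xs):
    # Direct definition: math.exp overflows for large inputs.
    return math.log(sum(math.exp(x) for x in xs))

def logsumexp_stable(xs):
    # Subtract the maximum first, so the largest term is exp(0) = 1.
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

xs = [1000.0, 1000.0]

try:
    naive = logsumexp_naive(xs)
except OverflowError:
    # math.exp(1000.0) does not fit in a double-precision float.
    naive = None

# Mathematically this is 1000 + log(2).
stable = logsumexp_stable(xs)
```

The two functions compute the same mathematical quantity; only the order of operations differs.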

But the point is that you don't even care! We have good implementations in library functions, such as torch.logsumexp or scipy.special.logsumexp, and we don't need to worry about it. We never have to compute LogSumExp ourselves and get into the numerical issues.

With polynomials it's the same. Take the Legendre basis. Over an arbitrary interval [a, b], you have the recurrence (n+1) P_{n+1}(x) = (2n+1)(2x - (a + b)) P_n(x) / (b - a) - n P_{n-1}(x), starting from P_0(x) = 1 and P_1(x) = (2x - (a + b)) / (b - a). It's numerically stable. But you don't even care what the formula is - you just call the appropriate function from the np.polynomial.legendre package.
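As a rough sketch of what "just call the library" looks like: map x from [a, b] to t in [-1, 1] and hand t to numpy.polynomial.legendre.legvander. The helper names legendre_features and legendre_by_recurrence are hypothetical. The second one spells out the standard three-term recurrence (n+1) P_{n+1}(t) = (2n+1) t P_n(t) - n P_{n-1}(t), only to check that both routes agree.

```python
import numpy as np
from numpy.polynomial import legendre

def legendre_features(x, degree, a, b):
    # Affine map from [a, b] to [-1, 1], then the library does the rest.
    t = (2.0 * x - (a + b)) / (b - a)
    return legendre.legvander(t, degree)  # shape: (len(x), degree + 1)

def legendre_by_recurrence(x, degree, a, b):
    # Same basis, computed by hand with the three-term recurrence.
    t = (2.0 * x - (a + b)) / (b - a)
    P = [np.ones_like(t), t]
    for n in range(1, degree):
        P.append(((2 * n + 1) * t * P[n] - n * P[n - 1]) / (n + 1))
    return np.stack(P[: degree + 1], axis=1)

x = np.linspace(2.0, 5.0, 7)
lib = legendre_features(x, degree=6, a=2.0, b=5.0)
manual = legendre_by_recurrence(x, degree=6, a=2.0, b=5.0)
```

If you fit with numpy.polynomial.Legendre.fit, the domain argument handles the same mapping for you.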

The difference between polynomials and LogSumExp stems from education - people are taught that LogSumExp is "normal" but polynomials are a "big no no". Most people know that there is a library function for LogSumExp, but don't know there are similarly good library functions for the "right" polynomial bases.

And why do I believe that the basis is the main issue? Well, it's because even if you use [-1, 1], the standard basis doesn't work with high degrees - it has an extremely badly conditioned Vandermonde matrix, and regularization can't help you get a reasonable polynomial. I show it in several posts in the series. The problem with large powers of small numbers is as bad as with large powers of large numbers. And the issue with the standard basis is that you have no choice but to compute these high powers - that's how the basis is defined.
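A quick way to see the conditioning gap for yourself, as a sketch: build both Vandermonde matrices on the same points in [-1, 1] and compare their condition numbers. The grid size and degree here are arbitrary choices of mine, not values from the posts.

```python
import numpy as np
from numpy.polynomial import legendre

# Arbitrary illustrative setup: 200 equispaced points, degree 30.
x = np.linspace(-1.0, 1.0, 200)
degree = 30

# Standard (monomial) basis: columns are 1, x, x^2, ..., x^degree.
V_mono = np.vander(x, degree + 1, increasing=True)

# Legendre basis: columns are P_0(x), ..., P_degree(x).
V_leg = legendre.legvander(x, degree)

# 2-norm condition numbers, computed from the singular values.
cond_mono = np.linalg.cond(V_mono)
cond_leg = np.linalg.cond(V_leg)

print(f"monomial basis: {cond_mono:.3e}")
print(f"Legendre basis: {cond_leg:.3e}")
```

Both matrices span the same polynomial space on the same points; only the choice of basis changes.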

Having said that, I will obviously take your input into account and say something about high powers in the post. I will explain why I believe it's exactly like LogSumExp, and why I believe it's the basis (rather than floating point errors) that is the issue. You are correct that I wrongly assumed it's obvious, and it is not.

Like with any object, working with it by its definition is often very different from how you work with it computationally. But I believe that once you have good library functions, you don't really care.