Hacker News

constantcrying · 04/22/2025 · 4 replies

I completely disagree with the conclusion of the article. The reason the examples worked so well is an arbitrary choice, which went completely uncommented.

The interval was chosen as 0 to 1. This single fact was what made this feasible. Had the interval been chosen as 0 to 10, a degree-100 polynomial would have to compute numbers on the order of 10^100, which would have led to drastic numerical errors.
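
(A quick illustration of the scale involved, assuming plain float64 arithmetic:)

    x = 10.0 ** 100       # largest term of a degree-100 polynomial evaluated at x = 10
    print(x + 1e80 == x)  # True: terms 20 orders of magnitude smaller are lost entirely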

The article totally fails to give any of the legitimate and very important reasons why high-degree polynomials are dangerous. It is absurd to say that well-known numerical problems do not exist just because you found one example where they did not occur.


Replies

ForceBru · 04/22/2025

The article specifically points out that these polynomials only work well on specific intervals (emphasis copied from the article):

"The second source of their bad reputation is misunderstanding of Weierstrass’ approximation theorem. It’s usually cited as “polynomials can approximate arbitrary continuous functions”. But that’s not entrely true. They can approximate arbitrary continuous functions in an interval. This means that when using polynomial features, the data must be normalized to lie in an interval. It can be done using min-max scaling, computing empirical quantiles, or passing the feature through a sigmoid. But we should avoid the use of polynomials on raw un-normalized features."

As I understand it, one of the main ideas of this series of posts is that normalizing features to very specific intervals is important when fitting polynomials. I don't think this "went completely uncommented".
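
For concreteness, a minimal sketch of that kind of min-max scaling, assuming a 1-D NumPy feature array (the names here are illustrative):

    import numpy as np

    def minmax_scale(x, lo=-1.0, hi=1.0):
        # Map a raw feature into [lo, hi] before building polynomial features
        x = np.asarray(x, dtype=float)
        return lo + (hi - lo) * (x - x.min()) / (x.max() - x.min())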

nobodywillobsrv · 04/22/2025

Yes, and it fails to talk about boundary conditions or predicates or whatever.

alexshtf · 04/27/2025

I am the author of that post series... I see it made some noise :)

I partially agree with your claims. It is indeed an issue if you take high powers of numbers outside the [-1, 1] interval, but I completely disagree with the assertion that this is the main issue.

Take the famous LogSumExp function as an example - it's used everywhere in machine learning. Nobody says it's a "big no no", even though a naive implementation blows up because you exponentiate large numbers. And if you want to back-prop through it - well, it's even worse. But there is also a "right" way to use it: LogSumExp(x_1, ..., x_n) = LogSumExp(x_1 - m, ..., x_n - m) + m, where m = max(x_1, ..., x_n).
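
A minimal sketch of that identity in plain NumPy, just to make the blow-up concrete:

    import numpy as np

    def logsumexp_naive(x):
        return np.log(np.sum(np.exp(x)))          # overflows once entries exceed ~709

    def logsumexp_stable(x):
        m = np.max(x)                             # shift by the max, add it back at the end
        return m + np.log(np.sum(np.exp(x - m)))

    x = np.array([1000.0, 1001.0, 1002.0])
    print(logsumexp_naive(x))    # inf (overflow)
    print(logsumexp_stable(x))   # ~1002.41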

But the point is that you don't even care! We have good implementations in library functions, such as torch.logsumexp or scipy.special.logsumexp, and we don't need to worry about it. We never have to compute LogSumExp ourselves and get into the numerical issues.
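
For example, both handle the shift internally (torch.logsumexp needs an explicit dim argument):

    import torch
    from scipy.special import logsumexp

    x = [1000.0, 1001.0, 1002.0]
    print(logsumexp(x))                              # ~1002.41, no overflow
    print(torch.logsumexp(torch.tensor(x), dim=0))   # same value, and differentiable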

With polynomials it's the same. Take the Legendre basis. Over an arbitrary interval [a, b], you have the three-term recurrence (n+1) P_{n+1}(x) = (2n+1) (2x - (a + b)) P_n(x) / (b - a) - n P_{n-1}(x). It's numerically stable. But you don't even care what the formula is - you just call the appropriate function from the np.polynomial.legendre package.
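
For instance, something along these lines (a rough sketch; the domain argument tells NumPy to do the mapping from [a, b] to [-1, 1] internally, and the target function is just for illustration):

    import numpy as np
    from numpy.polynomial import Legendre

    a, b = 0.0, 10.0
    x = np.linspace(a, b, 500)
    y = np.sin(x)                                   # illustrative target

    p = Legendre.fit(x, y, deg=100, domain=[a, b])  # degree-100 fit in the Legendre basis
    print(np.max(np.abs(p(x) - y)))                 # small residual, no blow-up despite the high degree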

The difference between polynomials and LogSumExp stems from education - people are taught that LogSumExp is "normal" but polynomials are a "big no no". Most people know that there is a library function for LogSumExp, but don't know there are similarly good library functions for the "right" polynomial bases.

And why do I believe that the basis is the main issue? Because even if you use [-1, 1], the standard basis doesn't work with high degrees - it has an extremely badly conditioned Vandermonde matrix, and regularization can't help you get a reasonable polynomial. I show this in several posts in the series. The problem with large powers of small numbers is as bad as with large numbers. And the issue with the standard basis is that you have no choice but to compute these high powers - that's how the basis is defined.
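
The conditioning gap is easy to see numerically (a rough sketch; the exact numbers depend on the sample points and degree):

    import numpy as np
    from numpy.polynomial import legendre

    x = np.linspace(-1, 1, 200)
    deg = 50
    V_standard = np.vander(x, deg + 1)         # columns are powers x^k (monomial basis)
    V_legendre = legendre.legvander(x, deg)    # columns are Legendre polynomials P_k(x)

    print(np.linalg.cond(V_standard))          # astronomically large
    print(np.linalg.cond(V_legendre))          # modest by comparison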

Having said that, I will obviously take your input into account and say something about high powers in the post: explain why I believe it's exactly like LogSumExp, and why I believe it's the basis (rather than floating-point errors) that is the issue. You are correct that I wrongly assumed this was obvious, and it is not.

As with any object, working with it by its definition is often very different from how you work with it computationally. But I believe that once you have a good library function, you don't really care.