The speed comparison is weird.
The author sets the solver to saga, doesn’t standardize the features, and uses a very high max_iter.
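Logistic regression takes longer to converge when features are not standardized, and saga in particular is only expected to converge quickly when the features are on roughly the same scale.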
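To make the point concrete, here is a rough sketch of how one could check the effect of scaling on saga. It uses synthetic data from make_classification as a stand-in (an assumption, not the dataset from the post), and the feature scales, sample counts and max_iter=200 are arbitrary illustrative choices rather than the author's settings.

```python
import warnings

import numpy as np
from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption: NOT the dataset used in the post).
X, y = make_classification(
    n_samples=2000, n_features=20, n_informative=10,
    n_classes=4, random_state=0,
)
# Stretch the columns so the features live on very different scales.
X = X * np.logspace(0, 3, X.shape[1])

# Same solver and iteration cap for both; only the preprocessing differs.
raw = LogisticRegression(solver="saga", max_iter=200, random_state=0)
scaled = make_pipeline(
    StandardScaler(),
    LogisticRegression(solver="saga", max_iter=200, random_state=0),
)

with warnings.catch_warnings():
    # The unscaled fit may stop at max_iter; silence that warning here.
    warnings.simplefilter("ignore", ConvergenceWarning)
    raw.fit(X, y)
    scaled.fit(X, y)

# n_iter_ reports how many epochs the solver actually ran.
raw_iters = int(raw.n_iter_.max())
scaled_iters = int(scaled[-1].n_iter_.max())
print(f"saga epochs without scaling: {raw_iters}")
print(f"saga epochs with StandardScaler: {scaled_iters}")
```

For sparse TF-IDF style features, StandardScaler(with_mean=False) or MaxAbsScaler would be the usual substitutes, since centering would destroy sparsity.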
Also, the zstd classifier’s time complexity scales linearly with the number of classes, while logistic regression’s doesn’t. The dataset has 20 classes (it’s in the name), so why use only 4?
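A minimal sketch of why the per-document cost grows with the class count, assuming the classifier compresses each document once per class and picks the class that compresses it best. This is my guess at the general shape, not the author's code. It uses zlib with a preset dictionary from the standard library as a stand-in for zstd, and the helper names and toy corpora are hypothetical.

```python
import zlib


def compressed_size(doc: bytes, class_corpus: bytes) -> int:
    """Size of doc when compressed with the class text as a preset dictionary.

    zlib is a stand-in for zstd here (assumption, to keep this stdlib-only).
    """
    comp = zlib.compressobj(level=9, zdict=class_corpus)
    return len(comp.compress(doc) + comp.flush())


def predict(doc: bytes, corpora: dict) -> tuple:
    """Return (best_label, number_of_compression_calls) for one document."""
    calls = 0
    best_label, best_size = None, None
    for label, corpus in corpora.items():
        # One compression pass per class: this loop is the linear factor.
        size = compressed_size(doc, corpus)
        calls += 1
        if best_size is None or size < best_size:
            best_label, best_size = label, size
    return best_label, calls


# Toy corpora (hypothetical text, only for illustration).
two_classes = {
    "space": b"the rocket reached orbit around the moon after launch",
    "hockey": b"the goalie blocked the puck during the playoff game",
}
label, calls_2 = predict(b"rocket reached orbit", two_classes)

# With 20 classes, the same document needs 20 compression passes.
twenty_classes = {f"class_{i}": f"filler text for class {i}".encode() for i in range(20)}
_, calls_20 = predict(b"rocket reached orbit", twenty_classes)

print(label, calls_2, calls_20)
```

So whatever per-document time is measured on 4 classes, one would expect roughly five times that on all 20 under this scheme.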
It’s a cool exploration of zstd. But please give the baseline some love. Not everything has to be better than something to be interesting.