What the post is describing is just ANOVA. If merging two categories gives the best overall fit, then fitting the two terms independently reaches the same optimal solution (with the two independent terms found to be identical). In-sample MSE never increases when adding a category.
This is why you have to reach for things that penalize adding parameters to models when running model comparisons.
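A minimal sketch of that point, assuming the model is just "predict each category by its sample mean" and the data is pure noise. The function names, the synthetic data, and the choice of AIC (written up to an additive constant) as the penalty are illustrative assumptions, not anything taken from the post.

```python
import numpy as np

def group_mean_mse(y, labels):
    """In-sample MSE when each label is predicted by its own sample mean."""
    y = np.asarray(y, dtype=float)
    labels = np.asarray(labels)
    pred = np.empty_like(y)
    for g in np.unique(labels):
        mask = labels == g
        pred[mask] = y[mask].mean()
    return float(np.mean((y - pred) ** 2))

def gaussian_aic(mse, n, k):
    """AIC for a Gaussian model, up to an additive constant; k = number of fitted means."""
    return n * np.log(mse) + 2 * k

rng = np.random.default_rng(0)
n = 200
y = rng.normal(size=n)                    # pure noise: no real category effect
coarse = rng.integers(0, 2, size=n)       # two categories
fine = coarse * 2 + rng.integers(0, 2, size=n)  # split each category in two

mse_coarse = group_mean_mse(y, coarse)
mse_fine = group_mean_mse(y, fine)

# The finer split can only match or lower the in-sample error.
print("in-sample MSE:", mse_coarse, mse_fine)

# A parameter penalty is what lets the comparison ever favor the smaller model.
print("AIC:", gaussian_aic(mse_coarse, n, len(np.unique(coarse))),
      gaussian_aic(mse_fine, n, len(np.unique(fine))))
```

Because `fine` refines `coarse`, the coarse fit is a special case of the fine one, so the in-sample comparison is guaranteed to go one way regardless of whether the extra split means anything.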
No, the post is doing cross-validation to test predictive power directly. The error will not decompose as neatly then.
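A rough sketch of the difference, assuming a plain K-fold split and the same "predict each category by its training-fold mean" model. The helper `cv_mse`, the fold count, and the synthetic data are hypothetical and not the post's actual procedure.

```python
import numpy as np

def cv_mse(y, labels, k=5, seed=0):
    """Held-out MSE under K-fold CV, predicting each label by its training-fold mean."""
    y = np.asarray(y, dtype=float)
    labels = np.asarray(labels)
    idx = np.random.default_rng(seed).permutation(len(y))
    sq_err = np.empty(len(y))
    for test in np.array_split(idx, k):
        train = np.setdiff1d(idx, test)
        fallback = y[train].mean()  # used if a label is absent from the training fold
        means = {g: y[train][labels[train] == g].mean()
                 for g in np.unique(labels[train])}
        pred = np.array([means.get(g, fallback) for g in labels[test]])
        sq_err[test] = (y[test] - pred) ** 2
    return float(sq_err.mean())

rng = np.random.default_rng(0)
n = 200
y = rng.normal(size=n)                    # pure noise: no real category effect
coarse = rng.integers(0, 2, size=n)       # two categories
fine = coarse * 2 + rng.integers(0, 2, size=n)  # split each category in two

# Unlike in-sample MSE, nothing forces the finer split to score at least as well here.
print("held-out MSE:", cv_mse(y, coarse), cv_mse(y, fine))
```

The means are estimated on one subset and scored on another, so an extra category that only fits noise can raise the held-out error instead of lowering it.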