Great article.
Wish list of topics to add:
- Branch predictors that can detect patterns (edit: I guess it's already covered in the paragraph about raising prediction accuracy)
- LRU approximations in L1 caches
- Data prefetching (sequential, stride)
- Return address stack
Concerning μops, I think the 68060 did that, too.