It was a fun moment to finally work on a data problem that did not fit on any (practical) machine. I needed about 50TiB of memory to process a multi-PiB set of logs.
It's worth remembering, however, that even if splitting a large task into many smaller ones is less efficient per CPU, it can be more efficient overall alongside other workloads: smaller tasks bin-pack more tightly onto a cluster, and when a task fails you retry a smaller fraction of the total work.
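The retry point is easy to quantify with a back-of-the-envelope model (my own assumptions here, not from the article): if a failed task is redone from scratch and failures arrive randomly at a fixed rate per unit of work, the expected compute for a task grows exponentially with its size, so sharding cuts the retry tax dramatically. A minimal sketch:

```python
import math

def expected_work(total_work: float, num_tasks: int, failure_rate: float) -> float:
    """Expected total compute when a failed task is retried from scratch.

    Assumed model: failures arrive as a Poisson process with
    `failure_rate` per unit of work, so a task of size s finishes an
    attempt with probability exp(-failure_rate * s), and the expected
    number of attempts is exp(failure_rate * s).
    """
    task_size = total_work / num_tasks
    return num_tasks * task_size * math.exp(failure_rate * task_size)

# 1000 units of work, with one failure per ~500 units on average:
for n in (1, 10, 100):
    print(n, round(expected_work(1000, n, 1 / 500), 1))
# 1   -> 7389.1  (one big task spends most of its time on retries)
# 10  -> 1221.4
# 100 -> 1020.2
```

Under those (hypothetical) numbers, one monolithic job costs over 7x the useful work in expectation, while 100 shards cost only ~2% overhead. The crossover obviously depends on real failure rates and per-task fixed costs, but the shape of the curve is the point.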
All this is to say, the article makes a very good point, but doing it all on one machine also has problems. Just don't cargo-cult engineering decisions.