Hacker News

few · today at 11:11 AM

I feel like one or two decades ago, all the rage was rewriting programs in terms of just two primitives: map and reduce.

For example, filter can be expressed as:

  from functools import reduce
  data = [1, 2, 3, 4]  # sample input
  is_even = lambda x: x % 2 == 0
  mapped = map(lambda x: [x] if is_even(x) else [], data)
  filtered = reduce(lambda x, y: x + y, mapped, [])  # [2, 4]

But then the world moved on from it because it was too rigid.

Replies

mrlongroots · today at 2:44 PM

MapReduce is nice, but for one thing it doesn't, by itself, help you reason about pushdowns. Parquet, for example, can push down select/project/filter, and that's lost if all you have is MapReduce. And a reduce is just a shuffle + map, not very different from a distributed join. Thinking of MapReduce as an escape hatch over what is fundamentally still relational algebra may be a good intuition.
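To make the pushdown point concrete, here is a toy columnar scan in Python. The column layout and `scan()` function are invented for the sketch, not a real Parquet API: the idea is that the filter and projection are applied inside the scan, so rows and columns that don't survive are never materialized for downstream map/reduce code.

```python
# Toy illustration of projection/filter pushdown over a columnar layout.
# Everything here (the dict-of-lists "file", scan()) is hypothetical.
columns = {
    "id":    [1, 2, 3, 4],
    "price": [10, 25, 5, 40],
    "note":  ["a", "b", "c", "d"],
}

def scan(columns, project, filter_col, pred):
    # Decode only the filter column to decide which rows survive...
    keep = [i for i, v in enumerate(columns[filter_col]) if pred(v)]
    # ...then materialize only the projected columns for those rows.
    # A post-hoc map/reduce would instead see every row of every column.
    return {c: [columns[c][i] for i in keep] for c in project}

result = scan(columns, project=["id", "price"],
              filter_col="price", pred=lambda p: p > 8)
# result == {"id": [1, 2, 4], "price": [10, 25, 40]}
```

Note that the `note` column is never touched at all, and row 3 (price = 5) never reaches user code.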

mememememememo · today at 12:08 PM

Performance aside, it seems you could do most, maybe all, of the ops with those three. I say three because your sneaky plus is a union operation. So: map, reduce, and union.

But you are also allowing arbitrary code expressions, so it is less Lego-like.

bjourne · today at 3:36 PM

Reductions are painful because they specify a sequence of ordered operations. Runtime is O(N), where N is the sequence length, regardless of the amount of hardware. So you want to work at a higher level where you can exploit commutativity and independence of some (or even most) operations.
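A minimal sketch of the higher-level view: if the operator is known to be associative, a left-fold's O(N) chain can be reshaped into a tree whose levels are independent, giving O(log N) depth on parallel hardware. The `tree_reduce` helper below is illustrative, not from any particular library.

```python
from functools import reduce

def tree_reduce(op, xs, identity):
    """Pairwise (tree-shaped) reduction. Assuming op is associative,
    every pair within a level is independent, so each level could run
    in parallel: O(log N) depth instead of an O(N) sequential chain."""
    if not xs:
        return identity
    while len(xs) > 1:
        xs = [op(xs[i], xs[i + 1]) if i + 1 < len(xs) else xs[i]
              for i in range(0, len(xs), 2)]
    return xs[0]

data = list(range(1, 9))
# For an associative op, the tree shape gives the same answer as a fold.
assert tree_reduce(lambda a, b: a + b, data, 0) == reduce(lambda a, b: a + b, data)
```

The catch is exactly the one raised above: a plain `reduce` promises left-to-right order, so the runtime can't reshape it unless you declare (or it can prove) associativity.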
