ColinWright • 12/09/2024 • 2 replies

Consider iterative code to sum a collection of ints:

  sum = 0
  for value in collection:
    sum += value
  return sum

For every non-empty collection this returns the correct result, and for the empty collection it returns 0.

Now the product:

  product = 1
  for value in collection:
    product *= value
  return product

For every non-empty collection this returns the correct result, and for the empty collection it returns 1.

Now the AND:

  A = True
  for value in collection:
    A = A AND value
  return A

For every non-empty collection this returns the correct result, and for the empty collection it returns True.

Now the OR:

  R = False
  for value in collection:
    R = R OR value
  return R

For every non-empty collection this returns the correct result, and for the empty collection it returns False.

Let's abstract it:

  Def FOLDR( initial, OP, collection )

    result = initial
    for value in collection:
      result = result OP value
    return result

So now:

  sum(     collection ) = FOLDR(   0  ,  + , collection )
  product( collection ) = FOLDR(   1  ,  * , collection )
  and(     collection ) = FOLDR( True , AND, collection )
  or(      collection ) = FOLDR( False, OR , collection )
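
A minimal Python sketch of the table above, assuming `functools.reduce` as the fold. The names `fold`, `total`, `product`, `all_of` and `any_of` are illustrative and not from the original comment. The loop in FOLDR combines from the left, so this is a left fold; for these four associative operations the direction does not change the result.

```python
from functools import reduce
import operator


def fold(initial, op, collection):
    """Combine the items left to right, starting from `initial`."""
    return reduce(op, collection, initial)


def total(collection):
    return fold(0, operator.add, collection)


def product(collection):
    return fold(1, operator.mul, collection)


def all_of(collection):
    return fold(True, operator.and_, collection)


def any_of(collection):
    return fold(False, operator.or_, collection)
```

On an empty list these return 0, 1, True and False, which matches what Python's own `sum`, `math.prod`, `all` and `any` return for an empty iterable.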

This is why we define the results we do on empty collections. It's not just a convenience or a convention: it's consistent. To do otherwise, even if documented, is to lay a trap for future maintainers.

Replies

gpderetta • 12/09/2024

Exactly. More generally the natural initial value for a fold of an operation is the identity (or zero) element for that operation.
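
One way to see why the identity is the natural choice, sketched in Python under the assumption that a fold should give the same answer whether a collection is processed whole or in two pieces. The helper name `splits_consistently` is hypothetical.

```python
from functools import reduce
import operator


def fold(initial, op, collection):
    """Combine the items left to right, starting from `initial`."""
    return reduce(op, collection, initial)


def splits_consistently(initial, op, left, right):
    """Check that fold(left + right) == fold(left) OP fold(right)."""
    whole = fold(initial, op, left + right)
    parts = op(fold(initial, op, left), fold(initial, op, right))
    return whole == parts
```

With `initial=0` and `operator.add` the check holds even when `left` is empty, because the empty piece contributes 0. With `initial=1` and `operator.add` it fails, since the starting value is counted once in the whole and twice in the parts.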

MrMcCall • 12/09/2024

But you have specifically initialized your AND and OR results to be True and then False, respectively, thus specifying the resulting value for their processing of the empty set.

What I'm saying is that you always need to specify that default value to handle the empty set properly. In no way would I consider ANDing or ORing an empty set's boolean values to be automatically True or False, (no pun intended). You have chosen to specify them, and in real world programming, not having any elements of that specific set's specific kinds of values could well mean that the default results could be any combination of False and True, (NPI, again).

And, yes, I understand that you must initialize the temporary processing value (that you then return) to True and False in order to properly AND and OR the set's values, but that is different from the semantics of the set's cardinality.

I programmed professionally in C# (with the help of F# for its fsi.exe command-line utility) for a number of years, so I am well aware of how fold et al work. They were a very useful aspect of functional programming, making a lot of processing tasks very straightforward, as you have shown.

To apply my thinking to your FOLDR function, I would add a parameter that specifies the value to return for the empty set, because I would want to specify its semantics for that specific set such that they do not depend upon the value needed for computation to define it.

  Def FOLDR( emptysetval, initial, OP, collection )

    if length( collection ) == 0 then
      return emptysetval
    result = initial
    for value in collection:
      result = result OP value
    return result
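
A rough Python equivalent of the four-argument version, as a sketch only. The name `fold_with_empty` is made up here, and copying the input into a list is an assumption so that emptiness can be tested on any iterable.

```python
from functools import reduce
import operator


def fold_with_empty(empty_value, initial, op, collection):
    """Fold as usual, but return `empty_value` when there is nothing to fold."""
    items = list(collection)  # materialise so generators can be checked too
    if not items:
        return empty_value
    return reduce(op, items, initial)


# The caller decides what "no values at all" means for this particular set.
no_flags = fold_with_empty(None, True, operator.and_, [])
some_flags = fold_with_empty(None, True, operator.and_, [True, False])
```

Here `no_flags` is `None` rather than `True`, while `some_flags` is the ordinary AND of the values. Passing the same value for `empty_value` and `initial` gives back the three-argument behaviour.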

In a similar vein, I also used to specify my db wrapper functions to add special error conditions for specific cases. Let's say you're using a select statement that is only going to return 0 or 1 rows, my select wrapper would have a parameter that would say its valid result cardinality is specifically 0 or 1 and nothing else. Yes, the select statement would succeed, but the situation in the table might not be semantically correct, and it's better IMO to catch the problem when it is issued. It also standardizes the handling of such error conditions by the caller of the wrapper.

The same occurs with a "select count(*) ..."; it must return a single row, and anything else is a semantic error even if the db engine does not report one. It can also be a problem if your update statement affects more than one row. And there are other situations where the cardinality must be "> 0" or ">= 0". All these cases were my own error conditions that were not SQL errors, but merely semantic errors caused by db data problems.
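
The wrapper idea could look something like this in Python with the standard `sqlite3` module. This is a sketch under assumptions, not the original perl/VB/C# code: `select_checked`, `CardinalityError`, `min_rows` and `max_rows` are hypothetical names.

```python
import sqlite3


class CardinalityError(Exception):
    """The query ran, but returned a row count the caller declared invalid."""


def select_checked(conn, sql, params=(), min_rows=0, max_rows=None):
    """Run a SELECT and raise if the row count is outside [min_rows, max_rows]."""
    rows = conn.execute(sql, params).fetchall()
    count = len(rows)
    too_few = count < min_rows
    too_many = max_rows is not None and count > max_rows
    if too_few or too_many:
        limit = "any" if max_rows is None else max_rows
        raise CardinalityError(
            f"expected between {min_rows} and {limit} rows, got {count}"
        )
    return rows
```

A lookup that should match at most one row would be called with `min_rows=0, max_rows=1`; a `select count(*)` would use `min_rows=1, max_rows=1`. The SQL itself succeeds either way, and the wrapper turns an unexpected row count into an error at the point of the call.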

I used this style of manual ORM from perl to VB to C# and F# for 15+ years, to great success.

SEPARATELY

In a db/stats context, the empty set should count as a NULL value, and I don't like to AND or OR actual boolean values with NULL values. Sure, the semantics are defined but I find it's better to catch the NULL value's presence before it gets to being involved in operations.

That's why I always specified NOT NULL in my column defs, because all hell breaks loose once a NULL gets put into a column's values.
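
For a concrete picture of the NULL behaviour, here is a small sketch using Python's `sqlite3` with an in-memory database. The table name `flags` and column `ok` are invented for the example, and other database engines may differ in detail.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE flags (ok INTEGER NOT NULL)")

# Aggregates over zero rows: SUM yields NULL (None in Python), COUNT yields 0.
empty_sum, empty_count = conn.execute(
    "SELECT SUM(ok), COUNT(*) FROM flags"
).fetchone()

# Three-valued logic: NULL AND 1 is NULL, but NULL AND 0 is 0 and NULL OR 1 is 1.
null_and_true, null_and_false, null_or_true = conn.execute(
    "SELECT NULL AND 1, NULL AND 0, NULL OR 1"
).fetchone()

# The NOT NULL constraint rejects the value at insert time.
try:
    conn.execute("INSERT INTO flags (ok) VALUES (NULL)")
    null_rejected = False
except sqlite3.IntegrityError:
    null_rejected = True
```

So in SQL the sum of an empty set is NULL rather than 0, and a NULL only propagates through AND or OR when the other operand does not already decide the result.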

Statistics also has such difficulties, as I found many, many years ago when helping grad students with their SAS and SPSS data sets and processing. It's always just better to get rid of NULLs, unless the stats you need to use are built to handle them. Once again, properly producing the required semantics is the end goal.
