Hacker News

sparkie · yesterday at 8:10 PM · 1 reply

It's true though. Time complexity is a relative measure. Big-O notation is an upper bound, not a lower one. You don't take the fastest storage (L1 cache) as the baseline `x` and then say an access takes some f(n) > x time because the data isn't in the fastest cache. That's a complete misrepresentation of Big-O notation.

When we say something is O(1), we're saying that there is a constant upper bound on the time to compute the algorithm. It is constant time if the algorithm takes no longer to perform as `n` becomes large. O(1) simply means the worst-case time to compute is independent of `n`. If the algorithm accesses storage tiers of varying latency, then the baseline is the slowest medium.

For small `n` (i.e., what might fit into cache), Big-O notation is not really that useful. When `n` is tiny, an O(n) algorithm can outperform an O(1) algorithm. The notation doesn't tell us anything about how efficient a particular implementation is. It is mainly useful when we're talking about `n` which can grow to large sizes, where whatever micro-optimizations we, or the machine, might perform are dominated by `n`.
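A minimal sketch of that effect in C (sizes, keys, and the hash below are purely illustrative, not from any benchmark): for a handful of elements, an O(n) linear scan can beat an O(1) hash probe, because the hash lookup pays for a hash computation and a likely cache miss on every query, while the scan stays inside one or two cache lines.

```c
/* Sketch: O(n) linear scan vs. O(1) hash probe for tiny n.
 * Illustrative only -- actual results depend on the machine,
 * the hash function, and cache state. */
#include <stdint.h>
#include <stddef.h>
#include <stdio.h>

#define N     8           /* tiny key set: fits in 1-2 cache lines */
#define SLOTS 16          /* power of two, ~50% load factor */
#define EMPTY UINT32_MAX  /* sentinel for an unused slot */

/* O(n) lookup: one sequential, trivially prefetchable pass */
static int linear_find(const uint32_t *keys, size_t n, uint32_t key) {
    for (size_t i = 0; i < n; i++)
        if (keys[i] == key)
            return (int)i;
    return -1;
}

/* O(1) expected lookup: Fibonacci hash plus linear probing.
 * Constant expected probes, but each probe may miss cache. */
static int hash_find(const uint32_t *table, uint32_t key) {
    uint32_t h = (key * 2654435761u) >> 28;    /* hash into 16 slots */
    for (uint32_t i = 0; i < SLOTS; i++) {
        uint32_t slot = (h + i) & (SLOTS - 1); /* linear probe */
        if (table[slot] == key)   return (int)slot;
        if (table[slot] == EMPTY) return -1;
    }
    return -1;
}

int main(void) {
    uint32_t keys[N] = {3, 14, 15, 92, 65, 35, 89, 79};
    uint32_t table[SLOTS];
    for (int i = 0; i < SLOTS; i++) table[i] = EMPTY;
    for (int i = 0; i < N; i++) {              /* insert with same probing */
        uint32_t h = (keys[i] * 2654435761u) >> 28;
        while (table[h] != EMPTY) h = (h + 1) & (SLOTS - 1);
        table[h] = keys[i];
    }
    printf("linear: %d, hash: %d\n",
           linear_find(keys, N, 92), hash_find(table, 92));
    return 0;
}
```

For `n` this small, the scan's worst case is a few nanoseconds of sequential reads; the asymptotic classes only start to matter once `n` grows past what the constant factors can hide.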


Replies

tsimionescu · yesterday at 8:48 PM

This is quite wrong. If an algorithm requires one memory read, and reading one byte from memory takes (size of data)^(1/3) time steps, then the complexity of that algorithm is O(N^(1/3)), not O(1).

This is well known from large-number arithmetic. For example, we typically consider that a+a is an operation that takes O(1) time to execute, because we know we are limiting the analysis to small numbers, even if we have a lot of them. But if the numbers can be arbitrarily large, then a+a doesn't take O(1) time, it takes O(log a) time. So, an algorithm that needs to perform O(N) additions doesn't take O(N) time, it takes O(N log K) time, where K bounds the magnitude of the operands. And if we are doing something like adding the first N numbers, the complexity is actually O(N log N), not O(N).
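A sketch of where that O(log a) comes from, assuming a toy little-endian limb representation (not any particular bignum library's layout): an arbitrary-precision value a needs about log2(a)/32 32-bit limbs, and addition has to touch every one of them.

```c
/* Sketch: big-number addition walks every limb, so its cost grows
 * with the number of limbs, i.e. with the log of the value. */
#include <stdint.h>
#include <stddef.h>

/* a and b each have n little-endian base-2^32 limbs;
 * out must have room for n+1 limbs. Returns the result length. */
size_t bignum_add(uint32_t *out, const uint32_t *a,
                  const uint32_t *b, size_t n) {
    uint64_t carry = 0;
    for (size_t i = 0; i < n; i++) {   /* O(n) = O(log value) iterations */
        uint64_t sum = (uint64_t)a[i] + b[i] + carry;
        out[i] = (uint32_t)sum;        /* low 32 bits of the limb sum */
        carry  = sum >> 32;            /* carry into the next limb */
    }
    out[n] = (uint32_t)carry;          /* possible final carry limb */
    return carry ? n + 1 : n;
}
```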

The same thing happens with memory: if we assume that the problem will fit into very fast memory even for the largest sizes, we can indeed approximate memory access as O(1). But if we admit that the more memory we need to store the components of the problem, the slower memory access will get, then we need to model the cost of accessing memory as a function of problem size - and the author is proposing O(N^(1/3)) as that cost. This means that our algorithm for adding the first N numbers actually requires O(N^(4/3)) time for reading N numbers from memory and then O(N log N) time for the additions, which gives it a total complexity of O(N^(4/3) + N log N) = O(N^(4/3)).
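Spelled out under that assumed cost model (a sketch of the bookkeeping, nothing more):

```latex
T(N) = \underbrace{N \cdot O(N^{1/3})}_{N \text{ memory reads}}
     + \underbrace{O(N \log N)}_{N \text{ additions of } O(\log N)\text{-bit numbers}}
     = O(N^{4/3}),
\qquad \text{since } \log N = o(N^{1/3}).
```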

Now, this type of analysis is probably not useful if all our numbers will fit into a single computer's memory. But if we are trying to, say, figure out how our algorithm will scale from gigabytes of data to hundreds of petabytes, we need to model the rising cost of memory access as our data set gets bigger. At some point, as N grows to infinity, even just computing the address of the next byte becomes a complex operation that can take longer than the entire time it took to add the first 100GB of numbers.
