A write-up of a new Gregorian date conversion algorithm.
It achieves a 30–40% speed improvement on x86-64 and ARM64 (Apple M4 Pro) by reversing the direction of the year count and reducing the operation count (4 multiplications instead of the usual 7+).
Paper-style explanation, benchmarks on multiple architectures, and full open-source C++ implementation.
Interesting how it compares with the ClickHouse implementation, which uses a lookup table: https://github.com/ClickHouse/ClickHouse/blob/master/src/Com...
A day number can be mapped directly to year, month, and day, and a calendar date can be mapped back with a year-month LUT.
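A minimal sketch of the table idea, for anyone who hasn't seen it (this is not ClickHouse's actual code; the struct, range, and names are all illustrative):

    #include <array>
    #include <cstdint>

    struct YMD { int16_t year; uint8_t month; uint8_t day; };

    // Illustrative range: 1970-01-01 through 2099-12-31 is 47482 days.
    constexpr int kDays = 47482;
    std::array<YMD, kDays> lut; // filled once at startup by any slow method

    YMD to_date(uint32_t days_since_epoch) {
        return lut[days_since_epoch]; // one load; trades memory/cache for speed
    }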
Nice to see the micro-optimising folks are still making progress on really foundational pieces of the programming stack
It took me a while to understand that internally it uses 128-bit numbers; that `>> 64` in the pseudocode was super confusing until I saw the C++ code.
Neat code though!
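For anyone else confused by the same thing: the `>> 64` just means "keep the high half of a 64x64-bit product". A minimal sketch of the idiom, assuming GCC/Clang's `unsigned __int128` extension:

    #include <cstdint>

    // High 64 bits of a 64x64-bit multiply; the pseudocode's ">> 64".
    static inline uint64_t mulhi64(uint64_t a, uint64_t b) {
        return (uint64_t)(((unsigned __int128)a * b) >> 64);
    }

Multiplying by a precomputed constant such as floor(2^64*4/146097) and keeping only the high half is the standard trick for turning division into multiplication.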
Good opportunity to plug this folklore legend: https://neosmart.net/forums/threads/an-extended-history-of-t...
Nice to see that there are still some gems left to be dug out of algorithm land.
Perhaps nicer to avoid the comment and write

    const C1 = 505054698555331 // floor(2^64*4/146097)

as

    constexpr uint64_t C1 = ((unsigned __int128)4 << 64) / 146097;

so the compiler derives the magic number itself.

For 64-bit timekeeping, arguably for lots of uses counting nanoseconds makes more sense than seconds. You can still cover a decent usable range (2^64 ns > 584 years) and save the need for a separate subsecond counter.
What would be the most efficient algorithm to handle such an ns scale? I guess one option would be to just divide by 10^9 and run the code from the article, but can we do better?
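Sketching out the divide-by-10^9 option from the question above (the struct and names are just illustrative); note that compilers already strength-reduce the constant division into a multiply and shift, so there is no hardware divide on the hot path:

    #include <cstdint>

    struct SplitTime { uint64_t secs; uint32_t nanos; };

    // Split a 64-bit nanosecond counter into seconds plus remainder;
    // a seconds-based conversion like the article's can take over from here.
    SplitTime split_ns(uint64_t ns) {
        uint64_t s = ns / 1000000000u; // becomes multiply+shift after codegen
        return { s, (uint32_t)(ns - s * 1000000000u) };
    }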
An interesting writeup on using a different representation for time is here[1]. It can represent any specific second within ±2.9 million years of March 1, 2000 in 62 bits, and can efficiently calculate Gregorian dates using only 32-bit arithmetic. An optimization involving a 156K lookup table is also discussed.
A few notes for those not familiar with Lisp:
1. Common Lisp defines a time representation called "universal time" that is similar to Unix time, just with a different epoch.
2. A "fixnum" is a signed integer that is slightly (1-3 bits) smaller than the machine word size (32 bits at the time the article was written). The missing bits are used for run-time type tagging. Erik's math assumes 31 bits for a fixnum (2.9M years is approximately 2^30 days, and fixnums are signed).
3. Anywhere he talks about "vectors of type (UNSIGNED-BYTE X)", this means a vector of X-bit unsigned values. Most Lisp implementations will allow vectors of unboxed integers for reasonable values of X (e.g. 1, 8, 16, 32, 64), and some will pack bits for arbitrary values of X, doing the shifting/masking for you.
Admittedly in a different league speed-wise, but also scope-wise, is my very fast timestamp library for Java: https://github.com/williame/TimeMillis
This focuses on string <-> timestamp and a few other utilities that are super common in data processing and where the native Java date functions are infamously slow.
I wrote it for some hot paths in some pipelines but was super pleased my employer let me share it. Hope it helps others.
The Windows epoch starts on 1601-01-01. I always assumed that was because it slightly simplifies the calculation, as described in the article: 1601 is the first year of a 400-year Gregorian cycle. But it's not as good as the article's method of counting backwards.
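Concretely, the simplification a cycle-aligned epoch buys (a rough sketch, not the actual Windows code): the 400-year cycle is exactly 146097 days, so peeling off the era is a single div/mod with no offset corrections.

    #include <cstdint>

    // Days per 400-year Gregorian cycle: 400*365 + 97 leap days = 146097.
    constexpr uint32_t kDaysPerEra = 146097;

    void split_era(uint64_t days_since_1601, uint64_t& era, uint32_t& day_of_era) {
        era = days_since_1601 / kDaysPerEra;                    // which cycle
        day_of_era = (uint32_t)(days_since_1601 % kDaysPerEra); // day within it
    }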
TIL that Unix Time does not count leap seconds. If it did, it wouldn't have been possible to write routines that are this fast, since leap seconds are announced irregularly and conversion would need a lookup table of them.
Thank you for sharing. This is a great achievement, not only in inventing a novel algorithm with significant performance gains but also in the presentation of the work. It's very thorough and detailed, and I appreciated reading it.
For something this short that is pure math, why not just hand-write asm for the most popular platforms? That prevents the compiler from deoptimizing it in the future.
Have a fallback with this algorithm in portable C++ for all other platforms.
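One way the suggested structure could look (purely illustrative; the function name, macro, and separate .s file are all hypothetical):

    #include <cstdint>

    #if defined(__x86_64__) && defined(HAVE_ASM_DATE)
    // Hand-written assembly version, maintained in a separate .s file.
    extern "C" void civil_from_days(int64_t days, int* y, int* m, int* d);
    #else
    // Portable fallback: the article's C++ algorithm, implemented elsewhere.
    void civil_from_days(int64_t days, int* y, int* m, int* d);
    #endif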
> The algorithm provides accurate results over a period of ±1.89 Trillion years
I'm placing my bets that in a few thousand years we'll have changed calendar systems entirely, haha.
But really, it's interesting to see the insane methods used to achieve this.
That pseudo-code is quite imprecise: there's no type information (64-bit or 128-bit integers? signed or unsigned?), and it doesn't say whether any intermediate result can overflow or underflow, which would be UB for signed types. It's also inconsistent to introduce bit shifts instead of division but then use modulus instead of AND masking; typically you pick one style or the other.
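For concreteness, the two styles in question are interchangeable for unsigned operands and power-of-two divisors:

    #include <cstdint>
    #include <cassert>

    void both_styles(uint64_t x) {
        uint64_t q1 = x / 64, r1 = x % 64; // division/modulus style
        uint64_t q2 = x >> 6, r2 = x & 63; // shift/mask style
        assert(q1 == q2 && r1 == r2);      // identical for unsigned x
    }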
caldat is the third algorithm in the Numerical Recipes in Pascal (1986, 89, 90, 92) book[0] (p. 13); Julian days are easy to turn into days since the Unix epoch. With pre-Gregorian support it uses three single-precision floating-point divisions and three multiplications (two of each without), but it is convertible to an algorithm using a mix of 8-bit and 16-bit signed fixed-point integer math for microcontroller usage; see the sketch after the footnote. 64-bit (or higher) integer math is not strictly required; whatever's faster and correct for a given target is fine.
0: The last time I dug up the book was some time last year, hunting for an algorithm for the precise position of the Sun in the sky given a lat/lon (WGS 84) and date-time, for a solar tracker that didn't need light sensors, only the time and location that were already available for free.
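Not caldat itself, but the closely related all-integer Fliegel-Van Flandern (1968) conversion gives the flavor; since Unix day 0 (1970-01-01) is Julian Day Number 2440588, the two day counts differ only by that constant:

    #include <cstdint>

    // Fliegel & Van Flandern: Julian Day Number to civil date,
    // integer arithmetic only. Unix days = JDN - 2440588.
    void jdn_to_civil(int64_t jdn, int& y, int& m, int& d) {
        int64_t l = jdn + 68569;
        int64_t n = 4 * l / 146097;
        l = l - (146097 * n + 3) / 4;
        int64_t i = 4000 * (l + 1) / 1461001;
        l = l - 1461 * i / 4 + 31;
        int64_t j = 80 * l / 2447;
        d = (int)(l - 2447 * j / 80);
        l = j / 11;
        m = (int)(j + 2 - 12 * l);
        y = (int)(100 * (n - 49) + i + l);
    }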
Can this algorithm tell me how old I was last year?
Do these calculations take into account the lengthening of the days due to tidal friction?
I wrote my own date calculation functions a while ago, and during that I had an aha moment: treat March 1 as the beginning of the year during internal calculations[0]. I thought it was a stroke of genius; it turns out this article says that's the traditional way.
[0]: https://github.com/kccqzy/smartcal/blob/9cfddf7e85c2c65aa6de...
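The March-first trick in isolation (a sketch, not the linked code): with March as month 0, month lengths follow a clean 153/5 linear pattern, and February with its leap day lands at the end of the year, so no leap special-casing is needed within the year:

    // Days before shifted month m (m = 0 is March, 11 is February).
    int month_offset(int m) { return (153 * m + 2) / 5; }

    // doy = 0 on March 1; recover civil month (1..12) and day (1..31).
    void doy_to_md(int doy, int& month, int& day) {
        int m = (5 * doy + 2) / 153;
        day = doy - month_offset(m) + 1;
        month = m < 10 ? m + 3 : m - 9;
    }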