
leprechaun1066 today at 12:07 AM

The aj function at its heart is a bin (https://code.kx.com/q/ref/bin/) search between the two tables, on the requested columns, to find the indices of the right table to zip onto the left table.

  aj[`sym`time;t;q]
becomes

  t,'(`sym`time _q)(`sym`time#q)bin`sym`time#t
The rest of the aj internals are there to handle edge cases, such as missing columns and the options for filling nulls.
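
For readers who don't read q, the same idea can be sketched in Python with the standard-library bisect module. This is only an illustration, not how aj is implemented: the function name asof_join, the row-as-dict layout, and the requirement that the right table is already sorted by time are all assumptions made for the sketch.

```python
from bisect import bisect_right
from collections import defaultdict


def asof_join(left, right, by="sym", on="time"):
    """Attach to each left row the last right row with the same `by`
    value whose `on` value is <= the left row's `on` value.

    Assumes `right` is sorted by `on` (hypothetical layout: list of dicts).
    """
    # Group the right table's times and row indices by key.
    times = defaultdict(list)
    rows = defaultdict(list)
    for i, r in enumerate(right):
        times[r[by]].append(r[on])
        rows[r[by]].append(i)

    out = []
    for l in left:
        # bisect_right - 1 is the last position with time <= l[on],
        # or -1 when there is none (the same convention as q's bin).
        pos = bisect_right(times.get(l[by], []), l[on]) - 1
        joined = dict(l)
        if pos >= 0:
            match = right[rows[l[by]][pos]]
            # Zip the non-key columns of the matched row onto the left row.
            joined.update({k: v for k, v in match.items() if k not in (by, on)})
        # Unmatched rows are left without the extra columns here;
        # aj would fill them with nulls instead.
        out.append(joined)
    return out


trades = [
    {"sym": "a", "time": 10, "price": 1.0},
    {"sym": "b", "time": 12, "price": 2.0},
    {"sym": "a", "time": 5, "price": 3.0},
]
quotes = [
    {"sym": "a", "time": 8, "bid": 0.9},
    {"sym": "b", "time": 9, "bid": 1.9},
    {"sym": "a", "time": 11, "bid": 0.95},
]
joined = asof_join(trades, quotes)
```

The third trade has no earlier quote for its symbol, so it comes back unchanged.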

A lot of the joins can be distilled to the core operators/functions in a similar manner. For example, the plus-join is

  x+0i^y(cols key y)#x
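
Read right to left, that expression looks up each row's key columns in the keyed table y, fills the misses with zero, and adds the result to x. A rough Python equivalent, as a sketch only: plus_join, the list-of-dicts table, and the dict keyed by tuples standing in for a keyed table are hypothetical stand-ins.

```python
def plus_join(x, y, key_cols):
    """Add the value columns of keyed table `y` onto the rows of `x`.

    x: list of row dicts.
    y: dict mapping key tuples to dicts of numeric value columns.
    Keys missing from `y` contribute zero, mirroring the 0i^ fill.
    """
    # All value columns that appear anywhere in y.
    value_cols = sorted({c for vals in y.values() for c in vals})
    out = []
    for row in x:
        key = tuple(row[c] for c in key_cols)
        addend = y.get(key, {})  # empty dict when the key is absent
        joined = dict(row)
        for c in value_cols:
            joined[c] = joined.get(c, 0) + addend.get(c, 0)
        out.append(joined)
    return out


x = [{"sym": "a", "qty": 1}, {"sym": "b", "qty": 2}]
y = {("a",): {"qty": 10}}
result = plus_join(x, y, ["sym"])
```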

Replies

chrisaycock today at 12:58 AM

Indeed, my very first attempt used numpy.searchsorted:

https://numpy.org/doc/2.2/reference/generated/numpy.searchso...

I couldn't figure out how Arthur's bin matched on symbol though, so I switched to a linear scan on the right table to record the last-seen index for each "by" element. While it worked, my hash table was messy because I relied on Python to handle a whole tuple as a key, which had some issues during initial testing.
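
A minimal sketch of that kind of linear scan, for illustration only. It is not the original code: the name asof_indices and the column-list arguments are made up here, and both tables are assumed to be sorted by time.

```python
def asof_indices(left_keys, left_times, right_keys, right_times):
    """For each left row, return the index of the last right row with the
    same key and a time <= the left row's time, or -1 if there is none.

    Assumes both tables are sorted by time. Keys are tuples of the "by"
    columns and are used directly as dict keys.
    """
    last_seen = {}  # key tuple -> most recent right-table index
    out = []
    j = 0
    for key, t in zip(left_keys, left_times):
        # Advance through the right table up to the current left time,
        # recording the last-seen index for every key along the way.
        while j < len(right_times) and right_times[j] <= t:
            last_seen[right_keys[j]] = j
            j += 1
        out.append(last_seen.get(key, -1))
    return out


idx = asof_indices(
    [("a",), ("b",), ("a",)], [1, 5, 9],
    [("a",), ("b",), ("a",), ("b",)], [0, 2, 6, 10],
)
```

Each right-table row is visited once, so the scan is linear in the combined table sizes.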

The asof join I wrote for Empirical properly categorizes the keys before they are matched. That approach worked far better.
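
The thread doesn't show how Empirical does the categorization, so the following is only a guess at the general idea: encode each key column as small integers, then combine the per-column codes into one integer per row so that matching never hashes a whole tuple. The function name and the column-list layout are assumptions.

```python
def categorize(left_cols, right_cols):
    """Encode multi-column keys as single integers.

    left_cols / right_cols: lists of equal-length key columns, one list
    per "by" column. Rows get equal codes exactly when all of their key
    values are equal.
    """
    left_codes = [0] * len(left_cols[0])
    right_codes = [0] * len(right_cols[0])
    for lcol, rcol in zip(left_cols, right_cols):
        # Give each distinct value in this column a small integer,
        # shared between the two tables.
        mapping = {}
        for v in list(lcol) + list(rcol):
            mapping.setdefault(v, len(mapping))
        width = len(mapping)
        # Mixed-radix combination with the codes from earlier columns.
        left_codes = [c * width + mapping[v] for c, v in zip(left_codes, lcol)]
        right_codes = [c * width + mapping[v] for c, v in zip(right_codes, rcol)]
    return left_codes, right_codes


left_codes, right_codes = categorize(
    [["a", "b"], ["x", "x"]],  # left keys:  (a, x), (b, x)
    [["b", "a"], ["x", "y"]],  # right keys: (b, x), (a, y)
)
```

With integer codes, the last-seen table can be a plain list indexed by code rather than a dict keyed by tuples.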

https://www.empirical-soft.com/tutorial.html#dataframes